History log of /linux-master/net/netfilter/ipvs/ip_vs_sched.c
Revision Date Author Comments
# 62931f59 19-Oct-2019 Davide Caratti <dcaratti@redhat.com>

ipvs: don't ignore errors in case refcounting ip_vs module fails

if the IPVS module is removed while the sync daemon is starting, there is
a small gap where try_module_get() might fail getting the refcount inside
ip_vs_use_count_inc(). Then, the refcounts of IPVS module are unbalanced,
and the subsequent call to stop_sync_thread() causes the following splat:

WARNING: CPU: 0 PID: 4013 at kernel/module.c:1146 module_put.part.44+0x15b/0x290
Modules linked in: ip_vs(-) nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ext4 mbcache jbd2 ghash_clmulni_intel snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev pcspkr snd_timer virtio_balloon snd soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net net_failover virtio_blk failover virtio_console qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ata_piix ttm crc32c_intel serio_raw drm virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv6]
CPU: 0 PID: 4013 Comm: modprobe Tainted: G W 5.4.0-rc1.upstream+ #741
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:module_put.part.44+0x15b/0x290
Code: 04 25 28 00 00 00 0f 85 18 01 00 00 48 83 c4 68 5b 5d 41 5c 41 5d 41 5e 41 5f c3 89 44 24 28 83 e8 01 89 c5 0f 89 57 ff ff ff <0f> 0b e9 78 ff ff ff 65 8b 1d 67 83 26 4a 89 db be 08 00 00 00 48
RSP: 0018:ffff888050607c78 EFLAGS: 00010297
RAX: 0000000000000003 RBX: ffffffffc1420590 RCX: ffffffffb5db0ef9
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffc1420590
RBP: 00000000ffffffff R08: fffffbfff82840b3 R09: fffffbfff82840b3
R10: 0000000000000001 R11: fffffbfff82840b2 R12: 1ffff1100a0c0f90
R13: ffffffffc1420200 R14: ffff88804f533300 R15: ffff88804f533ca0
FS: 00007f8ea9720740(0000) GS:ffff888053800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3245abe000 CR3: 000000004c28a006 CR4: 00000000001606f0
Call Trace:
stop_sync_thread+0x3a3/0x7c0 [ip_vs]
ip_vs_sync_net_cleanup+0x13/0x50 [ip_vs]
ops_exit_list.isra.5+0x94/0x140
unregister_pernet_operations+0x29d/0x460
unregister_pernet_device+0x26/0x60
ip_vs_cleanup+0x11/0x38 [ip_vs]
__x64_sys_delete_module+0x2d5/0x400
do_syscall_64+0xa5/0x4e0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f8ea8bf0db7
Code: 73 01 c3 48 8b 0d b9 80 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 89 80 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcd38d2fe8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000000002436240 RCX: 00007f8ea8bf0db7
RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00000000024362a8
RBP: 0000000000000000 R08: 00007f8ea8eba060 R09: 00007f8ea8c658a0
R10: 00007ffcd38d2a60 R11: 0000000000000206 R12: 0000000000000000
R13: 0000000000000001 R14: 00000000024362a8 R15: 0000000000000000
irq event stamp: 4538
hardirqs last enabled at (4537): [<ffffffffb6193dde>] quarantine_put+0x9e/0x170
hardirqs last disabled at (4538): [<ffffffffb5a0556a>] trace_hardirqs_off_thunk+0x1a/0x20
softirqs last enabled at (4522): [<ffffffffb6f8ebe9>] sk_common_release+0x169/0x2d0
softirqs last disabled at (4520): [<ffffffffb6f8eb3e>] sk_common_release+0xbe/0x2d0

Check the return value of ip_vs_use_count_inc() and let its caller return
proper error. Inside do_ip_vs_set_ctl() the module is already refcounted,
we don't need refcount/derefcount there. Finally, in register_ip_vs_app()
and start_sync_thread(), take the module refcount earlier and ensure it's
released in the error path.

Change since v1:
- better return values in case of failure of ip_vs_use_count_inc(),
thanks to Julian Anastasov
- no need to increase/decrease the module refcount in ip_vs_set_ctl(),
thanks to Julian Anastasov

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 0d6ef068 10-Jul-2015 Markus Elfring <elfring@users.sourceforge.net>

ipvs: Delete an unnecessary check before the function call "module_put"

The module_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 05f00505 29-Jun-2015 Julian Anastasov <ja@ssi.bg>

ipvs: fix crash if scheduler is changed

I overlooked the svc->sched_data usage from schedulers
when the services were converted to RCU in 3.10. Now
the rare ipvsadm -E command can change the scheduler
but due to the reverse order of ip_vs_bind_scheduler
and ip_vs_unbind_scheduler we provide new sched_data
to the old scheduler resulting in a crash.

To fix it without changing the scheduler methods we
have to use synchronize_rcu() only for the editing case.
It means all svc->scheduler readers should expect a
NULL value. To avoid breakage for the service listing
and ipvsadm -R we can use the "none" name to indicate
that scheduler is not assigned, a state when we drop
new connections.

Reported-by: Alexander Vasiliev <a.vasylev@404-group.com>
Fixes: ceec4c381681 ("ipvs: convert services to rcu")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>


# 982f4051 18-Nov-2014 Markus Elfring <elfring@users.sourceforge.net>

netfilter: Deletion of unnecessary checks before two function calls

The functions free_percpu() and module_put() test whether their argument
is NULL and then return immediately. Thus the test around the call is
not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ceec4c38 22-Mar-2013 Julian Anastasov <ja@ssi.bg>

ipvs: convert services to rcu

This is the final step in RCU conversion.

Things that are removed:

- svc->usecnt: now svc is accessed under RCU read lock
- svc->inc: and some unused code
- ip_vs_bind_pe and ip_vs_unbind_pe: no ability to replace PE
- __ip_vs_svc_lock: replaced with RCU
- IP_VS_WAIT_WHILE: now readers lookup svcs and dests under
RCU and work in parallel with configuration

Other changes:

- before now, a RCU read-side critical section included the
calling of the schedule method, now it is extended to include
service lookup
- ip_vs_svc_table and ip_vs_svc_fwm_table are now using hlist
- svc->pe and svc->scheduler remain to the end (of grace period),
the schedulers are prepared for such RCU readers
even after done_service is called but they need
to use synchronize_rcu because last ip_vs_scheduler_put
can happen while RCU read-side critical sections
use an outdated svc->scheduler pointer
- as planned, update_service is removed
- empty services can be freed immediately after grace period.
If dests were present, the services are freed from
the dest trash code

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>


# ed3ffc4e 22-Mar-2013 Julian Anastasov <ja@ssi.bg>

ipvs: do not expect result from done_service

This method releases the scheduler state,
it can not fail. Such change will help to properly
replace the scheduler in following patch.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>


# 71dfa982 22-Mar-2013 Julian Anastasov <ja@ssi.bg>

ipvs: change ip_vs_sched_lock to mutex

The global list with schedulers ip_vs_schedulers
is accessed only from user context - configuration and
scheduler module [un]registration. Use ip_vs_sched_mutex
instead.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>


# 120b9c14 26-Sep-2012 Jesper Dangaard Brouer <brouer@redhat.com>

ipvs: Trivial changes, use compressed IPv6 address in output

Have not converted the proc file output to compressed IPv6 addresses.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>


# 41ac51ee 11-Feb-2011 Patrick Schaaf <netdev@bof.de>

ipvs: make "no destination available" message more informative

When IP_VS schedulers do not find a destination, they output a terse
"WLC: no destination available" message through kernel syslog, which I
can not only make sense of because syslog puts them in a logfile
together with keepalived checker results.

This patch makes the output a bit more informative, by telling you which
virtual service failed to find a destination.

Example output:

kernel: [1539214.552233] IPVS: wlc: TCP 192.168.8.30:22 - no destination available
kernel: [1539299.674418] IPVS: wlc: FWM 22 0x00000016 - no destination available

I have tested the code for IPv4 and FWM services, as you can see from
the example; I do not have an IPv6 setup to test the third code path
with.

To avoid code duplication, I put a new function ip_vs_scheduler_err()
into ip_vs_sched.c, and use that from the schedulers instead of calling
IP_VS_ERR_RL directly.

Signed-off-by: Patrick Schaaf <netdev@bof.de>
Signed-off-by: Simon Horman <horms@verge.net.au>


# 2fabf35b 22-Aug-2010 Simon Horman <horms@verge.net.au>

IPVS: ip_vs_{un,}bind_scheduler NULL arguments

In general NULL arguments aren't passed by the few callers that exist,
so don't test for them.

The exception is to make passing NULL to ip_vs_unbind_scheduler() a noop.

Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julian Anastasov <ja@ssi.bg>


# 6e08bfb8 22-Aug-2010 Simon Horman <horms@verge.net.au>

IPVS: Allow null argument to ip_vs_scheduler_put()

This simplifies caller logic sightly.

Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julian Anastasov <ja@ssi.bg>


# bd144550 25-Aug-2010 Simon Horman <horms@verge.net.au>

IPVS: convert __ip_vs_sched_lock to a spinlock

Also rename __ip_vs_sched_lock to ip_vs_sched_lock.

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1e3e238e 02-Aug-2009 Hannes Eder <heder@google.com>

IPVS: use pr_err and friends instead of IP_VS_ERR and friends

Since pr_err and friends are used instead of printk there is no point
in keeping IP_VS_ERR and friends. Furthermore make use of '__func__'
instead of hard coded function names.

Signed-off-by: Hannes Eder <heder@google.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9aada7ac 30-Jul-2009 Hannes Eder <heder@google.com>

IPVS: use pr_fmt

While being at it cleanup whitespace.

Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cb7f6a7b 18-Sep-2008 Julius Volz <juliusv@google.com>

IPVS: Move IPVS to net/netfilter/ipvs

Since IPVS now has partial IPv6 support, this patch moves IPVS from
net/ipv4/ipvs to net/netfilter/ipvs. It's a result of:

$ git mv net/ipv4/ipvs net/netfilter

and adapting the relevant Kconfigs/Makefiles to the new path.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>