#
0bb11a37 |
|
01-Feb-2024 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Maintain real-time response in rcu_tasks_postscan() The current code will scan the entirety of each per-CPU list of exiting tasks in ->rtp_exit_list with interrupts disabled. This is normally just fine, because each CPU typically won't have very many tasks in this state. However, if a large number of tasks block late in do_exit(), these lists could be arbitrarily long. Low probability, perhaps, but it really could happen. This commit therefore occasionally re-enables interrupts while traversing these lists, inserting a dummy element to hold the current place in the list. In kernels built with CONFIG_PREEMPT_RT=y, this re-enabling happens after each list element is processed, otherwise every one-to-two jiffies. [ paulmck: Apply Frederic Weisbecker feedback. ] Link: https://lore.kernel.org/all/ZdeI_-RfdLR8jlsm@localhost.localdomain/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Anna-Maria Behnsen <anna-maria@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
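A minimal sketch of the traversal pattern described above, assuming hypothetical field names (rtpcp->lock, ->rtp_exit_list, t->rcu_tasks_exit_list) and an assumed process_exiting_task() helper; the dummy entry acts as a cursor so the lock can be dropped and interrupts re-enabled mid-scan:

    /* Illustrative sketch only, not the kernel's actual code. */
    struct list_head dummy, *pos;
    unsigned long flags, end = jiffies + 2;   /* "one-to-two jiffies" */

    raw_spin_lock_irqsave(&rtpcp->lock, flags);
    list_add(&dummy, &rtpcp->rtp_exit_list);  /* cursor at list head */
    while ((pos = dummy.next) != &rtpcp->rtp_exit_list) {
        struct task_struct *t =
            list_entry(pos, struct task_struct, rcu_tasks_exit_list);

        process_exiting_task(t);              /* assumed helper */
        list_move(&dummy, pos);               /* advance the cursor */
        if (IS_ENABLED(CONFIG_PREEMPT_RT) || time_after(jiffies, end)) {
            raw_spin_unlock_irqrestore(&rtpcp->lock, flags);
            /* Interrupts are enabled here; the cursor holds our place. */
            raw_spin_lock_irqsave(&rtpcp->lock, flags);
            end = jiffies + 2;
        }
    }
    list_del(&dummy);
    raw_spin_unlock_irqrestore(&rtpcp->lock, flags);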
|
#
1612160b |
|
02-Feb-2024 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks Holding a mutex across synchronize_rcu_tasks() and acquiring that same mutex in code called from do_exit() after its call to exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop() results in deadlock. This is by design, because tasks that are far enough into do_exit() are no longer present on the tasks list, making it a bit difficult for RCU Tasks to find them, let alone wait on them to do a voluntary context switch. However, such deadlocks are becoming more frequent. In addition, lockdep currently does not detect such deadlocks and they can be difficult to reproduce. Furthermore, if a task voluntarily context switches during that time (for example, if it blocks acquiring a mutex), then this task is in an RCU Tasks quiescent state. And with some adjustments, RCU Tasks could just as well take advantage of that fact. This commit therefore eliminates these deadlocks by replacing the SRCU-based wait for do_exit() completion with per-CPU lists of tasks currently exiting. A given task will be on one of these per-CPU lists for the same period of time that it would previously have spent in the SRCU read-side critical section. These lists enable RCU Tasks to find the tasks that have already been removed from the tasks list, but that must nevertheless be waited upon. The RCU Tasks grace period gathers any of these do_exit() tasks that it must wait on, and adds them to the list of holdouts. Per-CPU locking and get_task_struct() are used to synchronize addition to and removal from these lists. Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/ Reported-by: Chen Zhongjin <chenzhongjin@huawei.com> Reported-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Chen Zhongjin <chenzhongjin@huawei.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
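A sketch of the grace-period side of this scheme, with assumed field and list names (rtp_exit_list, rcu_tasks_holdout_list, and the holdout list head hop); each still-exiting task is pinned with get_task_struct() under the per-CPU lock and treated as a holdout:

    /* Illustrative sketch, not the actual implementation. */
    int cpu;

    for_each_possible_cpu(cpu) {
        struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
        struct task_struct *t;

        raw_spin_lock_irq(&rtpcp->lock);
        list_for_each_entry(t, &rtpcp->rtp_exit_list, rcu_tasks_exit_list) {
            if (list_empty(&t->rcu_tasks_holdout_list)) {
                get_task_struct(t);  /* may already be off the tasks list */
                list_add(&t->rcu_tasks_holdout_list, hop);
            }
        }
        raw_spin_unlock_irq(&rtpcp->lock);
    }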
|
#
6b70399f |
|
02-Feb-2024 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks This commit continues the elimination of deadlocks involving do_exit() and RCU tasks by causing exit_tasks_rcu_start() to add the current task to a per-CPU list and causing exit_tasks_rcu_stop() to remove the current task from whatever list it is on. These lists will be used to track tasks that are exiting, while still accounting for any RCU-tasks quiescent states that these tasks pass through. [ paulmck: Apply Frederic Weisbecker feedback. ] Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/ Reported-by: Chen Zhongjin <chenzhongjin@huawei.com> Reported-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Chen Zhongjin <chenzhongjin@huawei.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
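A sketch of what the start/stop pair might look like under this scheme; the per-CPU structure layout and the rcu_tasks_exit_cpu bookkeeping field are assumptions used here only to show the add/remove symmetry:

    /* Illustrative sketch, not the actual implementation. */
    void exit_tasks_rcu_start(void)
    {
        unsigned long flags;
        struct rcu_tasks_percpu *rtpcp;

        preempt_disable();
        rtpcp = this_cpu_ptr(rcu_tasks.rtpcpu);
        current->rcu_tasks_exit_cpu = smp_processor_id(); /* remember the list */
        raw_spin_lock_irqsave(&rtpcp->lock, flags);
        list_add(&current->rcu_tasks_exit_list, &rtpcp->rtp_exit_list);
        raw_spin_unlock_irqrestore(&rtpcp->lock, flags);
        preempt_enable();
    }

    void exit_tasks_rcu_stop(void)
    {
        unsigned long flags;
        struct rcu_tasks_percpu *rtpcp;

        /* Remove from whatever CPU's list this task was added to. */
        rtpcp = per_cpu_ptr(rcu_tasks.rtpcpu, current->rcu_tasks_exit_cpu);
        raw_spin_lock_irqsave(&rtpcp->lock, flags);
        list_del_init(&current->rcu_tasks_exit_list);
        raw_spin_unlock_irqrestore(&rtpcp->lock, flags);
    }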
|
#
46faf9d8 |
|
05-Feb-2024 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks Holding a mutex across synchronize_rcu_tasks() and acquiring that same mutex in code called from do_exit() after its call to exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop() results in deadlock. This is by design, because tasks that are far enough into do_exit() are no longer present on the tasks list, making it a bit difficult for RCU Tasks to find them, let alone wait on them to do a voluntary context switch. However, such deadlocks are becoming more frequent. In addition, lockdep currently does not detect such deadlocks and they can be difficult to reproduce. Furthermore, if a task voluntarily context switches during that time (for example, if it blocks acquiring a mutex), then this task is in an RCU Tasks quiescent state. And with some adjustments, RCU Tasks could just as well take advantage of that fact. This commit therefore initializes the data structures that will be needed to rely on these quiescent states and to eliminate these deadlocks. Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/ Reported-by: Chen Zhongjin <chenzhongjin@huawei.com> Reported-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Chen Zhongjin <chenzhongjin@huawei.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
|
#
30ef0963 |
|
22-Feb-2024 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Initialize callback lists at rcu_init() time In order for RCU Tasks to reliably maintain per-CPU lists of exiting tasks, those lists must be initialized before it is possible for tasks to exit, especially given that the boot CPU is not necessarily CPU 0 (an example being powerpc kexec() kernels). And at the time that rcu_init_tasks_generic() is called, a task could potentially exit, unconventional though that sort of thing might be. This commit therefore moves the calls to cblist_init_generic() from functions called from rcu_init_tasks_generic() to a new function named tasks_cblist_init_generic() that is invoked from rcu_init(). This constituted a bug in a commit that never went to mainline, so there is no need for any backporting to -stable. Reported-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
|
#
bfe93930 |
|
05-Feb-2024 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks Holding a mutex across synchronize_rcu_tasks() and acquiring that same mutex in code called from do_exit() after its call to exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop() results in deadlock. This is by design, because tasks that are far enough into do_exit() are no longer present on the tasks list, making it a bit difficult for RCU Tasks to find them, let alone wait on them to do a voluntary context switch. However, such deadlocks are becoming more frequent. In addition, lockdep currently does not detect such deadlocks and they can be difficult to reproduce. Furthermore, if a task voluntarily context switches during that time (for example, if it blocks acquiring a mutex), then this task is in an RCU Tasks quiescent state. And with some adjustments, RCU Tasks could just as well take advantage of that fact. This commit therefore adds the data structures that will be needed to rely on these quiescent states and to eliminate these deadlocks. Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/ Reported-by: Chen Zhongjin <chenzhongjin@huawei.com> Reported-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Chen Zhongjin <chenzhongjin@huawei.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
|
#
18966f7b |
|
11-Oct-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Mark RCU Tasks accesses to current->rcu_tasks_idle_cpu The task_struct structure's ->rcu_tasks_idle_cpu can be concurrently read and written from the RCU Tasks grace-period kthread and from the CPU on which the task_struct structure's task is running. This commit therefore marks the accesses appropriately. Reported-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
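The shape of this kind of change, shown on a hypothetical pair of call sites (the actual call sites are in the idle-entry and grace-period-kthread code; mark_quiescent() is an assumed helper):

    /* Before: plain accesses that the compiler may tear, fuse, or reorder. */
    current->rcu_tasks_idle_cpu = smp_processor_id();   /* writer side */
    if (t->rcu_tasks_idle_cpu >= 0)                     /* GP-kthread side */
        mark_quiescent(t);

    /* After: both sides marked for concurrent access. */
    WRITE_ONCE(current->rcu_tasks_idle_cpu, smp_processor_id());
    if (READ_ONCE(t->rcu_tasks_idle_cpu) >= 0)
        mark_quiescent(t);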
|
#
a80712b9 |
|
27-Oct-2023 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/tasks-trace: Handle new PF_IDLE semantics The commit: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup") has changed the semantics of what is to be considered an idle task in such a way that the idle task of an offline CPU may not carry the PF_IDLE flag anymore. However, RCU-tasks-trace asserts the opposite, still assuming that idle tasks carry the PF_IDLE flag during their whole lifecycle. Remove this assumption to avoid spurious warnings but keep the initial test verifying that the idle task is the current task on any offline CPU. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup") Suggested-by: Joel Fernandes <joel@joelfernandes.org> Suggested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
#
9715ed50 |
|
27-Oct-2023 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/tasks: Handle new PF_IDLE semantics The commit: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup") has changed the semantics of what is to be considered an idle task in such a way that CPU boot code preceding the actual idle loop is excluded from it. This has however introduced new potential RCU-tasks stalls when either: 1) Grace period is started before init/0 had a chance to set PF_IDLE, keeping it stuck in the holdout list until idle ever schedules. 2) Grace period is started when some possible CPUs have never been online, keeping their idle tasks stuck in the holdout list until the CPU ever boots up. 3) Similar to 1) but with secondary CPUs: Grace period is started concurrently with secondary CPU booting, putting its idle task in the holdout list because PF_IDLE isn't yet observed on it. It then stays stuck in the holdout list until that CPU ever schedules. The effect is mitigated here by the hotplug AP thread that must run to bring the CPU up. Fix this by handling the new semantics of PF_IDLE, keeping in mind that it may or may not be set on an idle task. Take advantage of that to strengthen the coverage of an RCU-tasks quiescent state within an idle task, excluding the CPU boot code from it. Only the code running within the idle loop is now a quiescent state, along with offline CPUs. Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup") Suggested-by: Joel Fernandes <joel@joelfernandes.org> Suggested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
#
2cbc482d |
|
04-Aug-2023 |
Zhen Lei <thunder.leizhen@huawei.com> |
rcu: Dump memory object info if callback function is invalid When a structure containing an RCU callback rhp is (incorrectly) freed and reallocated after rhp is passed to call_rcu(), it is not unusual for rhp->func to be set to NULL. This defeats the debugging prints used by __call_rcu_common() in kernels built with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, which expect to identify the offending code using the identity of this function. And in kernels built without CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, things are even worse, as can be seen from this splat: Unable to handle kernel NULL pointer dereference at virtual address 0 ... ... PC is at 0x0 LR is at rcu_do_batch+0x1c0/0x3b8 ... ... (rcu_do_batch) from (rcu_core+0x1d4/0x284) (rcu_core) from (__do_softirq+0x24c/0x344) (__do_softirq) from (__irq_exit_rcu+0x64/0x108) (__irq_exit_rcu) from (irq_exit+0x8/0x10) (irq_exit) from (__handle_domain_irq+0x74/0x9c) (__handle_domain_irq) from (gic_handle_irq+0x8c/0x98) (gic_handle_irq) from (__irq_svc+0x5c/0x94) (__irq_svc) from (arch_cpu_idle+0x20/0x3c) (arch_cpu_idle) from (default_idle_call+0x4c/0x78) (default_idle_call) from (do_idle+0xf8/0x150) (do_idle) from (cpu_startup_entry+0x18/0x20) (cpu_startup_entry) from (0xc01530) This commit therefore adds calls to mem_dump_obj(rhp) to output some information, for example: slab kmalloc-256 start ffff410c45019900 pointer offset 0 size 256 This provides the rough size of the memory block and the offset of the rcu_head structure, which at least provides a few clues to help locate the problem. If the problem is reproducible, additional slab debugging can be enabled, for example, CONFIG_DEBUG_SLAB=y, which can provide significantly more information. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
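A sketch of the invocation-time check this commit describes; the surrounding function is condensed, but mem_dump_obj() is the real helper:

    /* Illustrative fragment of a callback-invocation path. */
    static void invoke_one_callback(struct rcu_head *rhp)
    {
        rcu_callback_t f = READ_ONCE(rhp->func);

        if (WARN_ON_ONCE(!f)) {
            /* e.g.: slab kmalloc-256 start ffff410c45019900 pointer offset 0 size 256 */
            mem_dump_obj(rhp);
            return;
        }
        f(rhp);
    }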
|
#
0325e8a1 |
|
03-Aug-2023 |
Jiapeng Chong <jiapeng.chong@linux.alibaba.com> |
rcu-tasks: Make rcu_tasks_lazy_ms static The rcu_tasks_lazy_ms variable is not used outside the file tasks.h, so this commit marks it static. kernel/rcu/tasks.h:1085:5: warning: symbol 'rcu_tasks_lazy_ms' was not declared. Should it be static? Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6086 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
#
e62d8ae4 |
|
02-Aug-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Pull sampling of ->percpu_dequeue_lim out of loop The rcu_tasks_need_gpcb() samples ->percpu_dequeue_lim as part of the condition clause of a "for" loop, which is a bit confusing. This commit therefore hoists this sampling out of the loop, using the result loaded in the condition clause. So why does this work in the face of a concurrent switch from single-CPU queueing to per-CPU queueing? o The call_rcu_tasks_generic() that makes the change has already enqueued its callback, which means that all of the other CPU's callback queues are empty. o For the call_rcu_tasks_generic() that first notices the switch to per-CPU queues, the smp_store_release() used to update ->percpu_enqueue_lim pairs with the raw_spin_trylock_rcu_node()'s full barrier that is between the READ_ONCE(rtp->percpu_enqueue_shift) and the rcu_segcblist_enqueue() that enqueues the callback. o Because this CPU's queue is empty (unless it happens to be the original single queue, in which case there is no need for synchronization), this call_rcu_tasks_generic() will do an irq_work_queue() to schedule a handler for the needed rcuwait_wake_up() call. This call will be ordered after the first call_rcu_tasks_generic() function's change to ->percpu_dequeue_lim. o This rcuwait_wake_up() will either happen before or after the set_current_state() in rcuwait_wait_event(). If it happens before, the "condition" argument's call to rcu_tasks_need_gpcb() will be ordered after the original change, and all callbacks on all CPUs will be visible. Otherwise, if it happens after, then the grace-period kthread's state will be set back to running, which will result in a later call to rcuwait_wait_event() and thus to rcu_tasks_need_gpcb(), which will again see the change. So it all works out. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
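The shape of the change, in condensed form; percpu_dequeue_lim is the real field, and the loop bodies are elided:

    int cpu, dequeue_limit;

    /* Before: the limit is re-sampled on every loop iteration. */
    for (cpu = 0; cpu < smp_load_acquire(&rtp->percpu_dequeue_lim); cpu++) {
        /* ... count and gather this CPU's callbacks ... */
    }

    /* After: sampled once; the pairing argument above shows why this is
     * safe against a concurrent single-CPU-to-per-CPU queueing switch. */
    dequeue_limit = smp_load_acquire(&rtp->percpu_dequeue_lim);
    for (cpu = 0; cpu < dequeue_limit; cpu++) {
        /* ... count and gather this CPU's callbacks ... */
    }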
|
#
92a708dc |
|
27-Jul-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add printk()s to localize boot-time self-test hang Currently, rcu_tasks_initiate_self_tests() prints a message and then initiates self tests on up to three different RCU Tasks flavors. If one of the flavors has a grace-period hang, it is not easy to work out which of the three hung. This commit therefore prints a message prior to each individual test. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
#
9d0cce2b |
|
01-Aug-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Fix boot-time RCU tasks debug-only deadlock In kernels built with CONFIG_PROVE_RCU=y (for example, lockdep kernels), the following sequence of events can occur: o rcu_init_tasks_generic() is invoked just before init is spawned. It invokes rcu_spawn_tasks_kthread() and friends. o rcu_spawn_tasks_kthread() invokes rcu_spawn_tasks_kthread_generic(), which uses kthread_run() to create the needed kthread. o Control returns to rcu_init_tasks_generic(), which, because this is a CONFIG_PROVE_RCU=y kernel, invokes the version of the rcu_tasks_initiate_self_tests() function that actually does something, including invoking synchronize_rcu_tasks(), which in turn invokes synchronize_rcu_tasks_generic(). o synchronize_rcu_tasks_generic() sees that the ->kthread_ptr is still NULL, because the newly spawned kthread has not yet started. o The new kthread starts, preempting synchronize_rcu_tasks_generic() just after its check. This kthread invokes rcu_tasks_one_gp(), which acquires ->tasks_gp_mutex, and, seeing no work, blocks in rcuwait_wait_event(). Note that this step requires either a preemptible kernel or a fault-injection-style sleep at the beginning of mutex_lock(). o synchronize_rcu_tasks_generic() resumes and invokes rcu_tasks_one_gp(). o rcu_tasks_one_gp() attempts to acquire ->tasks_gp_mutex, which is still held by the newly spawned kthread's rcu_tasks_one_gp() function. Deadlock. Because the only reason for ->tasks_gp_mutex is to handle pre-kthread synchronous grace periods, this commit avoids this deadlock by having rcu_tasks_one_gp() momentarily release ->tasks_gp_mutex while invoking rcuwait_wait_event(). This allows the call to rcu_tasks_one_gp() from synchronize_rcu_tasks_generic() to proceed. Note that it is not necessary to release the mutex anywhere else in rcu_tasks_one_gp() because rcuwait_wait_event() is the only function that can block indefinitely. Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-by: Roy Hopkins <rhopkins@suse.de> Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Roy Hopkins <rhopkins@suse.de>
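A sketch of the fix inside rcu_tasks_one_gp(), with the surrounding code elided (needgpcb is an assumed local); the mutex is dropped only across the one call that can block indefinitely:

    /* Illustrative fragment; needgpcb handling is condensed. */
    mutex_unlock(&rtp->tasks_gp_mutex);
    rcuwait_wait_event(&rtp->cbs_wait,
                       (needgpcb = rcu_tasks_need_gpcb(rtp)),
                       TASK_IDLE);
    mutex_lock(&rtp->tasks_gp_mutex);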
|
#
cb88f7f5 |
|
28-Jul-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Permit use of debug-objects with RCU Tasks flavors Currently, cblist_init_generic() holds a raw spinlock when invoking INIT_WORK(). This fails in kernels built with CONFIG_DEBUG_OBJECTS=y due to memory allocation being forbidden while holding a raw spinlock. But the only reason for holding the raw spinlock is to synchronize with early boot calls to call_rcu_tasks(), call_rcu_tasks_rude(), and, last but not least, call_rcu_tasks_trace(). These calls also invoke cblist_init_generic() in order to support early boot queueing of callbacks. Except that there are no early boot calls to any of these three functions, and the BPF guys confirm that they have no plans to add any such calls. This commit therefore removes the synchronization and adds a WARN_ON_ONCE() to catch the case of now-prohibited early boot RCU Tasks callback queueing. If early boot queueing is needed, an "initialized" flag may be added to the rcu_tasks structure. Then queueing a callback before this flag is set would initialize the callback list (if needed) and queue the callback. The decision as to where to queue the callback given the possibility of non-zero boot CPUs is left as an exercise for the reader. Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
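A sketch of the resulting initialization shape; the per-CPU walk is condensed, and the emptiness check is an assumed stand-in for whatever debug check the real code uses:

    /* Illustrative sketch: no raw spinlock held across INIT_WORK(). */
    for_each_possible_cpu(cpu) {
        struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);

        /* INIT_WORK() may allocate debug objects, so no raw lock here. */
        INIT_WORK(&rtpcp->rtp_work, rcu_tasks_invoke_cbs_wq);
        /* Early-boot queueing is now prohibited; complain if it happened. */
        WARN_ON_ONCE(!rcu_segcblist_empty(&rtpcp->cblist));
    }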
|
#
a15ec57c |
|
03-Jun-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcuscale: Add RCU Tasks Rude testing Add a "tasks-rude" option to the rcuscale.scale_type module parameter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
271a8467 |
|
02-Jun-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcuscale: Measure RCU Tasks Trace grace-period kthread CPU time This commit causes RCU Tasks Trace to output the CPU time consumed by its grace-period kthread. The CPU time is whatever is in the designated task's current->stime field, and thus is controlled by whatever CPU-time accounting scheme is in effect. This output appears in microseconds as follows on the console: rcu_scale: Grace-period kthread CPU time: 42367.037 [ paulmck: Apply Willy Tarreau feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
5f8e3202 |
|
17-May-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcuscale: Measure grace-period kthread CPU time This commit adds the ability to output the CPU time consumed by the grace-period kthread for the RCU variant under test. The CPU time is whatever is in the designated task's current->stime field, and thus is controlled by whatever CPU-time accounting scheme is in effect. This output appears in microseconds as follows on the console: rcu_scale: Grace-period kthread CPU time: 42367.037 [ paulmck: Apply feedback from Stephen Rothwell and kernel test robot. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Yujie Liu <yujie.liu@intel.com>
|
#
db13710a |
|
26-Jun-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Cancel callback laziness if too many callbacks The various RCU Tasks flavors now do lazy grace periods when there are only asynchronous grace period requests. By default, the system will let 250 milliseconds elapse after the first call_rcu_tasks*() callback is queued before starting a grace period. In contrast, synchronous grace period requests such as synchronize_rcu_tasks*() will start a grace period immediately. However, invoking one of the call_rcu_tasks*() functions in a too-tight loop can result in a callback flood, which in turn can exhaust memory if grace periods are delayed for too long. This commit therefore sets a limit: when any CPU's callback list expands to contain rcupdate.rcu_task_lazy_lim callbacks (defaulting to 32; set to -1 to disable), the grace-period kthread will be awakened, thus cancelling any ongoing laziness and getting out in front of the potential callback flood. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
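A sketch of the flood check on the enqueue path; rcu_task_lazy_lim here stands in for whatever variable backs the rcupdate.rcu_task_lazy_lim boot parameter:

    /* Illustrative fragment of the enqueue path. */
    if (rcu_task_lazy_lim >= 0 &&
        rcu_segcblist_n_cbs(&rtpcp->cblist) >= rcu_task_lazy_lim)
        needwake = true;  /* cancel laziness: wake the GP kthread now */
    /* ... the callback is enqueued; needwake forces an immediate wakeup ... */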
|
#
450d461a |
|
17-May-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add kernel boot parameters for callback laziness This commit adds kernel boot parameters for callback laziness, allowing the RCU Tasks flavors to be individually adjusted. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
5ae769c6 |
|
17-May-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Remove redundant #ifdef CONFIG_TASKS_RCU The kernel/rcu/tasks.h file has a #endif immediately followed by an #ifdef of that same CONFIG_TASKS_RCU Kconfig option, making the pair redundant. This commit therefore removes the redundant #endif/#ifdef pair. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
d119357d |
|
14-May-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Treat only synchronous grace periods urgently The performance requirements on RCU Tasks, and in particular on RCU Tasks Trace, have evolved over time as the workloads have evolved. The current implementation is designed to provide low grace-period latencies, and also to accommodate short-duration floods of callbacks. However, current workloads can also provide a constant background callback-queuing rate of a few hundred call_rcu_tasks_trace() invocations per second. This results in continuous back-to-back RCU Tasks Trace grace periods, which in turn can consume the better part of 10% of a CPU. One could take the attitude that there are several tens of other CPUs on the systems running such workloads, but energy efficiency is a thing. On these systems, although asynchronous grace-period requests happen every few milliseconds, synchronous grace-period requests are quite rare. This commit therefore arranges for grace periods to be initiated immediately in response to calls to synchronize_rcu_tasks*() and also to calls to synchronize_rcu_mult() that are passed one of the call_rcu_tasks*() functions. These are recognized by the tell-tale wakeme_after_rcu callback function. In other cases, callbacks are gathered up for up to about 250 milliseconds before a grace period is initiated. This results in more than an order of magnitude reduction in RCU Tasks Trace grace periods, with corresponding reduction in consumption of CPU time. Reported-by: Alexei Starovoitov <ast@kernel.org> Reported-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
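A sketch of how a synchronous request can be recognized on the enqueue path; wakeme_after_rcu() is the real wakeup callback that synchronize_rcu_tasks*() ends up queueing, while the surrounding fragment is condensed:

    /* Illustrative fragment of call_rcu_tasks_generic(). */
    bool needwake = (func == wakeme_after_rcu);  /* synchronous waiter? */

    /* ... enqueue the callback ... */
    if (needwake && READ_ONCE(rtp->kthread_ptr))
        irq_work_queue(&rtpcp->rtp_irq_work);  /* start the GP immediately */
    /* Otherwise, callbacks batch for up to ~250 ms before a GP starts. */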
|
#
401b0de3 |
|
26-Apr-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs The rcu_tasks_invoke_cbs() function relies on queue_work_on() to silently fall back to WORK_CPU_UNBOUND when the specified CPU is offline. However, the queue_work_on() function's silent fallback mechanism relies on that CPU having been online at some time in the past. When queue_work_on() is passed a CPU that has never been online, workqueue lockups ensue, which can be bad for your kernel's general health and well-being. This commit therefore checks whether a given CPU has ever been online and, if not, substitutes WORK_CPU_UNBOUND in the subsequent call to queue_work_on(). Why not simply omit the queue_work_on() call entirely? Because this function is flooding callback-invocation notifications to all CPUs, and must deal with possibilities that include a sparse cpu_possible_mask. This commit also moves the setting of the rcu_data structure's ->beenonline field to rcu_cpu_starting(), which executes on the incoming CPU before that CPU has ever enabled interrupts. This ensures that the required workqueues are present. In addition, because the incoming CPU has not yet enabled its interrupts, there cannot yet have been any softirq handlers running on this CPU, which means that the WARN_ON_ONCE(!rdp->beenonline) within the RCU_SOFTIRQ handler cannot have triggered yet. Fixes: d363f833c6d88 ("rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs() invocations") Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
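The substitution itself is small; treat the exact name of the has-this-CPU-ever-been-up query as an assumption here:

    /* Never-onlined CPUs get unbound workqueue handling instead. */
    int cpuwq = rcu_cpu_beenfullyonline(cpu) ? cpu : WORK_CPU_UNBOUND;

    queue_work_on(cpuwq, system_wq, &rtpcp->rtp_work);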
|
#
edff5e9a |
|
22-Mar-2023 |
Zqiang <qiang1.zhang@intel.com> |
rcu-tasks: Clarify the cblist_init_generic() function's pr_info() output This commit uses rtp->name instead of __func__ and outputs the value of rcu_task_cb_adjust, thus reducing console-log output. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
5fc8cbe4 |
|
02-Aug-2022 |
Shigeru Yoshida <syoshida@redhat.com> |
rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic() pr_info() is called with the rtp->cbs_gbl_lock spin lock held. Because pr_info() calls printk(), which might sleep, this results in a BUG splat like the one below: [ 0.206455] cblist_init_generic: Setting adjustable number of callback queues. [ 0.206463] [ 0.206464] ============================= [ 0.206464] [ BUG: Invalid wait context ] [ 0.206465] 5.19.0-00428-g9de1f9c8ca51 #5 Not tainted [ 0.206466] ----------------------------- [ 0.206466] swapper/0/1 is trying to lock: [ 0.206467] ffffffffa0167a58 (&port_lock_key){....}-{3:3}, at: serial8250_console_write+0x327/0x4a0 [ 0.206473] other info that might help us debug this: [ 0.206473] context-{5:5} [ 0.206474] 3 locks held by swapper/0/1: [ 0.206474] #0: ffffffff9eb597e0 (rcu_tasks.cbs_gbl_lock){....}-{2:2}, at: cblist_init_generic.constprop.0+0x14/0x1f0 [ 0.206478] #1: ffffffff9eb579c0 (console_lock){+.+.}-{0:0}, at: _printk+0x63/0x7e [ 0.206482] #2: ffffffff9ea77780 (console_owner){....}-{0:0}, at: console_emit_next_record.constprop.0+0x111/0x330 [ 0.206485] stack backtrace: [ 0.206486] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-00428-g9de1f9c8ca51 #5 [ 0.206488] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014 [ 0.206489] Call Trace: [ 0.206490] <TASK> [ 0.206491] dump_stack_lvl+0x6a/0x9f [ 0.206493] __lock_acquire.cold+0x2d7/0x2fe [ 0.206496] ? stack_trace_save+0x46/0x70 [ 0.206497] lock_acquire+0xd1/0x2f0 [ 0.206499] ? serial8250_console_write+0x327/0x4a0 [ 0.206500] ? __lock_acquire+0x5c7/0x2720 [ 0.206502] _raw_spin_lock_irqsave+0x3d/0x90 [ 0.206504] ? serial8250_console_write+0x327/0x4a0 [ 0.206506] serial8250_console_write+0x327/0x4a0 [ 0.206508] console_emit_next_record.constprop.0+0x180/0x330 [ 0.206511] console_unlock+0xf7/0x1f0 [ 0.206512] vprintk_emit+0xf7/0x330 [ 0.206514] _printk+0x63/0x7e [ 0.206516] cblist_init_generic.constprop.0.cold+0x24/0x32 [ 0.206518] rcu_init_tasks_generic+0x5/0xd9 [ 0.206522] kernel_init_freeable+0x15b/0x2a2 [ 0.206523] ? rest_init+0x160/0x160 [ 0.206526] kernel_init+0x11/0x120 [ 0.206527] ret_from_fork+0x1f/0x30 [ 0.206530] </TASK> [ 0.207018] cblist_init_generic: Setting shift to 1 and lim to 1. This patch moves pr_info() so that it is called without rtp->cbs_gbl_lock held. Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
a4533cc0 |
|
11-Jan-2023 |
Neeraj Upadhyay <quic_neeraju@quicinc.com> |
rcu-tasks: Report stalls during synchronize_srcu() in rcu_tasks_postscan() The call to synchronize_srcu() from rcu_tasks_postscan() can be stalled by a task getting stuck in do_exit() between that function's calls to exit_tasks_rcu_start() and exit_tasks_rcu_finish(). To ease diagnosis of this situation, print a stall warning message every rcu_task_stall_info period when rcu_tasks_postscan() is stalled. [ paulmck: Adjust to handle CONFIG_SMP=n. ] Acked-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reported-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/rcu/20230111212736.GA1062057@paulmck-ThinkPad-P17-Gen-1/ Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
|
#
2b4be548 |
|
19-Mar-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Fix warning for unused tasks_rcu_exit_srcu The tasks_rcu_exit_srcu variable is used only by kernels built with CONFIG_TASKS_RCU=y, but is defined for all kernels with CONFIG_TASKS_RCU_GENERIC=y. Therefore, in kernels built with CONFIG_TASKS_RCU_GENERIC=y but CONFIG_TASKS_RCU=n, this gives a "defined but not used" warning. This commit therefore moves this variable under CONFIG_TASKS_RCU. Link: https://lore.kernel.org/oe-kbuild-all/202303191536.XzMSyzTl-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
a4fcfbee |
|
02-Dec-2022 |
Zqiang <qiang1.zhang@intel.com> |
rcu-tasks: Handle queue-shrink/callback-enqueue race condition The rcu_tasks_need_gpcb() function determines whether or not: (1) There are callbacks needing another grace period, (2) There are callbacks ready to be invoked, and (3) It would be a good time to shrink back down to a single-CPU callback list. This third case is interesting because some other CPU might be adding new callbacks, which might suddenly make this a very bad time to be shrinking. This is currently handled by requiring call_rcu_tasks_generic() to enqueue callbacks under the protection of rcu_read_lock() and requiring rcu_tasks_need_gpcb() to wait for an RCU grace period to elapse before finalizing the transition. This works well in practice. Unfortunately, the current code assumes that a grace period whose end is detected by the poll_state_synchronize_rcu() in the second "if" condition actually ended before the earlier code counted the callbacks queued on CPUs other than CPU 0 (local variable "ncbsnz"). Given the current code, it is possible that a long-delayed call_rcu_tasks_generic() invocation will queue a callback on a non-zero CPU after these CPUs have had their callbacks counted and zero has been stored to ncbsnz. Such a callback would trigger the WARN_ON_ONCE() in the second "if" statement. To see this, consider the following sequence of events: o CPU 0 invokes rcu_tasks_one_gp(), and counts fewer than rcu_task_collapse_lim callbacks. It sees at least one callback queued on some other CPU, thus setting ncbsnz to a non-zero value. o CPU 1 invokes call_rcu_tasks_generic() and loads 42 from ->percpu_enqueue_lim. It therefore decides to enqueue its callback onto CPU 1's callback list, but is delayed. o CPU 0 sees that rcu_task_cb_adjust is non-zero and that the number of callbacks does not exceed rcu_task_collapse_lim. It therefore checks percpu_enqueue_lim, and sees that its value is greater than one. CPU 0 therefore starts the shift back to a single callback list. It sets ->percpu_enqueue_lim to 1, but CPU 1 has already read the old value of 42. It also gets a grace-period state value from get_state_synchronize_rcu(). o CPU 0 sees that ncbsnz is non-zero in its second "if" statement, so it declines to finalize the shrink operation. o CPU 0 again invokes rcu_tasks_one_gp(), and counts fewer than rcu_task_collapse_lim callbacks. It also sees that there are no callbacks queued on any other CPU, and thus sets ncbsnz to zero. o CPU 1 resumes execution and enqueues its callback onto its own list. This invalidates the value of ncbsnz. o CPU 0 sees that rcu_task_cb_adjust is non-zero and that the number of callbacks does not exceed rcu_task_collapse_lim. It therefore checks percpu_enqueue_lim, but sees that its value is already unity. It therefore does not get a new grace-period state value. o CPU 0 sees that rcu_task_cb_adjust is non-zero, ncbsnz is zero, and that poll_state_synchronize_rcu() says that the grace period has completed. It therefore finalizes the shrink operation, setting ->percpu_dequeue_lim to the value one. o CPU 0 does a debug check, scanning the other CPUs' callback lists. It sees that CPU 1's list has a callback, so it (rightly) triggers the WARN_ON_ONCE(). After all, the new value of ->percpu_dequeue_lim says to not bother looking at CPU 1's callback list, which means that this callback will never be invoked. This can result in hangs and maybe even OOMs.
Based on long experience with rcutorture, this is an extremely low-probability race condition, but it really can happen, especially in preemptible kernels or within guest OSes. This commit therefore checks for completion of the grace period before counting callbacks. With this change, in the above failure scenario CPU 0 would know not to prematurely end the shrink operation because the grace period would not have completed before the count operation started. [ paulmck: Adjust grace-period end rather than adding RCU reader. ] [ paulmck: Avoid spurious WARN_ON_ONCE() with ->percpu_dequeue_lim check. ] Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
ea5c8987 |
|
30-Nov-2022 |
Zqiang <qiang1.zhang@intel.com> |
rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug The synchronize_rcu_tasks_rude() function invokes rcu_tasks_rude_wait_gp() to wait one rude RCU-tasks grace period. The rcu_tasks_rude_wait_gp() function in turn checks if there is only a single online CPU. If so, it will immediately return, because a call to synchronize_rcu_tasks_rude() is by definition a grace period on a single-CPU system. (We could have blocked!) Unfortunately, this check uses num_online_cpus() without synchronization, which can result in too-short grace periods. To see this, consider the following scenario: CPU0 CPU1 (going offline) migration/1 task: cpu_stopper_thread -> take_cpu_down -> _cpu_disable (dec __num_online_cpus) ->cpuhp_invoke_callback preempt_disable access old_data0 task1 del old_data0 ..... synchronize_rcu_tasks_rude() task1 schedule out .... task2 schedule in rcu_tasks_rude_wait_gp() ->__num_online_cpus == 1 ->return .... task1 schedule in ->free old_data0 preempt_enable When CPU1 decrements __num_online_cpus, its value becomes 1. However, CPU1 has not finished going offline, and will take one last trip through the scheduler and the idle loop before it actually stops executing instructions. Because synchronize_rcu_tasks_rude() is mostly used for tracing, and because both the scheduler and the idle loop can be traced, this means that CPU0's prematurely ended grace period might disrupt the tracing on CPU1. Given that this disruption might include CPU1 executing instructions in memory that was just now freed (and maybe reallocated), this is a matter of some concern. This commit therefore removes that problematic single-CPU check from the rcu_tasks_rude_wait_gp() function. This dispenses with the single-CPU optimization, but there is no evidence indicating that this optimization is important. In addition, synchronize_rcu_tasks_generic() contains a similar optimization (albeit only for early boot), which also splats. (As in exactly why are you invoking synchronize_rcu_tasks_rude() so early in boot, anyway???) It is OK for the synchronize_rcu_tasks_rude() function's check to be unsynchronized because the only times that this check can evaluate to true is when there is only a single CPU running with preemption disabled. While in the area, this commit also fixes a minor bug in which a call to synchronize_rcu_tasks_rude() would instead be attributed to synchronize_rcu_tasks(). [ paulmck: Add "synchronize_" prefix and "()" suffix. ] Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
28319d6d |
|
25-Nov-2022 |
Frederic Weisbecker <frederic@kernel.org> |
rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes() RCU Tasks and PID-namespace unshare can interact in do_exit() in a complicated circular dependency: 1) TASK A calls unshare(CLONE_NEWPID), which creates a new PID namespace that every subsequent child of TASK A will belong to. But TASK A doesn't itself belong to that new PID namespace. 2) TASK A forks() and creates TASK B. TASK A stays attached to its PID namespace (let's say PID_NS1) and TASK B is the first task belonging to the new PID namespace created by unshare() (let's call it PID_NS2). 3) Since TASK B is the first task attached to PID_NS2, it becomes the PID_NS2 child reaper. 4) TASK A forks() again and creates TASK C which gets attached to PID_NS2. Note how TASK C has TASK A as a parent (belonging to PID_NS1) but has TASK B (belonging to PID_NS2) as a pid_namespace child_reaper. 5) TASK B exits and since it is the child reaper for PID_NS2, it has to kill all other tasks attached to PID_NS2, and wait for all of them to die before getting reaped itself (zap_pid_ns_processes()). 6) TASK A calls synchronize_rcu_tasks() which leads to synchronize_srcu(&tasks_rcu_exit_srcu). 7) TASK B is waiting for TASK C to get reaped. But TASK B is under a tasks_rcu_exit_srcu SRCU critical section (exit_notify() is between exit_tasks_rcu_start() and exit_tasks_rcu_finish()), blocking TASK A. 8) TASK C exits and, since TASK A is its parent, waits for TASK A to reap it, but TASK A can't because it is waiting for TASK B, which is waiting for TASK C. Pid_namespace semantics can hardly be changed at this point. But the coverage of tasks_rcu_exit_srcu can be reduced instead. The current task is assumed not to be concurrently reapable at this stage of exit_notify() and therefore tasks_rcu_exit_srcu can be temporarily relaxed without breaking its constraints, providing a way out of the deadlock scenario. [ paulmck: Fix build failure by adding additional declaration. ] Fixes: 3f95aa81d265 ("rcu: Make TASKS_RCU handle tasks that are almost done exiting") Reported-by: Pengfei Xu <pengfei.xu@intel.com> Suggested-by: Boqun Feng <boqun.feng@gmail.com> Suggested-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Suggested-by: Paul E. McKenney <paulmck@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
44757092 |
|
25-Nov-2022 |
Frederic Weisbecker <frederic@kernel.org> |
rcu-tasks: Remove preemption disablement around srcu_read_[un]lock() calls Ever since the following commit: 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via this_cpu_dec()") SRCU no longer relies on preemption being disabled in order to modify the per-CPU counter. And even then, this used to be done from within the API itself. Therefore, and after further checking, it appears safe to remove the preemption disablement around __srcu_read_[un]lock() in exit_tasks_rcu_start() and exit_tasks_rcu_finish(). Suggested-by: Boqun Feng <boqun.feng@gmail.com> Suggested-by: Paul E. McKenney <paulmck@kernel.org> Suggested-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
e4e1e808 |
|
25-Nov-2022 |
Frederic Weisbecker <frederic@kernel.org> |
rcu-tasks: Improve comments explaining tasks_rcu_exit_srcu purpose Make sure no one needs to dig through the depths of git blame again in order to understand the subtleties of how rcu-tasks deals with exiting tasks. Suggested-by: Boqun Feng <boqun.feng@gmail.com> Suggested-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Suggested-by: Paul E. McKenney <paulmck@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
9420fb93 |
|
21-Nov-2022 |
Zqiang <qiang1.zhang@intel.com> |
rcu-tasks: Use accurate runstart time for RCU Tasks boot-time testing Currently, test_rcu_tasks_callback() reads from the jiffies counter only once when this function is invoked. This introduces inaccuracies because of the latencies induced by the synchronize_rcu_tasks*() invocations. This commit therefore re-reads the jiffies counter at the beginning of each test, using that value as the test's runstart time and thus avoiding penalizing later tests for the latencies induced by earlier tests. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
df83fff7 |
|
29-Sep-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Make grace-period-age message human-readable This commit adds a few words to the informative message that appears every ten seconds in RCU Tasks and RCU Tasks Trace grace periods. This message currently reads as follows: rcu_tasks_wait_gp: rcu_tasks grace period 1046 is 10088 jiffies old. After this change, it provides additional context, instead reading as follows: rcu_tasks_wait_gp: rcu_tasks grace period number 1046 (since boot) is 10088 jiffies old. Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
e6c86c51 |
|
14-Oct-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Provide rcu_trace_implies_rcu_gp() As an accident of implementation, an RCU Tasks Trace grace period also acts as an RCU grace period. However, this could change at any time. This commit therefore creates an rcu_trace_implies_rcu_gp() function that currently returns true to codify this accident. Code relying on this accident must call this function to verify that this accident is still happening. Reported-by: Hou Tao <houtao@huaweicloud.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Link: https://lore.kernel.org/r/20221014113946.965131-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
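Per the description above, the function currently amounts to a codified constant (sketch):

    /* True while the accident holds; callers must re-check, not assume. */
    static inline bool rcu_trace_implies_rcu_gp(void)
    {
        return true;
    }

A caller that would otherwise chain a regular RCU grace period after an RCU Tasks Trace grace period can skip that extra step whenever this function returns true.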
|
#
d6ad6063 |
|
18-Jul-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Ensure RCU Tasks Trace loops have quiescent states The RCU Tasks Trace grace-period kthread loops across all CPUs, and there can be quite a few CPUs, with some commercially available systems sporting well over a thousand of them. Some of these loops can feature IPIs, which can take some time. This commit therefore places a call to cond_resched_tasks_rcu_qs() in each such loop. Link: https://docs.google.com/document/d/1V0YnG1HTWMt9WHJjroiJL9lf-hMrud4v8Fn3fhyY0cI/edit?usp=sharing Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
fcd53c8a |
|
12-Jul-2022 |
Zqiang <qiang1.zhang@intel.com> |
rcu-tasks: Convert RCU_LOCKDEP_WARN() to WARN_ONCE() Kernels built with CONFIG_PROVE_RCU=y and CONFIG_DEBUG_LOCK_ALLOC=y attempt to emit a warning when the synchronize_rcu_tasks_generic() function is called during early boot while the rcu_scheduler_active variable is RCU_SCHEDULER_INACTIVE. However, the warning is not actually printed because debug_lockdep_rcu_enabled() returns false, exactly because the rcu_scheduler_active variable is still equal to RCU_SCHEDULER_INACTIVE. This commit therefore replaces RCU_LOCKDEP_WARN() with WARN_ONCE() to force these warnings to actually be printed. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
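The shape of the change (message text condensed; rtp->name per-flavor naming assumed):

    /* Before: suppressed during early boot, when lockdep checking is off. */
    RCU_LOCKDEP_WARN(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
                     "synchronize_rcu_tasks called too soon");

    /* After: always printed, independent of lockdep state. */
    WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
              "%s called too soon", rtp->name);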
|
#
e72ee5e1 |
|
14-Jun-2022 |
Waiman Long <longman@redhat.com> |
rcu-tasks: Use delayed_work to delay rcu_tasks_verify_self_tests() Commit 2585014188d5 ("rcu-tasks: Be more patient for RCU Tasks boot-time testing") fixes false-positive rcu_tasks verification check failures by repeating the test once per second until timeout, using schedule_timeout_uninterruptible(). Since rcu_tasks_verify_self_tests() is called from do_initcalls() as a late_initcall, this has the undesirable side effect of delaying other late_initcall's queued after it by a second or more. Fix this by instead using delayed_work to repeat the verification check. Fixes: 2585014188d5 ("rcu-tasks: Be more patient for RCU Tasks boot-time testing") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
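A sketch of the delayed_work retry pattern, with the predicate and deadline names assumed:

    /* Illustrative sketch: re-check once per second without blocking
     * other late_initcall() work. */
    static void rcu_tasks_verify_work_fn(struct work_struct *work);
    static DECLARE_DELAYED_WORK(rcu_tasks_verify_work, rcu_tasks_verify_work_fn);

    static void rcu_tasks_verify_work_fn(struct work_struct *work)
    {
        if (!self_tests_passed() &&                   /* assumed predicate */
            time_before(jiffies, tasks_test_timeout)) /* assumed deadline */
            schedule_delayed_work(&rcu_tasks_verify_work, HZ);
    }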
|
#
1cf1144e |
|
07-Jun-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Be more patient for RCU Tasks boot-time testing The RCU-Tasks family of grace-period primitives can take some time to complete, and the amount of time can depend on the exact hardware and software configuration. Some configurations boot up fast enough that the RCU-Tasks verification process gets false-positive failures. This commit therefore allows up to 30 seconds for the grace periods to complete, with this value adjustable downwards using the rcupdate.rcu_task_stall_timeout kernel boot parameter. Reported-by: Matthew Wilcox <willy@infradead.org> Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Tested-by: Mark Rutland <mark.rutland@arm.com>
|
#
eea3423b |
|
06-Jun-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Update comments This commit updates comments to reflect the changes in the series of commits that eliminated the full task-list scan. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
56096ecd |
|
07-Jun-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Disable and enable CPU hotplug in same function The rcu_tasks_trace_pregp_step() function invokes cpus_read_lock() to disable CPU hotplug, and a later call to the rcu_tasks_trace_postscan() function invokes cpus_read_unlock() to re-enable it. This was absolutely necessary in the past in order to protect the intervening scan of the full tasks list, but there is no longer such a scan. This commit therefore improves readability by moving the cpus_read_unlock() call to the end of the rcu_tasks_trace_pregp_step() function. This commit is a pure code-motion commit without any (intended) change in functionality. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
e386b672 |
|
02-Jun-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Eliminate RCU Tasks Trace IPIs to online CPUs Currently, the RCU Tasks Trace grace-period kthread IPIs each online CPU using smp_call_function_single() in order to track any tasks currently in RCU Tasks Trace read-side critical sections during which the corresponding task has neither blocked nor been preempted. These IPIs are annoying and are also not strictly necessary because any task that blocks or is preempted within its current RCU Tasks Trace read-side critical section will be tracked on one of the per-CPU rcu_tasks_percpu structure's ->rtp_blkd_tasks list. So the only time that this is a problem is if one of the CPUs runs through a long-duration RCU Tasks Trace read-side critical section without a context switch. Note that the task_call_func() function cannot help here because there is no safe way to identify the target task. Of course, the task_call_func() function will be very useful later, when processing the list of tasks, but it needs to know the task. This commit therefore creates a cpu_curr_snapshot() function that returns a pointer to the task_struct structure of some task that happened to be running on the specified CPU more or less during the time that the cpu_curr_snapshot() function was executing. If there was no context switch during this time, this function will return a pointer to the task_struct structure of the task that was running throughout. If there was a context switch, then the outgoing task will be taken care of by RCU's context-switch hook, and the incoming task was either already taken care of during some previous context switch, or it is not currently within an RCU Tasks Trace read-side critical section. And in this latter case, the grace period already started, so there is no need to wait on this task. This new cpu_curr_snapshot() function is invoked on each CPU early in the RCU Tasks Trace grace-period processing, and the resulting tasks are queued for later quiescent-state inspection. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
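A sketch of the resulting pregp scan, with the per-task helper name assumed; cpu_curr_snapshot() is the function this commit introduces:

    /* Illustrative fragment: no IPIs, just snapshots of each CPU's task. */
    int cpu;
    struct task_struct *t;

    for_each_online_cpu(cpu) {
        rcu_read_lock();
        t = cpu_curr_snapshot(cpu);   /* a task recently current on cpu */
        if (rcu_tasks_trace_pertask_prep(t, true))   /* assumed helper */
            trc_add_holdout(t, hop);  /* queue for later QS inspection */
        rcu_read_unlock();
    }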
|
#
ffcc21a3 |
|
01-Jun-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Maintain a count of tasks blocking RCU Tasks Trace grace period This commit maintains a new n_trc_holdouts counter that tracks the number of tasks blocking the RCU Tasks grace period. This counter is useful for debugging, and its value has been added to a diagnostic message. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
1a4a8153 |
|
20-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Stop RCU Tasks Trace from scanning full tasks list This commit takes off the training wheels and relies only on scanning currently running tasks and tasks that have blocked or been preempted within their current RCU Tasks Trace read-side critical section. Before this commit, the time complexity of an RCU Tasks Trace grace period was O(T), where T is the number of tasks. After this commit, this time complexity is O(C+B), where C is the number of CPUs and B is the number of tasks that have blocked (or been preempted) at least once during their current RCU Tasks Trace read-side critical sections. Of course, if all tasks have blocked (or been preempted) at least once during their current RCU Tasks Trace read-side critical sections, this is still O(T), but current expectations are that RCU Tasks Trace read-side critical sections will be short and that there will normally not be large numbers of tasks blocked within such a critical section. Dave Marchevsky kindly measured the effects of this commit on the RCU Tasks Trace grace-period latency and the rcu_tasks_trace_kthread task's CPU consumption per RCU Tasks Trace grace period over the course of a fixed test: GP latency dropped from 22.3 ms (stddev > 0.1) to 17.0 ms (stddev < 0.1), and per-grace-period GP-kthread CPU consumption dropped from 2.3 ms (stddev 0.3) to 1.1 ms (stddev 0.2). This was on a system with 15,000 tasks, so it is reasonable to expect much larger savings on the systems on which this issue was first noted, given that they sport well in excess of 100,000 tasks. CPU consumption was measured using profiling techniques. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org> Tested-by: Dave Marchevsky <davemarchevsky@fb.com>
|
#
955a0192 |
|
18-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Stop RCU Tasks Trace from scanning idle tasks Now that RCU scans both running tasks and tasks that have blocked within their current RCU Tasks Trace read-side critical section, there is no need for it to scan the idle tasks. After all, the idle loop should not remain within an RCU Tasks Trace read-side critical section across exit from idle, and, from a BPF viewpoint, functions invoked from the idle loop should not sleep. So only running idle tasks can be within RCU Tasks Trace read-side critical sections. This commit therefore removes the scan of the idle tasks from the rcu_tasks_trace_postscan() function. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
dc7d54b4 |
|
18-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Pull in tasks blocked within RCU Tasks Trace readers This commit scans each CPU's ->rtp_blkd_tasks list, adding its tasks to the list of holdout tasks. This will cause the current RCU Tasks Trace grace period to wait until these tasks exit their RCU Tasks Trace read-side critical sections. This commit will enable later work omitting the scan of the full task list. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
7460ade1 |
|
18-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Scan running tasks for RCU Tasks Trace readers A running task might be within an RCU Tasks Trace read-side critical section for any length of time, but will not be placed on any of the per-CPU rcu_tasks_percpu structure's ->rtp_blkd_tasks lists. Therefore any RCU Tasks Trace grace-period processing that does not scan the full task list must interact with the running tasks. This commit therefore causes the rcu_tasks_trace_pregp_step() function to IPI each CPU in order to place the corresponding task on the holdouts list and to record whether or not it was in an RCU Tasks Trace read-side critical section. Yes, it is possible to avoid adding it to that list if it is not a reader, but that would prevent the system from remembering that this task was in a quiescent state. Which is why the running tasks are unconditionally added to the holdout list. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
19415004 |
|
17-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Avoid rcu_tasks_trace_pertask() duplicate list additions This commit adds checks within rcu_tasks_trace_pertask() to avoid duplicate (and destructive) additions to the holdouts list. These checks will be required later due to the possibility of a given task having blocked while in an RCU Tasks Trace read-side critical section, but now running on a CPU. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
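The check is essentially a list_empty() guard ahead of the list addition, along these lines (a near-shape sketch; the holdout counter bookkeeping is omitted):

    static void trc_add_holdout(struct task_struct *t, struct list_head *bhp)
    {
            if (list_empty(&t->trc_holdout_list)) {  /* Not already queued. */
                    get_task_struct(t);              /* Hold a reference. */
                    list_add(&t->trc_holdout_list, bhp);
            }
    }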
|
#
1fa98e2e |
|
17-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Move rcu_tasks_trace_pertask() before rcu_tasks_trace_pregp_step() This is a code-motion-only commit that moves rcu_tasks_trace_pertask() to precede rcu_tasks_trace_pregp_step(), so that the latter will be able to invoke the other without forward references. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
387c0ad7 |
|
17-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add blocked-task indicator to RCU Tasks Trace stall warnings This commit adds a "B" indicator to the RCU Tasks Trace CPU stall warning when the task has blocked within its current read-side critical section. This serves as a debugging aid. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
0bcb3868 |
|
17-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Untrack blocked RCU Tasks Trace at reader end This commit causes rcu_read_unlock_trace() to check whether the current task is on a per-CPU list within the rcu_tasks_percpu structure, and to remove it from that list if so. This has the effect of curtailing tracking of a task that blocked within an RCU Tasks Trace read-side critical section once it exits that critical section. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
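Sketched out (illustrative only; the kernel's unlock path handles more cases), the untracking amounts to:

    /* On outermost rcu_read_unlock_trace(), stop tracking a blocked task. */
    if (!list_empty(&t->trc_blkd_node)) {
            struct rcu_tasks_percpu *rtpcp =
                    per_cpu_ptr(rcu_tasks_trace.rtpcpu, t->trc_blkd_cpu);
            unsigned long flags;

            raw_spin_lock_irqsave(&rtpcp->lock, flags);
            list_del_init(&t->trc_blkd_node);  /* No longer tracked. */
            raw_spin_unlock_irqrestore(&rtpcp->lock, flags);
    }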
|
#
0356d4e6 |
|
17-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Track blocked RCU Tasks Trace readers This commit places any task that has ever blocked within its current RCU Tasks Trace read-side critical section on a per-CPU list within the rcu_tasks_percpu structure. Tasks are removed from this list when they exit by the exit_tasks_rcu_finish_trace() function. The purpose of this commit is to provide the information needed to eliminate the current scan of the full task list. This commit offsets the INT_MIN value for ->trc_reader_nesting with the new nesting level in order to avoid queueing tasks that are exiting their read-side critical sections. [ paulmck: Apply kernel test robot feedback. ] [ paulmck: Apply feedback from syzbot+9bb26e7c5e8e4fa7e641@syzkaller.appspotmail.com ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: syzbot <syzbot+9bb26e7c5e8e4fa7e641@syzkaller.appspotmail.com> Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
434c9eef |
|
16-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add data structures for lightweight grace periods This commit adds fields to task_struct and to rcu_tasks_percpu that will be used to avoid the task-list scan for RCU Tasks Trace grace periods, and also initializes these fields. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
f90f19da |
|
25-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Make RCU Tasks Trace stall warning handle idle offline tasks When a CPU is offline, its idle task can appear to be running, but it cannot be doing anything while CPU-hotplug operations are excluded. This commit takes advantage of that fact by making trc_check_slow_task() check for task_curr(t) && cpu_online(task_cpu(t)) and record full information in that case. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
be15a164 |
|
25-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Make RCU Tasks Trace stall warnings print full .b.need_qs field Currently, the RCU Tasks Trace CPU stall warning simply indicates whether or not the .b.need_qs field is zero. This commit shows the three permitted values and flags other values with either "!" or "?". This is a debugging aid. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
c8c03ad9 |
|
24-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Flag offline CPUs in RCU Tasks Trace stall warnings This commit tags offline CPUs with "(offline)" in RCU Tasks Trace CPU stall warnings. This is a debugging aid. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
9f3eb5fb |
|
24-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add slow-IPI indicator to RCU Tasks Trace stall warnings This commit adds an "I" indicator to the RCU Tasks Trace CPU stall warning when an IPI directed to a task has thus far failed to arrive. This serves as a debugging aid. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
0968e892 |
|
31-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Simplify trc_inspect_reader() QS logic Currently, trc_inspect_reader() does one check for nesting less than or equal to zero, then sorts out the distinctions within this single "if" statement. This commit simplifies the logic by providing one "if" statement for quiescent states (nesting of zero) and another "if" statement for transitioning from one nesting level to another or the outermost rcu_read_unlock_trace() (negative nesting). Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
5d4c90d7 |
|
24-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: RCU Tasks Trace grace-period kthread has implicit QS Because the task driving the grace period (normally the grace-period kthread) is in a quiescent state throughout, this commit excludes it from the list of tasks from which a quiescent state is needed. This does mean that attaching a sleepable BPF program to a function in kernel/rcu/tasks.h is a bad idea, by the way. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
897ba84d |
|
24-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Handle idle tasks for recently offlined CPUs This commit identifies idle tasks for recently offlined CPUs as residing in a quiescent state. This is safe only because CPU-hotplug operations are excluded during these checks. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
5c9a9ca4 |
|
24-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Idle tasks on offline CPUs are in quiescent states Any idle task corresponding to an offline CPU is in an RCU Tasks Trace quiescent state. This commit causes rcu_tasks_trace_postscan() to ignore idle tasks for offline CPUs, which it can do safely due to CPU-hotplug operations being disabled. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
9ff86b4c |
|
26-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Make trc_read_check_handler() fetch ->trc_reader_nesting only once This commit replaces the pair of READ_ONCE(t->trc_reader_nesting) calls with a single such call and a local variable. This makes the code's intent more clear. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
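The resulting shape is roughly the following (a sketch, not the exact diff):

    /* One load into a local variable, then consistent tests against it. */
    int nesting = READ_ONCE(t->trc_reader_nesting);

    if (likely(!nesting)) {
            /* Quiescent: the task is not currently a reader. */
    } else if (nesting > 0) {
            /* The task is within a read-side critical section. */
    } else {
            /* Negative: the task is exiting its outermost reader. */
    }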
|
#
55061126 |
|
25-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Remove rcu_tasks_trace_postgp() wait for counter Now that tasks are not removed from the list until they have responded to any needed request for a quiescent state, it is no longer necessary to wait for the trc_n_readers_need_end counter to go to zero. This commit therefore removes that waiting code. It is therefore also no longer necessary for rcu_tasks_trace_postgp() to do the final decrement of this counter, so that code is also removed. This in turn means that the trc_n_readers_need_end counter itself can be removed, as can the rcu_tasks_trace_iw irq_work structure and the rcu_read_unlock_iw() function. [ paulmck: Apply feedback from Zqiang. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
3847b645 |
|
23-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Merge state into .b.need_qs and atomically update This commit gets rid of the task_struct structure's ->trc_reader_checked field, making it instead be a bit within the task_struct structure's existing ->trc_reader_special.b.need_qs field. This commit also atomically loads, stores, and checks the resulting combination of the reader-checked and need-quiescent state flags. This will in turn allow significant simplification of the rcu_tasks_trace_postgp() function as well as elimination of the trc_n_readers_need_end counter in later commits. These changes will in turn simplify later elimination of the RCU Tasks Trace scan of the task list, which will make RCU Tasks Trace grace periods less CPU-intensive. [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
4a8cc433 |
|
19-Apr-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Drive synchronous grace periods from calling task This commit causes synchronous grace periods to be driven from the task invoking synchronize_rcu_*(), allowing these functions to be invoked from the mid-boot dead zone extending from when the scheduler is initialized to the point that the various RCU Tasks grace-period kthreads are spawned. This change will allow the self-tests to run in a consistent manner. Reported-by: Matthew Wilcox <willy@infradead.org> Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
68cb4720 |
|
19-Apr-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Move synchronize_rcu_tasks_generic() down This is strictly a code-motion commit that moves synchronize_rcu_tasks_generic() down to where it can invoke rcu_tasks_one_gp() without the need for a forward declaration. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
d96225fd |
|
19-Apr-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Split rcu_tasks_one_gp() from rcu_tasks_kthread() This commit abstracts most of the rcu_tasks_kthread() function's loop body into a new rcu_tasks_one_gp() function. It also introduces a new ->tasks_gp_mutex to synchronize concurrent calls to this new rcu_tasks_one_gp() function. This commit is preparation for allowing RCU tasks grace periods to be driven by the calling task during the mid-boot dead zone. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
4cf0585c |
|
06-Dec-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Check for abandoned callbacks This commit adds a debugging scan for callbacks that got lost during a callback-queueing transition. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
ab2756ea |
|
08-Apr-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Handle sparse cpu_possible_mask in rcu_tasks_invoke_cbs() If the cpu_possible_mask is sparse (for example, if bits are set only for CPUs 0, 4, 8, ...), then rcu_tasks_invoke_cbs() will access per-CPU data for a CPU not in cpu_possible_mask. It makes these accesses while doing a workqueue-based binary search for non-empty callback lists. Although this search must pass through CPUs not represented in cpu_possible_mask, it has no need to check the callback list for such CPUs. This commit therefore changes the rcu_tasks_invoke_cbs() function's binary search so as to only check callback lists for CPUs present in cpu_possible_mask. Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
07d95c34 |
|
04-Apr-2022 |
Eric Dumazet <edumazet@google.com> |
rcu-tasks: Handle sparse cpu_possible_mask If the rcupdate.rcu_task_enqueue_lim kernel boot parameter is set to something greater than 1 and less than nr_cpu_ids, the code attempts to use a subset of the CPU's RCU Tasks callback lists. This works, but only if the cpu_possible_mask is contiguous. If there are "holes" in this mask, the callback-enqueue code might attempt to access a non-existent per-CPU ->rtcpu variable for a non-existent CPU. For example, if only CPUs 0, 4, 8, 12, 16 and so on are in cpu_possible_mask, specifying rcupdate.rcu_task_enqueue_lim=4 would cause the code to attempt to use callback queues for non-existent CPUs 1, 2, and 3. Because such systems have existed in the past and might still exist, the code needs to gracefully handle this situation. This commit therefore checks to see whether the desired CPU is present in cpu_possible_mask, and, if not, searches for the next CPU. This means that the systems administrator of a system with a sparse cpu_possible_mask will need to account for this sparsity when specifying the value of the rcupdate.rcu_task_enqueue_lim kernel boot parameter. For example, setting this parameter to the value 4 will use only CPUs 0 and 4, with CPU 4 getting three times the callback load of CPU 0. This commit assumes that bit (nr_cpu_ids - 1) is always set in cpu_possible_mask. Link: https://lore.kernel.org/lkml/CANn89iKaNEwyNZ=L_PQnkH0LP_XjLYrr_dpyRKNNoDJaWKdrmg@mail.gmail.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
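The fix reduces to choosing the next CPU actually present in cpu_possible_mask, roughly as follows (a sketch; variable names follow the surrounding description, and cpumask_next(n, mask) returns the first set bit strictly greater than n):

    /* Map the shift-derived "ideal" CPU to one that really exists. */
    ideal_cpu = smp_processor_id() >> READ_ONCE(rtp->percpu_enqueue_shift);
    chosen_cpu = cpumask_next(ideal_cpu - 1, cpu_possible_mask);
    rtpcp = per_cpu_ptr(rtp->rtpcpu, chosen_cpu);

If ideal_cpu is itself possible, cpumask_next() returns it unchanged; otherwise it returns the next possible CPU, as in the CPUs 0 and 4 example above.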
|
#
10b3742f |
|
28-Mar-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Make show_rcu_tasks_generic_gp_kthread() check all CPUs Currently, the show_rcu_tasks_generic_gp_kthread() function only looks at CPU 0's callback lists. Although this is not fatal, it can confuse debugging efforts in cases where any of the Tasks RCU flavors are in per-CPU queueing mode. This commit therefore causes this function to scan all CPUs' callback queues. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
bddf7122 |
|
18-Mar-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Restore use of timers for non-RT kernels The use of hrtimers for RCU-tasks grace-period delays works well in general, but can result in excessive grace-period delays for some corner-case workloads. This commit therefore reverts to the use of timers for non-RT kernels to mitigate those grace-period delays. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
777570d9 |
|
08-Mar-2022 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
rcu-tasks: Use schedule_hrtimeout_range() to wait for grace periods The synchronous RCU-tasks grace-period-wait primitives invoke schedule_timeout_idle() to give readers a chance to exit their read-side critical sections. Unfortunately, this fails during early boot on PREEMPT_RT because PREEMPT_RT relies solely on ksoftirqd to run timer handlers. Because ksoftirqd cannot operate until its kthreads are spawned, there is a brief period of time following scheduler initialization where PREEMPT_RT cannot run the timer handlers that schedule_timeout_idle() relies on, resulting in a hang. To avoid this boot-time hang, this commit replaces schedule_timeout_idle() with schedule_hrtimeout(), so that the timer expires in hardirq context. This ensures that the timer fires even on PREEMPT_RT throughout the irqs-enabled portions of boot as well as during runtime. The timer is set to expire between fract and fract + HZ / 2 jiffies in order to align with any other timers that might expire during that time, thus reducing the number of wakeups. Note that RCU-tasks grace periods are infrequent, so the use of hrtimers should be fine here. In contrast, in common-case code, use of hrtimers could result in performance issues. Cc: Martin KaFai Lau <kafai@fb.com> Cc: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
88db792b |
|
03-Mar-2022 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
rcu-tasks: Use rcuwait for the rcu_tasks_kthread() The waitqueue used by rcu_tasks_kthread() only ever has a single waiter. With a guaranteed single waiter, it can be replaced with rcuwait, which is smaller and simpler. With the rcuwait-based wake counterpart, the irqwork function (call_rcu_tasks_iw_wakeup()) can be invoked in hardirq context because it is only a wakeup and no sleeping locks are involved (unlike the wait_queue_head). As a side effect, this is also one piece of the puzzle needed to pass the RCU selftest at early boot on PREEMPT_RT. Replace the wait_queue_head with rcuwait and let the irqwork run in hardirq context on PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
f2539003 |
|
25-Feb-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Print pre-stall-warning informational messages RCU-tasks stall-warning messages are printed after the grace period is ten minutes old. Unfortunately, most of us will have rebooted the system in response to an apparently-hung command long before the ten minutes is up, and will thus see what looks to be a silent hang. This commit therefore adds pr_info() messages that are printed earlier. These should avoid being classified as errors, but should give impatient users a hint. These are controlled by new rcupdate.rcu_task_stall_info and rcupdate.rcu_task_stall_info_mult kernel-boot parameters. The former defines the initial delay in jiffies (defaulting to 10 seconds) and the latter defines the multiplier (defaulting to 3). Thus, by default, the first message will appear 10 seconds into the RCU-tasks grace period, the second 40 seconds in, and the third 160 seconds in. There would be a fourth at 640 seconds in, but the stall warning message appears 600 seconds in, and once a stall warning is printed for a given grace period, no further informational messages are printed. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
f75fd4b9 |
|
17-Feb-2022 |
Padmanabha Srinivasaiah <treasure4paddy@gmail.com> |
rcu-tasks: Fix race in schedule and flush work While booting secondary CPUs, cpus_read_[lock/unlock] does not keep the online cpumask stable. The transient online mask results in the calltrace below.

[ 0.324121] CPU1: Booted secondary processor 0x0000000001 [0x410fd083]
[ 0.346652] Detected PIPT I-cache on CPU2
[ 0.347212] CPU2: Booted secondary processor 0x0000000002 [0x410fd083]
[ 0.377255] Detected PIPT I-cache on CPU3
[ 0.377823] CPU3: Booted secondary processor 0x0000000003 [0x410fd083]
[ 0.379040] ------------[ cut here ]------------
[ 0.383662] WARNING: CPU: 0 PID: 10 at kernel/workqueue.c:3084 __flush_work+0x12c/0x138
[ 0.384850] Modules linked in:
[ 0.385403] CPU: 0 PID: 10 Comm: rcu_tasks_rude_ Not tainted 5.17.0-rc3-v8+ #13
[ 0.386473] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 0.387289] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.388308] pc : __flush_work+0x12c/0x138
[ 0.388970] lr : __flush_work+0x80/0x138
[ 0.389620] sp : ffffffc00aaf3c60
[ 0.390139] x29: ffffffc00aaf3d20 x28: ffffffc009c16af0 x27: ffffff80f761df48
[ 0.391316] x26: 0000000000000004 x25: 0000000000000003 x24: 0000000000000100
[ 0.392493] x23: ffffffffffffffff x22: ffffffc009c16b10 x21: ffffffc009c16b28
[ 0.393668] x20: ffffffc009e53861 x19: ffffff80f77fbf40 x18: 00000000d744fcc9
[ 0.394842] x17: 000000000000000b x16: 00000000000001c2 x15: ffffffc009e57550
[ 0.396016] x14: 0000000000000000 x13: ffffffffffffffff x12: 0000000100000000
[ 0.397190] x11: 0000000000000462 x10: ffffff8040258008 x9 : 0000000100000000
[ 0.398364] x8 : 0000000000000000 x7 : ffffffc0093c8bf4 x6 : 0000000000000000
[ 0.399538] x5 : 0000000000000000 x4 : ffffffc00a976e40 x3 : ffffffc00810444c
[ 0.400711] x2 : 0000000000000004 x1 : 0000000000000000 x0 : 0000000000000000
[ 0.401886] Call trace:
[ 0.402309] __flush_work+0x12c/0x138
[ 0.402941] schedule_on_each_cpu+0x228/0x278
[ 0.403693] rcu_tasks_rude_wait_gp+0x130/0x144
[ 0.404502] rcu_tasks_kthread+0x220/0x254
[ 0.405264] kthread+0x174/0x1ac
[ 0.405837] ret_from_fork+0x10/0x20
[ 0.406456] irq event stamp: 102
[ 0.406966] hardirqs last enabled at (101): [<ffffffc0093c8468>] _raw_spin_unlock_irq+0x78/0xb4
[ 0.408304] hardirqs last disabled at (102): [<ffffffc0093b8270>] el1_dbg+0x24/0x5c
[ 0.409410] softirqs last enabled at (54): [<ffffffc0081b80c8>] local_bh_enable+0xc/0x2c
[ 0.410645] softirqs last disabled at (50): [<ffffffc0081b809c>] local_bh_disable+0xc/0x2c
[ 0.411890] ---[ end trace 0000000000000000 ]---
[ 0.413000] smp: Brought up 1 node, 4 CPUs
[ 0.413762] SMP: Total of 4 processors activated.
[ 0.414566] CPU features: detected: 32-bit EL0 Support
[ 0.415414] CPU features: detected: 32-bit EL1 Support
[ 0.416278] CPU features: detected: CRC32 instructions
[ 0.447021] Callback from call_rcu_tasks_rude() invoked.
[ 0.506693] Callback from call_rcu_tasks() invoked.

This commit therefore fixes this issue by applying a single-CPU optimization to the RCU Tasks Rude grace-period process. The key point here is that the purpose of this RCU flavor is to force a schedule on each online CPU since some past event. But the rcu_tasks_rude_wait_gp() function runs in the context of the RCU Tasks Rude's grace-period kthread, so there must already have been a context switch on the current CPU since the call to either synchronize_rcu_tasks_rude() or call_rcu_tasks_rude(). So if there is only a single CPU online, RCU Tasks Rude's grace-period kthread does not need to do anything at all.
It turns out that the rcu_tasks_rude_wait_gp() function's call to schedule_on_each_cpu() causes problems during early boot. During that time, there is only one online CPU, namely the boot CPU. Therefore, applying this single-CPU optimization fixes early-boot instances of this problem. Link: https://lore.kernel.org/lkml/20220210184319.25009-1-treasure4paddy@gmail.com/T/ Suggested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Padmanabha Srinivasaiah <treasure4paddy@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
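The resulting fast path is close to the following sketch, assuming the existing rcu_tasks_be_rude() work handler:

    static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
    {
            /* With one online CPU, the GP kthread's own context
             * switch since the request is itself the needed schedule. */
            if (num_online_cpus() <= 1)
                    return;
            schedule_on_each_cpu(rcu_tasks_be_rude);  /* Force a schedule on each CPU. */
    }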
|
#
04d4e665 |
|
07-Feb-2022 |
Frederic Weisbecker <frederic@kernel.org> |
sched/isolation: Use single feature type while referring to housekeeping cpumask Refer to housekeeping APIs using single feature types instead of flags. This prevents passing multiple isolation features at once to housekeeping interfaces, which soon will no longer be possible as each isolation feature gets its own cpumask. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Reviewed-by: Phil Auld <pauld@redhat.com> Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org
|
#
00a8b4b5 |
|
02-Feb-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Set ->percpu_enqueue_shift to zero upon contention Currently, call_rcu_tasks_generic() sets ->percpu_enqueue_shift to order_base_2(nr_cpu_ids) upon encountering sufficient contention. This does not shift to use of non-CPU-0 callback queues as intended, but rather continues using only CPU 0's queue. Although this does provide some decrease in contention due to spreading work over multiple locks, it is not the dramatic decrease that was intended. This commit therefore makes call_rcu_tasks_generic() set ->percpu_enqueue_shift to 0. Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
2bcd18e0 |
|
02-Feb-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Use order_base_2() instead of ilog2() The ilog2() function can be used to generate a shift count, but it will generate the same count for a power of two as for one greater than a power of two. This results in shift counts that are larger than necessary for systems with a power-of-two number of CPUs because the CPUs are numbered from zero, so that the maximum CPU number is one less than that power of two. This commit therefore substitutes order_base_2(), which appears to have been designed for exactly this use case. Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
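The distinction matters only away from exact powers of two; a quick illustration per the definitions in <linux/log2.h>:

    #include <linux/log2.h>

    /*
     * ilog2() rounds down and order_base_2() rounds up, so the two agree
     * at exact powers of two but differ everywhere in between:
     *
     *   ilog2(8) == 3    order_base_2(8) == 3
     *   ilog2(9) == 3    order_base_2(9) == 4
     *
     * CPUs are numbered 0 .. nr_cpu_ids - 1, so a shift count of
     * order_base_2(nr_cpu_ids) is always sufficient and never excessive.
     */
    shift = order_base_2(nr_cpu_ids);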
|
#
da123016 |
|
26-Jan-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Fix computation of CPU-to-list shift counts The ->percpu_enqueue_shift field is used to map from the running CPU number to the index of the corresponding callback list. This mapping can change at runtime in response to varying callback load, resulting in varying levels of contention on the callback-list locks. Unfortunately, the initial value of this field is correct only if the system happens to have a power-of-two number of CPUs, otherwise the callbacks from the high-numbered CPUs can be placed into the callback list indexed by 1 (rather than 0), and those index-1 callbacks will be ignored. This can result in soft lockups and hangs. This commit therefore corrects this mapping, adding one to this shift count as needed for systems having non-power-of-two numbers of CPUs. Fixes: 7a30871b6a27 ("rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue selection") Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
fd796e41 |
|
29-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Use fewer callbacks queues if callback flood ends By default, when lock contention is encountered, the RCU Tasks flavors of RCU switch to using per-CPU queueing. However, if the callback flood ends, per-CPU queueing continues to be used, which introduces significant additional overhead, especially for callback invocation, which fans out a series of workqueue handlers. This commit therefore switches back to single-queue operation if at the beginning of a grace period there are very few callbacks. The definition of "very few" is set by the rcupdate.rcu_task_collapse_lim module parameter, which defaults to 10. This switch happens in two phases, with the first phase causing future callbacks to be enqueued on CPU 0's queue, but with all queues continuing to be checked for grace periods and callback invocation. The second phase checks to see if an RCU grace period has elapsed and if all remaining RCU-Tasks callbacks are queued on CPU 0. If so, only CPU 0 is checked for future grace periods and callback operation. Of course, the return of contention anywhere during this process will result in returning to per-CPU callback queueing. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
2cee0789 |
|
29-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Use separate ->percpu_dequeue_lim for callback dequeueing Decreasing the number of callback queues is a bit tricky because it is necessary to handle callbacks that were queued before the number of queues decreased, but which were not ready to invoke until afterwards. This commit takes a first step in this direction by maintaining a separate ->percpu_dequeue_lim to control callback dequeueing, in addition to the existing ->percpu_enqueue_lim which now controls only enqueueing. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
ab97152f |
|
24-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Use more callback queues if contention encountered The rcupdate.rcu_task_enqueue_lim module parameter allows system administrators to tune the number of callback queues used by the RCU Tasks flavors. However if callback storms are infrequent, it would be better to operate with a single queue on a given system unless and until that system actually needed more queues. Systems not needing more queues can then avoid the overhead of checking the extra queues and especially avoid the overhead of fanning workqueue handlers out to all CPUs to invoke callbacks. This commit therefore switches to using all the CPUs' callback queues if call_rcu_tasks_generic() encounters too much lock contention. The amount of lock contention to tolerate defaults to 100 contended lock acquisitions per jiffy, and can be adjusted using the new rcupdate.rcu_task_contend_lim module parameter. Such switching is undertaken only if the rcupdate.rcu_task_enqueue_lim module parameter is negative, which is its default value (-1). This allows savvy systems administrators to set the number of queues to some known good value and to not have to worry about the kernel doing any second guessing. [ paulmck: Apply feedback from Guillaume Tucker and kernelci. ] Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
3063b33a |
|
23-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Avoid raw-spinlocked wakeups from call_rcu_tasks_generic() If the caller of call_rcu_tasks(), call_rcu_tasks_rude(), or call_rcu_tasks_trace() holds a raw spinlock, and then if call_rcu_tasks_generic() determines that the grace-period kthread must be awakened, then the wakeup might acquire a normal spinlock while a raw spinlock is held. This results in lockdep splats when the kernel is built with CONFIG_PROVE_RAW_LOCK_NESTING=y. This commit therefore defers the wakeup using irq_work_queue(). It would be nice to directly invoke wakeup when a raw spinlock is not held, but there is currently no way to check for this in all kernels. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
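The deferral pattern, sketched with illustrative field names (cbs_iw and cbs_wq here stand in for the actual structure layout, which differs):

    /* irq_work handler: runs where taking the wait-queue lock is safe. */
    static void call_rcu_tasks_iw_wakeup(struct irq_work *iwp)
    {
            struct rcu_tasks *rtp = container_of(iwp, struct rcu_tasks, cbs_iw);

            wake_up(&rtp->cbs_wq);  /* Acquires a normal spinlock: OK here. */
    }

    /* In call_rcu_tasks_generic(), with a raw spinlock possibly held: */
    if (needwake)
            irq_work_queue(&rtp->cbs_iw);  /* Defer the actual wakeup. */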
|
#
7d13d30b |
|
22-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Count trylocks to estimate call_rcu_tasks() contention This commit converts the unconditional raw_spin_lock_rcu_node() lock acquisition in call_rcu_tasks_generic() to a trylock followed by an unconditional acquisition if the trylock fails. If the trylock fails, the failure is counted, but the count is reset to zero on each new jiffy. This statistic will be used to determine when to move from a single callback queue to per-CPU callback queues. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
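A sketch of the counting follows; the per-CPU field names (->rtp_jiffies, ->rtp_n_lock_retries) match this description, but the kernel's code differs in detail:

    if (!raw_spin_trylock_irqsave(&rtpcp->lock, flags)) {
            /* Contended: fall back to the unconditional acquisition. */
            raw_spin_lock_irqsave(&rtpcp->lock, flags);
            if (rtpcp->rtp_jiffies != jiffies) {
                    rtpcp->rtp_jiffies = jiffies;   /* New jiffy, so... */
                    rtpcp->rtp_n_lock_retries = 0;  /* ...reset the count. */
            }
            rtpcp->rtp_n_lock_retries++;  /* One more contended acquisition. */
    }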
|
#
8610b656 |
|
12-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add rcupdate.rcu_task_enqueue_lim to set initial queueing This commit adds a rcupdate.rcu_task_enqueue_lim module parameter that sets the initial number of callback queues to use for the RCU Tasks family of RCU implementations. This parameter allows testing of various fanout values. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
ce9b1c66 |
|
11-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Make rcu_barrier_tasks*() handle multiple callback queues Currently, rcu_barrier_tasks(), rcu_barrier_tasks_rude(), and rcu_barrier_tasks_trace() simply invoke the corresponding synchronize_rcu_tasks*() function. This works because there is only one callback queue. However, there will soon be multiple callback queues. This commit therefore scans the queues currently in use, entraining a callback on each non-empty queue. Sequence numbers and reference counts are used to synchronize this process in a manner similar to the approach taken by rcu_barrier(). Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
d363f833 |
|
10-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs() invocations If there is a flood of callbacks, it is necessary to put multiple CPUs to work invoking those callbacks. This commit therefore uses a workqueue-flooding approach to parallelize RCU Tasks callback execution. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
57881863 |
|
09-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Abstract invocations of callbacks This commit adds a rcu_tasks_invoke_cbs() function that invokes all ready callbacks on all of the per-CPU lists that are currently in use. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
4d1114c0 |
|
09-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Abstract checking of callback lists This commit adds a rcu_tasks_need_gpcb() function that returns an indication of whether another grace period is required, and if no grace period is required, whether there are callbacks that need to be invoked. The function scans all per-CPU lists currently in use. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
8dd593fd |
|
09-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add a ->percpu_enqueue_lim to the rcu_tasks structure This commit adds a ->percpu_enqueue_lim field to the rcu_tasks structure. This field contains two to the power of the ->percpu_enqueue_shift field, easing construction of iterators over the per-CPU queues that might contain RCU Tasks callbacks. Such iterators will be introduced in later commits. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
65b629e7 |
|
09-Nov-2021 |
Neeraj Upadhyay <quic_neeraju@quicinc.com> |
rcu-tasks: Inspect stalled task's trc state in locked state On an RCU Tasks Trace stall, inspect the RCU-Tasks-Trace-specific state of the stalled task while it is locked down, using try_invoke_on_locked_down_task(), in order to get reliable trc state for a non-running stalled task. This was tested using the following command:

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --configs TRACE01 \
--bootargs "rcutorture.torture_type=tasks-tracing rcutorture.stall_cpu=10 \
rcutorture.stall_cpu_block=1 rcupdate.rcu_task_stall_timeout=100" --trust-make

As expected, this produced the following console output for running and sleeping tasks.

[ 21.520291] INFO: rcu_tasks_trace detected stalls on tasks:
[ 21.521292] P85: ... nesting: 1N cpu: 2
[ 21.521966] task:rcu_torture_sta state:D stack:15080 pid: 85 ppid: 2 flags:0x00004000
[ 21.523384] Call Trace:
[ 21.523808] __schedule+0x273/0x6e0
[ 21.524428] schedule+0x35/0xa0
[ 21.524971] schedule_timeout+0x1ed/0x270
[ 21.525690] ? del_timer_sync+0x30/0x30
[ 21.526371] ? rcu_torture_writer+0x720/0x720
[ 21.527106] rcu_torture_stall+0x24a/0x270
[ 21.527816] kthread+0x115/0x140
[ 21.528401] ? set_kthread_struct+0x40/0x40
[ 21.529136] ret_from_fork+0x22/0x30
[ 21.529766] 1 holdouts
[ 21.632300] INFO: rcu_tasks_trace detected stalls on tasks:
[ 21.632345] rcu_torture_stall end.
[ 21.633293] P85: .
[ 21.633294] task:rcu_torture_sta state:R running task stack:15080 pid: 85 ppid: 2 flags:0x00004000
[ 21.633299] Call Trace:
[ 21.633301] ? vprintk_emit+0xab/0x180
[ 21.633306] ? vprintk_emit+0x11a/0x180
[ 21.633308] ? _printk+0x4d/0x69
[ 21.633311] ? __default_send_IPI_shortcut+0x1f/0x40

[ paulmck: Update to new v5.16 task_call_func() name. ] Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
381a4f3b |
|
08-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Use spin_lock_rcu_node() and friends This commit renames the rcu_tasks_percpu structure's ->cbs_pcpu_lock to ->lock and then uses spin_lock_rcu_node() and friends to acquire and release this lock, preparing for upcoming commits that will spread the grace-period process across multiple CPUs and kthreads. [ paulmck: Apply feedback from kernel test robot. ] Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
9b073de1 |
|
08-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu_tasks: Convert bespoke callback list to rcu_segcblist structure This commit moves from a bespoke head and tail pointer in the rcu_tasks_percpu structure to an rcu_segcblist structure, thus allowing associating the grace-period sequence number with groups of callbacks. This in turn will allow callbacks to be invoked independently on different CPUs. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
b14fb4fb |
|
08-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Convert grace-period counter to grace-period sequence number This commit moves the rcu_tasks structure's ->n_gps grace-period-counter field to a ->task_gp_seq grace-period sequence number in order to enable use of the rcu_segcblist structure for the callback lists. This in turn permits CPUs to lag behind the RCU Tasks grace-period sequence number without suffering long-term slowdowns in callback invocation. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
7a30871b |
|
08-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue selection This commit introduces a ->percpu_enqueue_shift field to the rcu_tasks structure, and uses it to shift down the CPU number in order to select a rcu_tasks_percpu structure. This field is currently set to a sufficiently large shift count to always select the CPU-0 instance of the rcu_tasks_percpu structure, and later commits will adjust this. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
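The selection amounts to a shift of the CPU number, roughly as in this sketch (the real code runs with interrupts disabled around the enqueue):

    /* Shift down the CPU number to pick a callback queue. */
    int index = raw_smp_processor_id() >> READ_ONCE(rtp->percpu_enqueue_shift);
    struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, index);

    /* With a sufficiently large shift, index is always 0: CPU 0's queue. */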
|
#
cafafd67 |
|
05-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Create per-CPU callback lists Currently, RCU Tasks Trace (as well as the other two flavors of RCU Tasks) use a single global callback list. This works well and is simple, but expected changes in workload will cause this list to become a bottleneck. This commit therefore creates per-CPU callback lists for the various flavors of RCU Tasks, but continues queueing on a single list, namely that of CPU 0. Later commits will dynamically vary the number of lists in use to accommodate dynamic changes in workload. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Tested-by: kernel test robot <beibei.si@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
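In outline, the new per-CPU structure looks something like the following sketch; the fields are illustrative of the description above rather than the exact kernel layout:

    /* Per-CPU portion of an RCU Tasks flavor's callback handling. */
    struct rcu_tasks_percpu {
            struct rcu_head *cbs_head;     /* Head of this CPU's callback list. */
            struct rcu_head **cbs_tail;    /* Tail pointer for enqueueing. */
            raw_spinlock_t cbs_pcpu_lock;  /* Serializes list operations. */
    };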
|
#
f5dbc594 |
|
18-Sep-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Don't remove tasks with pending IPIs from holdout list Currently, the check_all_holdout_tasks_trace() function removes all tasks marked with ->trc_reader_checked from the holdout list, including those with IPIs pending. This means that the IPI handler might arrive at a task that has already been removed from the list, which is at best an accident waiting to happen. This commit therefore avoids removing tasks with IPIs pending from the holdout list. This in turn means that the "if" condition in the for_each_online_cpu() loop in rcu_tasks_trace_postgp() should always evaluate to false, so a WARN_ON_ONCE() is added to check that. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
9b3c4ab3 |
|
21-Sep-2021 |
Peter Zijlstra <peterz@infradead.org> |
sched,rcu: Rework try_invoke_on_locked_down_task() Give try_invoke_on_locked_down_task() a saner name and have it return an int so that the caller might distinguish between different reasons of failure. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Tested-by: Vasily Gorbik <gor@linux.ibm.com> # on s390 Link: https://lkml.kernel.org/r/20210929152428.649944917@infradead.org
|
#
8af9e2c7 |
|
15-Sep-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Update comments to cond_resched_tasks_rcu_qs() The cond_resched_rcu_qs() function no longer exists, despite being mentioned several times in kernel/rcu/tasks.h. This commit therefore updates it to the current cond_resched_tasks_rcu_qs(). Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
46aa886c |
|
27-Aug-2021 |
Neeraj Upadhyay <neeraju@codeaurora.org> |
rcu-tasks: Fix IPI failure handling in trc_wait_for_one_reader The trc_wait_for_one_reader() function is called at multiple stages of the trace rcu-tasks GP function, rcu_tasks_wait_gp():

- First, it is called as part of the per-task function rcu_tasks_trace_pertask(), for all non-idle tasks. As part of per-task processing, this function adds the task to the holdout list and, if the task is currently running on a CPU, sends an IPI to that CPU. The IPI handler takes action depending on whether the task is in a trace rcu-tasks read-side critical section or not:

  - a. If the task is in a trace rcu-tasks read-side critical section (t->trc_reader_nesting != 0), the IPI handler sets the task's ->trc_reader_special.b.need_qs, so that this task notifies exit from its outermost read-side critical section (by decrementing trc_n_readers_need_end) to the GP handling function. trc_wait_for_one_reader() also increments trc_n_readers_need_end, so that the trace rcu-tasks GP handler function waits for this task's read-side exit notification. The IPI handler also sets t->trc_reader_checked to true, so no further IPIs are sent for this task for this trace rcu-tasks grace period and this task can be removed from the holdout list.

  - b. If the task is in the process of exiting its trace rcu-tasks read-side critical section (t->trc_reader_nesting < 0), defer this task's processing to future calls to trc_wait_for_one_reader().

  - c. If the task is not in an rcu-tasks read-side critical section (t->trc_reader_nesting == 0), ->trc_reader_checked is set for this task, so that this task is removed from the holdout list.

- Second, trc_wait_for_one_reader() is called as part of the post scan, in the function rcu_tasks_trace_postscan(), for all idle tasks.

- Third, in the function check_all_holdout_tasks_trace(), this function is called for each task in the holdout list, but only if there isn't a pending IPI for the task (->trc_ipi_to_cpu == -1). This function removes the task from the holdout list if the IPI handler has completed the required work, to ensure that the current trace rcu-tasks grace period either waits for this task or this task is not in a trace rcu-tasks read-side critical section.

Now consider the scenario where smp_call_function_single() fails in the first case, inside rcu_tasks_trace_pertask(). In this case, ->trc_ipi_to_cpu is set to the current CPU for that task. This results in trc_wait_for_one_reader() getting skipped in the third case, inside check_all_holdout_tasks_trace(), for this task. This further results in ->trc_reader_checked never getting set for this task, and the task not getting removed from the holdout list. This can cause the current trace rcu-tasks grace period to stall. Fix the above problem by resetting ->trc_ipi_to_cpu to -1 on smp_call_function_single() failure, so that future IPIs can be sent for this task. Note that all three of the trc_wait_for_one_reader() function's callers (rcu_tasks_trace_pertask(), rcu_tasks_trace_postscan(), check_all_holdout_tasks_trace()) hold cpus_read_lock(). This means that smp_call_function_single() cannot race with CPU hotplug, and thus should never fail. Therefore, also add a warning in order to report any such failure in case smp_call_function_single() grows some other reason for failure. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
ed42c380 |
|
24-Aug-2021 |
Neeraj Upadhyay <neeraju@codeaurora.org> |
rcu-tasks: Fix read-side primitives comment for call_rcu_tasks_trace call_rcu_tasks_trace() does have read-side primitives - rcu_read_lock_trace() and rcu_read_unlock_trace(). Fix this information in the comments. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
a6517e9c |
|
17-Aug-2021 |
Neeraj Upadhyay <neeraju@codeaurora.org> |
rcu-tasks: Clarify read side section info for rcu_tasks_rude GP primitives The RCU Tasks Rude variant does not check whether the currently running context on a CPU is usermode. The read-side critical section ends on a transition to usermode execution, by virtue of usermode execution being schedulable. Clarify this in the comments for call_rcu_tasks_rude() and synchronize_rcu_tasks_rude(). Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
d39ec8f3 |
|
17-Aug-2021 |
Neeraj Upadhyay <neeraju@codeaurora.org> |
rcu-tasks: Correct comparisons for CPU numbers in show_stalled_task_trace Valid CPU numbers can be zero or greater, but the checks for ->trc_ipi_to_cpu and tick_nohz_full_cpu()'s argument are for strictly greater than zero. This commit therefore corrects the check for a nohz_full CPU in show_stalled_task_trace() so as to include CPU 0. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
89401176 |
|
17-Aug-2021 |
Neeraj Upadhyay <neeraju@codeaurora.org> |
rcu-tasks: Correct firstreport usage in check_all_holdout_tasks_trace In check_all_holdout_tasks_trace(), firstreport is a pointer argument; so, check the dereferenced value, instead of checking the pointer. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
d0a85858 |
|
17-Aug-2021 |
Neeraj Upadhyay <neeraju@codeaurora.org> |
rcu-tasks: Fix s/rcu_add_holdout/trc_add_holdout/ typo in comment Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
0db7c32a |
|
11-Aug-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Move RTGS_WAIT_CBS to beginning of rcu_tasks_kthread() loop Early in debugging, it made some sense to differentiate the first iteration from subsequent iterations, but now this just causes confusion. This commit therefore moves the "set_tasks_gp_state(rtp, RTGS_WAIT_CBS)" statement to the beginning of the "for" loop in rcu_tasks_kthread(). Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
c4f113ac |
|
05-Aug-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Fix s/instruction/instructions/ typo in comment Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
a5c071cc |
|
28-Jul-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Remove second argument of rcu_read_unlock_trace_special() The second argument of rcu_read_unlock_trace_special() is always zero. When called from exit_tasks_rcu_finish_trace(), it is the constant zero, and rcu_read_unlock_trace_special() doesn't get called from rcu_read_unlock_trace() unless the value of local variable "nesting" is zero because in that case the early return is taken instead. This commit therefore removes the "nesting" argument from the rcu_read_unlock_trace_special() function, substituting the constant zero within that function. This commit also adds a WARN_ON_ONCE() to rcu_read_lock_trace_held() in case non-zeroness some day appears. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
18f08e75 |
|
28-Jul-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add trc_inspect_reader() checks for exiting critical section Currently, trc_inspect_reader() treats a task exiting its RCU Tasks Trace read-side critical section the same as being within that critical section. However, this can fail because that task might have already checked its .need_qs field, which means that it might never decrement the all-important trc_n_readers_need_end counter. Of course, for that to happen, the task would need to never again execute an RCU Tasks Trace read-side critical section, but this really could happen if the system's last trampoline was removed. Note that exit from such a critical section cannot be treated as a quiescent state due to the possibility of nested critical sections. This means that if trc_inspect_reader() sees a negative nesting value, it must set up to try again later. This commit therefore ignores tasks that are exiting their RCU Tasks Trace read-side critical sections so that they will be rechecked later. [ paulmck: Apply feedback from Neeraj Upadhyay and Boqun Feng. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
96017bf9 |
|
28-Jul-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Simplify trc_read_check_handler() atomic operations Currently, trc_wait_for_one_reader() atomically increments the trc_n_readers_need_end counter before sending the IPI invoking trc_read_check_handler(). All failure paths out of trc_read_check_handler() and also from the smp_call_function_single() within trc_wait_for_one_reader() must carefully atomically decrement this counter. This is more complex than it needs to be. This commit therefore simplifies things and saves a few lines of code by dispensing with the atomic decrements in favor of having trc_read_check_handler() do the atomic increment only in the success case. In theory, this represents no change in functionality. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
cbe0d8d9 |
|
30-Jul-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Wait for trc_read_check_handler() IPIs Currently, RCU Tasks Trace initializes the trc_n_readers_need_end counter to the value one, increments it before each trc_read_check_handler() IPI, then decrements it within trc_read_check_handler() if the target task was in a quiescent state (or if the target task moved to some other CPU while the IPI was in flight), complaining if the new value was zero. The rationale for complaining is that the initial value of one must be decremented away before zero can be reached, and this decrement has not yet happened. Except that trc_read_check_handler() is initiated with an asynchronous smp_call_function_single(), which might be significantly delayed. This can result in false-positive complaints about the counter reaching zero. This commit therefore waits for in-flight IPI handlers to complete before decrementing away the initial value of one from the trc_n_readers_need_end counter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
8211e922 |
|
30-Jun-2021 |
Liu Song <liu.song11@zte.com.cn> |
rcu: Use per_cpu_ptr to get the pointer of per_cpu variable There are a few remaining locations in kernel/rcu that still use "&per_cpu()". This commit replaces them with "per_cpu_ptr(&)", and does not introduce any functional change. Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
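The two spellings address the same per-CPU object; this commit just standardizes on the pointer-based accessor (some_percpu_var below is an illustrative name):

    /* Before: take the address of the per-CPU lvalue. */
    p = &per_cpu(some_percpu_var, cpu);

    /* After: use the pointer-based accessor directly. */
    p = per_cpu_ptr(&some_percpu_var, cpu);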
|
#
e4be1f44 |
|
22-Jun-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Fix synchronize_rcu_rude() typo in comment This commit replaces the fictitious synchronize_rcu_rude() function with its real-world synchronize_rcu_tasks_rude() counterpart. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
f8ab3fad |
|
24-May-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Mark ->trc_reader_special.b.need_qs data races There are several ->trc_reader_special.b.need_qs data races that are too low-probability for KCSAN to notice, but which will happen sooner or later. This commit therefore marks these accesses. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
bdb0cca0 |
|
24-May-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Mark ->trc_reader_nesting data races There are several ->trc_reader_nesting data races that are too low-probability for KCSAN to notice, but which will happen sooner or later. This commit therefore marks these accesses, and comments one that cannot race. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
45f4b4a2 |
|
24-May-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add comments explaining task_struct strategy Accesses to task_struct structures must be either protected by RCU or by get_task_struct(). Tasks trace RCU uses these in a non-obvious combination, in conjunction with an IPI handler. This commit therefore adds comments explaining this usage. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
a9ab9cce |
|
25-May-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Don't delete holdouts within trc_wait_for_one_reader() Invoking trc_del_holdout() from within trc_wait_for_one_reader() is only a performance optimization because the RCU Tasks Trace grace-period kthread will eventually do this within check_all_holdout_tasks_trace(). But it is not a particularly important performance optimization because it only applies to the grace-period kthread, of which there is but one. This commit therefore removes this invocation of trc_del_holdout() in favor of the one in check_all_holdout_tasks_trace() in the grace-period kthread. Reported-by: "Xu, Yanfei" <yanfei.xu@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
1d10bf55 |
|
25-May-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Don't delete holdouts within trc_inspect_reader() As Yanfei pointed out, although invoking trc_del_holdout() is safe from the viewpoint of the integrity of the holdout list itself, the put_task_struct() invoked by trc_del_holdout() can result in use-after-free errors due to later accesses to this task_struct structure by the RCU Tasks Trace grace-period kthread. This commit therefore removes this call to trc_del_holdout() from trc_inspect_reader() in favor of the grace-period thread's existing call to trc_del_holdout(), thus eliminating that particular class of use-after-free errors. Reported-by: "Xu, Yanfei" <yanfei.xu@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
474d0997 |
|
20-Apr-2021 |
Paul E. McKenney <paulmck@kernel.org> |
tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline In some architectures, the no-op variant of show_rcu_tasks_gp_kthreads() gets "no previous prototype" compiler warnings. These are false positives given that kernel/rcu/tasks.h is included only once. But why put up with the compiler noise? This commit therefore adds "static inline" to this definition to force the compiler to accept this situation, while also moving it to its proper place in kernel/rcu/rcu.h. Reported-by: kernel test robot <lkp@intel.com> [ paulmck: Update per Stephen Rothwell feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
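The pattern, roughly (a sketch; the exact guard condition used in kernel/rcu/rcu.h may differ):

/* In a header included by multiple translation units, e.g. kernel/rcu/rcu.h: */
#ifdef CONFIG_TASKS_RCU_GENERIC
void show_rcu_tasks_gp_kthreads(void);
#else
static inline void show_rcu_tasks_gp_kthreads(void) {}  /* Warning-free no-op. */
#endif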
|
#
a616aec9 |
|
22-Mar-2021 |
Ingo Molnar <mingo@kernel.org> |
rcu: Fix various typos in comments Fix ~12 single-word typos in RCU code comments. [ paulmck: Apply feedback from Randy Dunlap. ] Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
9fc98e31 |
|
04-Mar-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add block comment laying out RCU Rude design This commit adds a block comment that gives a high-level overview of how RCU Rude grace periods progress. It also gives an overview of the memory ordering. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
06a3ec92 |
|
04-Mar-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add block comment laying out RCU Tasks design This commit adds a block comment that gives a high-level overview of how RCU tasks grace periods progress. It also adds a note about how exiting tasks are handled, plus it gives an overview of the memory ordering. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
a434dd10 |
|
25-Feb-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add block comment laying out RCU Tasks Trace design This commit adds a block comment that gives a high-level overview of how RCU tasks trace grace periods progress. It also adds a note about how exiting tasks are handled, plus it gives an overview of the memory ordering. Reported-by: Peter Zijlstra <peterz@infradead.org> Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> [ paulmck: Fix commit log per Mathieu Desnoyers feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
85b86994 |
|
25-Jan-2021 |
Lukas Bulwahn <lukas.bulwahn@gmail.com> |
rcu-tasks: Rectify kernel-doc for struct rcu_tasks The command 'find ./kernel/rcu/ | xargs ./scripts/kernel-doc -none' reported an issue with the kernel-doc of struct rcu_tasks. This commit rectifies the kernel-doc, such that no issues remain for ./kernel/rcu/. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
bfba7ed0 |
|
09-Dec-2020 |
Uladzislau Rezki (Sony) <urezki@gmail.com> |
rcu-tasks: Add RCU-tasks self tests This commit adds self tests for early-boot use of RCU-tasks grace periods. It tests all three variants (Rude, Tasks, and Tasks Trace) and covers both synchronous (e.g., synchronize_rcu_tasks()) and asynchronous (e.g., call_rcu_tasks()) grace-period APIs. Self-tests are run only in kernels built with CONFIG_PROVE_RCU=y. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> [ paulmck: Handle CONFIG_PROVE_RCU=n and identify test cases' callbacks. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
1b04fa99 |
|
09-Dec-2020 |
Uladzislau Rezki (Sony) <urezki@gmail.com> |
rcu-tasks: Move RCU-tasks initialization to before early_initcall() PowerPC testing encountered boot failures due to RCU Tasks not being fully initialized until core_initcall() time. This commit therefore initializes RCU Tasks (along with Rude RCU and RCU Tasks Trace) just before early_initcall() time, thus allowing waiting on RCU Tasks grace periods from early_initcall() handlers. Link: https://lore.kernel.org/rcu/87eekfh80a.fsf@dja-thinkpad.axtens.net/ Fixes: 36dadef23fcc ("kprobes: Init kprobes in early_initcall") Tested-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
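A sketch of what the earlier initialization permits; demo_early_init() is a hypothetical handler:

#include <linux/init.h>
#include <linux/rcupdate.h>

static int __init demo_early_init(void)
{
        /* Legal only because RCU Tasks is now initialized before
         * early_initcall() handlers run, rather than at core_initcall() time. */
        synchronize_rcu_tasks();
        return 0;
}
early_initcall(demo_early_init);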
|
#
75dc2da5 |
|
17-Sep-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Make the units of ->init_fract be jiffies Currently, the units of ->init_fract are milliseconds while those of ->gp_sleep are jiffies. For consistency with each other and with the argument of schedule_timeout_idle(), this commit changes the units of ->init_fract to jiffies. This change does affect the backoff algorithm, but only on systems where HZ is not 1000, and even there the change makes more sense, given that the current setup would "back off" to the same number of jiffies repeatedly. In contrast, with this change, the number of jiffies waited increases on each pass through the loop in the rcu_tasks_wait_gp() function. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
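A hedged sketch of the resulting backoff; init_fract follows the commit text, while the helper and loop body are simplified assumptions:

#include <linux/jiffies.h>
#include <linux/list.h>
#include <linux/sched.h>

static void demo_check_holdouts(struct list_head *holdouts);  /* Hypothetical. */

static void demo_wait_for_holdouts(struct list_head *holdouts, int init_fract)
{
        int fract = init_fract;  /* ->init_fract, now in jiffies. */

        while (!list_empty(holdouts)) {
                schedule_timeout_idle(fract);  /* Same units as ->gp_sleep. */
                if (fract < HZ)
                        fract++;  /* Each pass waits longer than the last. */
                demo_check_holdouts(holdouts);
        }
}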
|
#
27c0f144 |
|
15-Sep-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcutorture: Make grace-period kthread report match RCU flavor being tested At the end of the test and after rcu_torture_writer() stalls, rcutorture invokes show_rcu_gp_kthreads() in order to dump out information on the RCU grace-period kthread. This makes a lot of sense when testing vanilla RCU, but not so much for the other flavors. This commit therefore allows per-flavor kthread-dump functions to be specified. [ paulmck: Apply feedback from kernel test robot <lkp@intel.com>. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
77dc1741 |
|
15-Sep-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Convert rcu_tasks_wait_gp() for-loop to while-loop The infinite for-loop in rcu_tasks_wait_gp() has its only exit at the top of the loop, so this commit does the straightforward conversion to a while-loop, thus saving a few lines. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
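The shape of the conversion, on a simplified stand-in (holdouts and demo_poll_holdouts() are hypothetical):

/* Before: the only exit is the test at the top of an infinite loop. */
for (;;) {
        if (list_empty(&holdouts))
                break;
        demo_poll_holdouts(&holdouts);
}

/* After: the same control flow, a few lines shorter. */
while (!list_empty(&holdouts))
        demo_poll_holdouts(&holdouts);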
|
#
f747c7e1 |
|
15-Sep-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Enclose task-list scan in rcu_read_lock() The rcu_tasks_trace_postgp() function uses for_each_process_thread() to scan the task list without the benefit of RCU read-side protection, which can result in use-after-free errors on task_struct structures. This error was missed because the TRACE01 rcutorture scenario enables lockdep, but also builds with CONFIG_PREEMPT_NONE=y. In this situation, preemption is disabled everywhere, so lockdep thinks everywhere can be a legitimate RCU reader. This commit therefore adds the needed rcu_read_lock() and rcu_read_unlock(). Note that this bug can occur only after an RCU Tasks Trace CPU stall warning, which by default only happens after a grace period has extended for ten minutes (yes, not a typo, minutes). Fixes: 4593e772b502 ("rcu-tasks: Add stall warnings for RCU Tasks Trace") Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: <bpf@vger.kernel.org> Cc: <stable@vger.kernel.org> # 5.7.x Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
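A sketch of the fix, with a hypothetical per-task check standing in for the real postgp processing:

#include <linux/rcupdate.h>
#include <linux/sched/signal.h>

static void demo_inspect(struct task_struct *t);  /* Hypothetical check. */

static void demo_scan_tasks(void)
{
        struct task_struct *g, *t;

        rcu_read_lock();  /* The fix: protects the task_struct structures. */
        for_each_process_thread(g, t)
                demo_inspect(t);
        rcu_read_unlock();
}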
|
#
592031cc |
|
15-Sep-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Fix low-probability task_struct leak When the rcu_tasks_trace_postgp() function detects an RCU Tasks Trace CPU stall, it adds all tasks blocking the current grace period to a list, invoking get_task_struct() on each to prevent them from being freed while on the list. It then traverses that list, printing stall-warning messages for each one that is still blocking the current grace period and removing it from the list. The list removal invokes the matching put_task_struct(). This of course means that in the admittedly unlikely event that some task executes its outermost rcu_read_unlock_trace() in the meantime, it won't be removed from the list and put_task_struct() won't be executed, resulting in a task_struct leak. This commit therefore makes the list removal and put_task_struct() unconditional, stopping the leak. Note further that this bug can occur only after an RCU Tasks Trace CPU stall warning, which by default only happens after a grace period has extended for ten minutes (yes, not a typo, minutes). Fixes: 4593e772b502 ("rcu-tasks: Add stall warnings for RCU Tasks Trace") Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: <bpf@vger.kernel.org> Cc: <stable@vger.kernel.org> # 5.7.x Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
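A hedged sketch of the corrected traversal; trc_holdout_list, ->trc_reader_special.b.need_qs, and trc_del_holdout() follow the surrounding commit text, while the stall-warning helper is an assumption:

#include <linux/list.h>
#include <linux/sched.h>

static void demo_show_stalled_task(struct task_struct *t);  /* Hypothetical. */
static void trc_del_holdout(struct task_struct *t);  /* Drops the reference. */

static void demo_postgp_stall_scan(struct list_head *holdouts)
{
        struct task_struct *t, *t1;

        list_for_each_entry_safe(t, t1, holdouts, trc_holdout_list) {
                if (READ_ONCE(t->trc_reader_special.b.need_qs))
                        demo_show_stalled_task(t);  /* Still conditional. */
                /* Unconditional now: removal also drops the
                 * get_task_struct() reference, stopping the leak. */
                trc_del_holdout(t);
        }
}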
|
#
ba3a86e4 |
|
14-Sep-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Fix grace-period/unlock race in RCU Tasks Trace The more intense grace-period processing resulting from the 50x RCU Tasks Trace grace-period speedups exposed the following race condition: o Task A running on CPU 0 executes rcu_read_lock_trace(), entering a read-side critical section. o When Task A eventually invokes rcu_read_unlock_trace() to exit its read-side critical section, this function notes that the ->trc_reader_special.s flag is zero and therefore will set ->trc_reader_nesting to zero using WRITE_ONCE(). But before that happens... o The RCU Tasks Trace grace-period kthread running on some other CPU interrogates Task A, but this fails because this task is currently running. This kthread therefore sends an IPI to CPU 0. o CPU 0 receives the IPI, and thus invokes trc_read_check_handler(). Because Task A has not yet cleared its ->trc_reader_nesting counter, this function sees that Task A is still within its read-side critical section. This function therefore sets the ->trc_reader_special.b.need_qs flag, AKA the .need_qs flag. Except that Task A has already checked the .need_qs flag, which is part of the ->trc_reader_special.s flag. The .need_qs flag therefore remains set until Task A's next rcu_read_unlock_trace(). o Task A now invokes synchronize_rcu_tasks_trace(), which cannot start a new grace period until the current grace period completes. And thus cannot return until after that time. But Task A's .need_qs flag is still set, which prevents the current grace period from completing. And because Task A is blocked, it will never execute rcu_read_unlock_trace() until its call to synchronize_rcu_tasks_trace() returns. We are therefore deadlocked. This race is improbable, but 80 hours of rcutorture made it happen twice. The race was possible before the grace-period speedup, but roughly 50x less probable. Several thousand hours of rcutorture would have been necessary to have a reasonable chance of making this happen before this 50x speedup. This commit therefore eliminates this deadlock by setting ->trc_reader_nesting to a large negative number before checking the .need_qs flag and zeroing (or decrementing with respect to its initial value) ->trc_reader_nesting. For its part, the IPI handler's trc_read_check_handler() function adds a check for negative values, deferring evaluation of the task in this case. Taken together, these changes avoid this deadlock scenario. Fixes: 276c410448db ("rcu-tasks: Split ->trc_reader_need_end") Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: <bpf@vger.kernel.org> Cc: <stable@vger.kernel.org> # 5.7.x Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
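A hedged sketch of the fixed unlock path; the large-negative parking follows the commit text, but the surrounding details are approximations rather than the exact upstream code:

#include <linux/compiler.h>
#include <linux/limits.h>
#include <linux/sched.h>

void rcu_read_unlock_trace_special(struct task_struct *t, int nesting);

static inline void demo_read_unlock_trace(void)  /* Approximation. */
{
        struct task_struct *t = current;
        int nesting = READ_ONCE(t->trc_reader_nesting) - 1;

        /* The fix: park the counter at a large negative value so that a
         * concurrent trc_read_check_handler() defers this task instead of
         * setting .need_qs in the window below. */
        WRITE_ONCE(t->trc_reader_nesting, INT_MIN);
        if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
                WRITE_ONCE(t->trc_reader_nesting, nesting);  /* Still nested. */
                return;
        }
        rcu_read_unlock_trace_special(t, nesting);  /* Outermost unlock. */
}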
|
#
4fe192df |
|
09-Sep-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Shorten per-grace-period sleep for RCU Tasks Trace The various RCU tasks flavors currently wait 100 milliseconds between each grace period in order to prevent CPU-bound loops and to favor efficiency over latency. However, RCU Tasks Trace needs to have a grace-period latency of roughly 25 milliseconds, which is completely infeasible given the 100-millisecond per-grace-period sleep. This commit therefore reduces this sleep duration to 5 milliseconds (or one jiffy, whichever is longer) in kernels built with CONFIG_TASKS_TRACE_RCU_READ_MB=y. Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/ Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: <bpf@vger.kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
574de876 |
|
09-Sep-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Selectively enable more RCU Tasks Trace IPIs Many workloads are quite sensitive to IPIs, and such workloads should build kernels with CONFIG_TASKS_TRACE_RCU_READ_MB=y to prevent RCU Tasks Trace from using them under normal conditions. However, other workloads are quite happy to permit more IPIs if doing so makes BPF program updates go faster. This commit therefore sets the default value for the rcupdate.rcu_task_ipi_delay kernel parameter to zero for kernels that have been built with CONFIG_TASKS_TRACE_RCU_READ_MB=n, while retaining the old default of (HZ / 10) for kernels that have indicated an aversion to IPIs via CONFIG_TASKS_TRACE_RCU_READ_MB=y. Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/ Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: <bpf@vger.kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
78edc005 |
|
25-Aug-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Prevent complaints of unused show_rcu_tasks_classic_gp_kthread() Commit 8344496e8b49 ("rcu-tasks: Conditionally compile show_rcu_tasks_gp_kthreads()") introduced conditional compilation of several functions, but forgot one occurrence of show_rcu_tasks_classic_gp_kthread() that causes the compiler to warn of an unused static function. This commit uses "static inline" to avoid these complaints and possibly also to avoid emitting an actual definition of this function. Fixes: 8344496e8b49 ("rcu-tasks: Conditionally compile show_rcu_tasks_gp_kthreads()") Cc: <stable@vger.kernel.org> # 5.8.x Reported-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
2393a613 |
|
09-Sep-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Use more aggressive polling for RCU Tasks Trace The RCU Tasks Trace grace periods are too slow, as in 40x slower than those of RCU Tasks. This is due to my having assumed a one-second grace period was OK, and thus not having optimized any further. This commit provides the first step in this optimization process, namely by allowing the task_list scan backoff interval to be specified on a per-flavor basis, and then speeding up the scans for RCU Tasks Trace. However, kernels built with CONFIG_TASKS_TRACE_RCU_READ_MB=y continue to use the old slower backoff, consistent with that Kconfig option's goal of reducing IPIs. Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/ Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: <bpf@vger.kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
6731da9e |
|
09-Sep-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Mark variables static The n_heavy_reader_attempts, n_heavy_reader_updates, and n_heavy_reader_ofl_updates variables are not used outside of their translation unit, so this commit marks them static. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
c7dcf810 |
|
12-Jun-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Fix synchronize_rcu_tasks_trace() header comment The synchronize_rcu_tasks_trace() header comment incorrectly claims that any number of things delimit RCU Tasks Trace read-side critical sections, when in fact only rcu_read_lock_trace() and rcu_read_unlock_trace() do so. This commit therefore fixes this comment, and, while in the area, fixes a typo in the rcu_read_lock_trace() header comment. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
30d8aa51 |
|
09-Jun-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Fix code-style issues This commit declares trc_n_readers_need_end and trc_wait static and replaces a "&" with "&&". The "&" happened to work because the values are bool, but accidents waiting to happen and all that... Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
8344496e |
|
28-May-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Conditionally compile show_rcu_tasks_gp_kthreads() The show_rcu_tasks_gp_kthreads() function is not invoked by Tiny RCU, but is nevertheless defined in Tiny RCU builds that enable Tasks Trace RCU. This commit therefore conditionally compiles this function so that it is defined only in builds that actually use it. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
04a3c5aa |
|
28-May-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Make rcu_tasks_postscan() be static The rcu_tasks_postscan() function is not used outside of RCU's tasks.h file, so this commit makes it be static. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
ea6eed9f |
|
07-May-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Convert sleeps to idle priority This commit converts the long-standing schedule_timeout_interruptible() and schedule_timeout_uninterruptible() calls used by the various Tasks RCU's grace-period kthreads to schedule_timeout_idle(). This conversion avoids polluting the load-average with Tasks-RCU-related sleeping. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
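The substance of the conversion (HZ / 10 is an illustrative delay, not a value taken from the commit):

#include <linux/sched.h>

static void demo_gp_sleep(void)
{
        /* Before: schedule_timeout_uninterruptible(HZ / 10);
         * which contributed to the system load average. */

        /* After: an idle-priority sleep that the load average ignores. */
        schedule_timeout_idle(HZ / 10);
}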
|
#
25246fc8 |
|
05-Apr-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Allow standalone use of TASKS_{TRACE_,}RCU This commit allows TASKS_TRACE_RCU to be used independently of TASKS_RCU and vice versa. [ paulmck: Fix conditional compilation per kbuild test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
7e0669c3 |
|
25-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add IPI failure count to statistics This commit adds a failure-return count for smp_call_function_single(), and adds this to the console messages for rcutorture writer stalls and at the end of rcutorture testing. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
edf3775f |
|
22-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add count for idle tasks on offline CPUs This commit adds a counter for the number of times the quiescent state was an idle task associated with an offline CPU, and prints this count at the end of rcutorture runs and at stall time. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
40471509 |
|
22-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add rcu_dynticks_zero_in_eqs() effectiveness statistics This commit adds counts of the number of calls and number of successful calls to rcu_dynticks_zero_in_eqs(), which are printed at the end of rcutorture runs and at stall time. This allows evaluation of the effectiveness of rcu_dynticks_zero_in_eqs(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
9796e1ae |
|
22-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Make RCU tasks trace also wait for idle tasks This commit scans the CPUs, adding each CPU's idle task to the list of tasks that need quiescent states. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
7e3b70e0 |
|
22-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Handle the running-offline idle-task special case The idle task corresponding to an offline CPU can appear to be running while that CPU is offline. This commit therefore adds checks for this situation, treating it as a quiescent state. Because the tasklist scan and the holdout-list scan now exclude CPU-hotplug operations, readers on the CPU-hotplug paths are still waited for. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
81b4a7bc |
|
22-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Disable CPU hotplug across RCU tasks trace scans This commit disables CPU hotplug across RCU tasks trace scans, which is a first step towards correctly recognizing idle tasks "running" on offline CPUs. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
b38f57c1 |
|
20-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Allow rcu_read_unlock_trace() under scheduler locks The rcu_read_unlock_trace() function can invoke rcu_read_unlock_trace_special(), which in turn can call wake_up(). Therefore, if any scheduler lock is held across a call to rcu_read_unlock_trace(), self-deadlock can occur. This commit therefore uses the irq_work facility to defer the wake_up() to a clean environment where no scheduler locks will be held. Reported-by: Steven Rostedt <rostedt@goodmis.org> [ paulmck: Update #includes for m68k per kbuild test robot. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
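A sketch of the deferral pattern under stated assumptions (the demo_ names are hypothetical; in the kernel, the deferral happens inside rcu_read_unlock_trace_special()):

#include <linux/irq_work.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(demo_wq);

static void demo_irq_work_fn(struct irq_work *iwp)
{
        wake_up(&demo_wq);  /* Clean context: no caller-held scheduler locks. */
}
static DEFINE_IRQ_WORK(demo_irq_work, demo_irq_work_fn);

static void demo_unlock_special(void)
{
        /* Calling wake_up() directly here could self-deadlock if the
         * caller holds a scheduler lock, so defer it via irq_work. */
        irq_work_queue(&demo_irq_work);
}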
|
#
7d0c9c50 |
|
19-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Avoid IPIing userspace/idle tasks if kernel is so built Systems running CPU-bound real-time tasks do not want IPIs sent to CPUs executing nohz_full userspace tasks. Battery-powered systems don't want IPIs sent to idle CPUs in low-power mode. Unfortunately, RCU tasks trace can and will send such IPIs in some cases. Both of these situations occur only when the target CPU is in RCU dyntick-idle mode, in other words, when RCU is not watching the target CPU. This suggests that CPUs in dyntick-idle mode should use memory barriers in outermost invocations of rcu_read_lock_trace() and rcu_read_unlock_trace(), which would allow the RCU tasks trace grace period to directly read out the target CPU's read-side state. One challenge is that RCU tasks trace is not targeting a specific CPU, but rather a task. And that task could switch from one CPU to another at any time. This commit therefore uses try_invoke_on_locked_down_task() and checks for task_curr() in trc_inspect_reader_notrunning(). When this condition holds, the target task is running and cannot move. If CONFIG_TASKS_TRACE_RCU_READ_MB=y, the new rcu_dynticks_zero_in_eqs() function can be used to check if the specified integer (in this case, t->trc_reader_nesting) is zero while the target CPU remains in that same dyntick-idle sojourn. If so, the target task is in a quiescent state. If not, trc_read_check_handler() must indicate failure so that the grace-period kthread can take appropriate action or retry after an appropriate delay, as the case may be. With this change, given CONFIG_TASKS_TRACE_RCU_READ_MB=y, if a given CPU remains idle or a given task continues executing in nohz_full mode, the RCU tasks trace grace-period kthread will detect this without the need to send an IPI. Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
9ae58d7b |
|
18-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add Kconfig option to mediate smp_mb() vs. IPI This commit provides a new TASKS_TRACE_RCU_READ_MB Kconfig option that enables use of read-side memory barriers by both rcu_read_lock_trace() and rcu_read_unlock_trace() when they are executed with the current->trc_reader_special.b.need_mb flag set. This flag is currently never set. Doing that is the subject of a later commit. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
238dbce3 |
|
18-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add grace-period and IPI counts to statistics This commit adds a grace-period count and a count of IPIs sent since boot, which is printed in response to rcutorture writer stalls and at the end of rcutorture testing. These counts will be used to evaluate various schemes to reduce the number of IPIs sent. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
276c4104 |
|
17-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Split ->trc_reader_need_end This commit splits ->trc_reader_need_end by using the rcu_special union. This change permits readers to check to see if a memory barrier is required without any added overhead in the common case where no such barrier is required. This commit also adds the read-side checking. Later commits will add the machinery to properly set the new ->trc_reader_special.b.need_mb field. This commit also makes rcu_read_unlock_trace_special() tolerate nested read-side critical sections within interrupt and NMI handlers. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
b0afa0f0 |
|
17-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Provide boot parameter to delay IPIs until late in grace period This commit provides a rcupdate.rcu_task_ipi_delay kernel boot parameter that specifies how old the RCU tasks trace grace period must be before the grace-period kthread starts sending IPIs. This delay allows more tasks to pass through rcu_tasks_qs() quiescent states, thus reducing (or even eliminating) the number of IPIs that must be sent. On a short rcutorture test, setting this kernel boot parameter to HZ/2 resulted in zero IPIs for all 877 RCU-tasks trace grace periods that elapsed during that test. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
88092d0c |
|
17-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add a grace-period start time for throttling and debug This commit adds a place to record the grace-period start in jiffies. This will be used by later commits for debugging purposes and to throttle IPIs early in the grace period. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
43766c3e |
|
16-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Make RCU Tasks Trace make use of RCU scheduler hooks This commit makes the calls to rcu_tasks_qs() detect and report quiescent states for RCU tasks trace. If the task is in a quiescent state and if ->trc_reader_checked is not yet set, the task sets its own ->trc_reader_checked. This will cause the grace-period kthread to remove it from the holdout list if it still remains there. [ paulmck: Fix conditional compilation per kbuild test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
af051ca4 |
|
16-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Make rcutorture writer stall output include GP state This commit adds grace-period state and time to the rcutorture writer stall output. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
e21408ce |
|
16-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add RCU tasks to rcutorture writer stall output This commit adds state for each RCU-tasks flavor to the rcutorture writer stall output. The initial state is minimal, but you have to start somewhere. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [ paulmck: Fixes based on feedback from kbuild test robot. ]
|
#
8fd8ca38 |
|
15-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Move #ifdef into tasks.h This commit pushes the #ifdef CONFIG_TASKS_RCU_GENERIC from kernel/rcu/update.c to kernel/rcu/tasks.h in order to improve readability as more APIs are added. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
4593e772 |
|
10-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add stall warnings for RCU Tasks Trace This commit adds RCU CPU stall warnings for RCU Tasks Trace. These dump out any tasks blocking the current grace period, as well as any CPUs that have not responded to an IPI request. This happens in two phases, when initially extracting state from the tasks and later when waiting for any holdout tasks to check in. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
d5f177d3 |
|
09-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add an RCU Tasks Trace to simplify protection of tracing hooks Because RCU does not watch exception early-entry/late-exit, idle-loop, or CPU-hotplug execution, protection of tracing and BPF operations is needlessly complicated. This commit therefore adds a variant of Tasks RCU that: o Has explicit read-side markers to allow finite grace periods in the face of in-kernel loops for PREEMPT=n builds. These markers are rcu_read_lock_trace() and rcu_read_unlock_trace(). o Protects code in the idle loop, exception entry/exit, and CPU-hotplug code paths. In this respect, RCU-tasks trace is similar to SRCU, but with lighter-weight readers. o Avoids expensive read-side instructions, having overhead similar to that of Preemptible RCU. There are of course downsides: o The grace-period code can send IPIs to CPUs, even when those CPUs are in the idle loop or in nohz_full userspace. This is mitigated by later commits. o It is necessary to scan the full tasklist, much as for Tasks RCU. o There is a single callback queue guarded by a single lock, again, much as for Tasks RCU. However, those early use cases that request multiple grace periods in quick succession are expected to do so from a single task, which makes the single lock almost irrelevant. If needed, multiple callback queues can be provided using any number of schemes. Perhaps most important, this variant of RCU does not affect the vanilla flavors, rcu_preempt and rcu_sched. The fact that RCU Tasks Trace readers can operate from idle, offline, and exception entry/exit in no way enables rcu_preempt and rcu_sched readers to do so. The memory ordering was outlined here: https://lore.kernel.org/lkml/20200319034030.GX3199@paulmck-ThinkPad-P72/ This effort benefited greatly from off-list discussions of BPF requirements with Alexei Starovoitov and Andrii Nakryiko. At least some of the on-list discussions are captured in the Link: tags below. In addition, KCSAN was quite helpful in finding some early bugs. Link: https://lore.kernel.org/lkml/20200219150744.428764577@infradead.org/ Link: https://lore.kernel.org/lkml/87mu8p797b.fsf@nanos.tec.linutronix.de/ Link: https://lore.kernel.org/lkml/20200225221305.605144982@linutronix.de/ Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andriin@fb.com> [ paulmck: Apply feedback from Steve Rostedt and Joel Fernandes. ] [ paulmck: Decrement trc_n_readers_need_end upon IPI failure. ] [ paulmck: Fix locking issue reported by rcutorture. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
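A minimal usage sketch of the new read-side markers and the matching synchronous grace-period wait; the demo_ functions are hypothetical:

#include <linux/rcupdate_trace.h>

static void demo_reader(void)
{
        rcu_read_lock_trace();
        /* Trampolines/BPF data dereferenced here remain valid. */
        rcu_read_unlock_trace();
}

static void demo_updater(void)
{
        synchronize_rcu_tasks_trace();  /* Waits for all pre-existing readers. */
        /* Now stale tracing structures can safely be freed. */
}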
|
#
d01aa263 |
|
05-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Code movement to allow more Tasks RCU variants This commit does nothing but move rcu_tasks_wait_gp() up to a new section for common code. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
e4fe5dd6 |
|
04-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Further refactor RCU-tasks to allow adding more variants This commit refactors RCU tasks to allow variants to be added. These variants will share the current Tasks-RCU tasklist scan and the holdout list processing. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
c97d12a6 |
|
03-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Use unique names for RCU-Tasks kthreads and messages This commit causes the flavors of RCU Tasks to use different names for their kthreads and in their console messages. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
c84aad76 |
|
02-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add an RCU-tasks rude variant This commit adds a "rude" variant of RCU-tasks that has as quiescent states schedule(), cond_resched_tasks_rcu_qs(), userspace execution, and (in theory, anyway) cond_resched(). In other words, RCU-tasks rude readers are regions of code with preemption disabled, but excluding code early in the CPU-online sequence and late in the CPU-offline sequence. Updates make use of IPIs and force an IPI and a context switch on each online CPU. This variant is useful in some situations in tracing. Suggested-by: Steven Rostedt <rostedt@goodmis.org> [ paulmck: Apply EXPORT_SYMBOL_GPL() feedback from Qiujun Huang. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [ paulmck: Apply review feedback from Steve Rostedt. ]
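A hedged sketch of one way such a grace period can be forced, assuming reliance on schedule_on_each_cpu(); a no-op work function suffices because the resulting context switch is itself the quiescent state:

#include <linux/workqueue.h>

static void demo_be_rude(struct work_struct *work)
{
        /* Intentionally empty: merely running this work forces a context
         * switch on its CPU, which is a quiescent state for rude readers. */
}

static void demo_rude_wait_gp(void)
{
        schedule_on_each_cpu(demo_be_rude);  /* Every online CPU, then wait. */
}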
|
#
5873b8a9 |
|
03-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Refactor RCU-tasks to allow variants to be added This commit splits out generic processing from RCU-tasks-specific processing in order to allow additional flavors to be added. It also adds a def_bool TASKS_RCU_GENERIC to enable the common RCU-tasks infrastructure code. This is primarily, but not entirely, a code-movement commit. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
07e10515 |
|
02-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Create struct to hold state information This commit creates an rcu_tasks struct to hold state information for RCU Tasks. This is a preparation commit for adding additional flavors of Tasks RCU, each of which would have its own rcu_tasks struct. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
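An illustrative sketch of the kind of state such a struct gathers; the field names here are assumptions, not the exact upstream layout:

#include <linux/rcupdate.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

struct rcu_tasks;  /* Forward declaration for the GP-function type. */
typedef void (*rcu_tasks_gp_func_t)(struct rcu_tasks *rtp);

struct rcu_tasks {                        /* Illustrative layout only. */
        struct rcu_head *cbs_head;        /* Queue of pending callbacks. */
        struct rcu_head **cbs_tail;       /* Tail pointer for that queue. */
        struct wait_queue_head cbs_wq;    /* Grace-period kthread sleeps here. */
        raw_spinlock_t cbs_lock;          /* Guards the callback queue. */
        struct task_struct *kthread_ptr;  /* This flavor's GP kthread. */
        rcu_tasks_gp_func_t gp_func;      /* Flavor-specific GP wait. */
};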
|
#
eacd6f04 |
|
02-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Move Tasks RCU to its own file This code-movement-only commit is in preparation for adding an additional flavor of Tasks RCU, which relies on workqueues to detect grace periods. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|