#
b67cffcb |
|
12-Jan-2024 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/exp: Handle parallel exp gp kworkers affinity Affine the parallel expedited gp kworkers to their respective RCU node in order to make them close to the cache their are playing with. This reuses the boost kthreads machinery that probe into CPU hotplug operations such that the kthreads become/stay affine to their respective node as soon/long as they contain online CPUs. Otherwise and if the current CPU going down was the last online on the leaf node, the related kthread is affine to the housekeeping CPUs. In the long run, this affinity VS CPU hotplug operation game should probably be implemented at the generic kthread level. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> [boqun: s/* rcu_boost_task/*rcu_boost_task as reported by checkpatch] Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
|
#
8e5e6215 |
|
12-Jan-2024 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/exp: Make parallel exp gp kworker per rcu node When CONFIG_RCU_EXP_KTHREAD=n, the expedited grace period per node initialization is performed in parallel via workqueues (one work per node). However in CONFIG_RCU_EXP_KTHREAD=y, this per node initialization is performed by a single kworker serializing each node initialization (one work for all nodes). The second part is certainly less scalable and efficient beyond a single leaf node. To improve this, expand this single kworker into per-node kworkers. This new layout is eventually intended to remove the workqueues based implementation since it will essentially now become duplicate code. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
|
#
7836b270 |
|
12-Jan-2024 |
Frederic Weisbecker <frederic@kernel.org> |
rcu: s/boost_kthread_mutex/kthread_mutex This mutex is currently protecting per node boost kthreads creation and affinity setting across CPU hotplug operations. Since the expedited kworkers will soon be split per node as well, they will be subject to the same concurrency constraints against hotplug. Therefore their creation and affinity tuning operations will be grouped with those of boost kthreads and then rely on the same mutex. To prepare for that, generalize its name. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
|
#
9146eb25 |
|
07-Apr-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp The per-CPU rcu_data structure's ->cpu_no_qs.b.exp field is updated only on the instance corresponding to the current CPU, but can be read more widely. Unmarked accesses are OK from the corresponding CPU, but only if interrupts are disabled, given that interrupt handlers can and do modify this field. Unfortunately, although the load from rcu_preempt_deferred_qs() is always carried out from the corresponding CPU, interrupts are not necessarily disabled. This commit therefore upgrades this load to READ_ONCE. Similarly, the diagnostic access from synchronize_rcu_expedited_wait() might run with interrupts disabled and from some other CPU. This commit therefore marks this load with data_race(). Finally, the C-language access in rcu_preempt_ctxt_queue() is OK as is because interrupts are disabled and this load is always from the corresponding CPU. This commit adds a comment giving the rationale for this access being safe. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
6343402a |
|
06-Sep-2022 |
Pingfan Liu <kernelfans@gmail.com> |
rcu: Synchronize ->qsmaskinitnext in rcu_boost_kthread_setaffinity() Once either rcutree_online_cpu() or rcutree_dead_cpu() is invoked concurrently, the following rcu_boost_kthread_setaffinity() race can occur: CPU 1 CPU2 mask = rcu_rnp_online_cpus(rnp); ... mask = rcu_rnp_online_cpus(rnp); ... set_cpus_allowed_ptr(t, cm); set_cpus_allowed_ptr(t, cm); This results in CPU2's update being overwritten by that of CPU1, and thus the possibility of ->boost_kthread_task continuing to run on a to-be-offlined CPU. This commit therefore eliminates this race by relying on the pre-existing acquisition of ->boost_kthread_mutex to serialize the full process of changing the affinity of ->boost_kthread_task. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
528262f5 |
|
18-Jul-2022 |
Zqiang <qiang1.zhang@intel.com> |
rcu-tasks: Make RCU Tasks Trace check for userspace execution Userspace execution is a valid quiescent state for RCU Tasks Trace, but the scheduling-clock interrupt does not currently report such quiescent states. Of course, the scheduling-clock interrupt is not strictly speaking userspace execution. However, the only way that this code is not in a quiescent state is if something invoked rcu_read_lock_trace(), and that would be reflected in the ->trc_reader_nesting field in the task_struct structure. Furthermore, this field is checked by rcu_tasks_trace_qs(), which is invoked by rcu_tasks_qs() which is in turn invoked by rcu_note_voluntary_context_switch() in kernels building at least one of the RCU Tasks flavors. It is therefore safe to invoke rcu_tasks_trace_qs() from the rcu_sched_clock_irq(). But rcu_tasks_qs() also invokes rcu_tasks_classic_qs() for RCU Tasks, which lacks the read-side markers provided by RCU Tasks Trace. This raises the possibility that an RCU Tasks grace period could start after the interrupt from userspace execution, but before the call to rcu_sched_clock_irq(). However, it turns out that this is safe because the RCU Tasks grace period waits for an RCU grace period, which will wait for the entire scheduling-clock interrupt handler, including any RCU Tasks read-side critical section that this handler might contain. This commit therefore updates the rcu_sched_clock_irq() function's check for usermode execution and its call to rcu_tasks_classic_qs() to instead check for both usermode execution and interrupt from idle, and to instead call rcu_note_voluntary_context_switch(). This consolidates code and provides more faster RCU Tasks Trace reporting of quiescent states in kernels that do scheduling-clock interrupts for userspace execution. [ paulmck: Consolidate checks into rcu_sched_clock_irq(). ] Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
7634b1ea |
|
24-Aug-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Exclude outgoing CPU when it is the last to leave The rcu_boost_kthread_setaffinity() function removes the outgoing CPU from the set_cpus_allowed() mask for the corresponding leaf rcu_node structure's rcub priority-boosting kthread. Except that if the outgoing CPU will leave that structure without any online CPUs, the mask is set to the housekeeping CPU mask from housekeeping_cpumask(). Which is fine unless the outgoing CPU happens to be a housekeeping CPU. This commit therefore removes the outgoing CPU from the housekeeping mask. This would of course be problematic if the outgoing CPU was the last online housekeeping CPU, but in that case you are in a world of hurt anyway. If someone comes up with a valid use case for a system needing all the housekeeping CPUs to be offline, further adjustments can be made. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
621189a1 |
|
07-Aug-2022 |
Zqiang <qiang1.zhang@intel.com> |
rcu: Avoid triggering strict-GP irq-work when RCU is idle Kernels built with PREEMPT_RCU=y and RCU_STRICT_GRACE_PERIOD=y trigger irq-work from rcu_read_unlock(), and the resulting irq-work handler invokes rcu_preempt_deferred_qs_handle(). The point of this triggering is to force grace periods to end quickly in order to give tools like KASAN a better chance of detecting RCU usage bugs such as leaking RCU-protected pointers out of an RCU read-side critical section. However, this irq-work triggering is unconditional. This works, but there is no point in doing this irq-work unless the current grace period is waiting on the running CPU or task, which is not the common case. After all, in the common case there are many rcu_read_unlock() calls per CPU per grace period. This commit therefore triggers the irq-work only when the current grace period is waiting on the running CPU or task. This change was tested as follows on a four-CPU system: echo rcu_preempt_deferred_qs_handler > /sys/kernel/debug/tracing/set_ftrace_filter echo 1 > /sys/kernel/debug/tracing/function_profile_enabled insmod rcutorture.ko sleep 20 rmmod rcutorture.ko echo 0 > /sys/kernel/debug/tracing/function_profile_enabled echo > /sys/kernel/debug/tracing/set_ftrace_filter This procedure produces results in this per-CPU set of files: /sys/kernel/debug/tracing/trace_stat/function* Sample output from one of these files is as follows: Function Hit Time Avg s^2 -------- --- ---- --- --- rcu_preempt_deferred_qs_handle 838746 182650.3 us 0.217 us 0.004 us The baseline sum of the "Hit" values (the number of calls to this function) was 3,319,015. With this commit, that sum was 1,140,359, for a 2.9x reduction. The worst-case variance across the CPUs was less than 25%, so this large effect size is statistically significant. The raw data is available in the Link: URL. Link: https://lore.kernel.org/all/20220808022626.12825-1-qiang1.zhang@intel.com/ Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
089254fd |
|
03-Aug-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Document reason for rcu_all_qs() call to preempt_disable() Given that rcu_all_qs() is in non-preemptible kernels, why on earth should it invoke preempt_disable()? This commit adds the reason, which is to work nicely with debugging enabled in CONFIG_PREEMPT_COUNT=y kernels. Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Reported-by: Boqun Feng <boqun.feng@gmail.com> Reported-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
bca4fa8c |
|
20-Jun-2022 |
Zqiang <qiang1.zhang@intel.com> |
rcu: Update rcu_preempt_deferred_qs() comments for !PREEMPT kernels In non-premptible kernels, tasks never do context switches within RCU read-side critical sections. Therefore, in such kernels, each leaf rcu_node structure's ->blkd_tasks list will always be empty. The comment on the non-preemptible version of rcu_preempt_deferred_qs() confuses this point, so this commit therefore fixes it. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
6d60ea03 |
|
16-Jun-2022 |
Zqiang <qiang1.zhang@intel.com> |
rcu: Fix rcu_read_unlock_strict() strict QS reporting Kernels built with CONFIG_PREEMPT=n and CONFIG_RCU_STRICT_GRACE_PERIOD=y report the quiescent state directly from the outermost rcu_read_unlock(). However, the current CPU's rcu_data structure's ->cpu_no_qs.b.norm might still be set, in which case rcu_report_qs_rdp() will exit early, thus failing to report quiescent state. This commit therefore causes rcu_read_unlock_strict() to clear CPU's rcu_data structure's ->cpu_no_qs.b.norm field before invoking rcu_report_qs_rdp(). Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
51038506 |
|
29-Apr-2022 |
Zqiang <qiang1.zhang@intel.com> |
rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread() Callbacks are invoked in RCU kthreads when calbacks are offloaded (rcu_nocbs boot parameter) or when RCU's softirq handler has been offloaded to rcuc kthreads (use_softirq==0). The current code allows for the rcu_nocbs case but not the use_softirq case. This commit adds support for the use_softirq case. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
|
#
70a82c3c |
|
12-May-2022 |
Zqiang <qiang1.zhang@intel.com> |
rcu: Immediately boost preempted readers for strict grace periods The intent of the CONFIG_RCU_STRICT_GRACE_PERIOD Konfig option is to cause normal grace periods to complete quickly in order to better catch errors resulting from improperly leaking pointers from RCU read-side critical sections. However, kernels built with this option enabled still wait for some hundreds of milliseconds before boosting RCU readers that have been preempted within their current critical section. The value of this delay is set by the CONFIG_RCU_BOOST_DELAY Kconfig option, which defaults to 500 milliseconds. This commit therefore causes kernels build with strict grace periods to ignore CONFIG_RCU_BOOST_DELAY. This causes rcu_initiate_boost() to start boosting immediately after all CPUs on a given leaf rcu_node structure have passed through their quiescent states. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
|
#
48f8070f |
|
26-Apr-2022 |
Patrick Wang <patrick.wang.shcn@gmail.com> |
rcu: Avoid tracing a few functions executed in stop machine Stop-machine recently started calling additional functions while waiting: ---------------------------------------------------------------- Former stop machine wait loop: do { cpu_relax(); => macro ... } while (curstate != STOPMACHINE_EXIT); ----------------------------------------------------------------- Current stop machine wait loop: do { stop_machine_yield(cpumask); => function (notraced) ... touch_nmi_watchdog(); => function (notraced, inside calls also notraced) ... rcu_momentary_dyntick_idle(); => function (notraced, inside calls traced) } while (curstate != MULTI_STOP_EXIT); ------------------------------------------------------------------ These functions (and the functions that they call) must be marked notrace to prevent them from being updated while they are executing. The consequences of failing to mark these functions can be severe: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: rcu: 1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0 rcu: 3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0 (detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4) Task dump for CPU 1: task:migration/1 state:R running task stack: 0 pid: 19 ppid: 2 flags:0x00000000 Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174 Call Trace: Task dump for CPU 3: task:migration/3 state:R running task stack: 0 pid: 29 ppid: 2 flags:0x00000000 Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174 Call Trace: rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 rcu: Possible timer handling issue on cpu=2 timer-softirq=594 rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_preempt state:I stack: 0 pid: 14 ppid: 2 flags:0x00000000 Call Trace: schedule+0x56/0xc2 schedule_timeout+0x82/0x184 rcu_gp_fqs_loop+0x19a/0x318 rcu_gp_kthread+0x11a/0x140 kthread+0xee/0x118 ret_from_exception+0x0/0x14 rcu: Stack dump where RCU GP kthread last ran: Task dump for CPU 2: task:migration/2 state:R running task stack: 0 pid: 24 ppid: 2 flags:0x00000000 Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174 Call Trace: This commit therefore marks these functions notrace: rcu_preempt_deferred_qs() rcu_preempt_need_deferred_qs() rcu_preempt_deferred_qs_irqrestore() [ paulmck: Apply feedback from Neeraj Upadhyay. ] Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
|
#
17211455 |
|
08-Jun-2022 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/context-tracking: Move RCU-dynticks internal functions to context_tracking Move the core RCU eqs/dynticks functions to context tracking so that we can later merge all that code within context tracking. Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
|
#
6a694411 |
|
24-May-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Make rcu_note_context_switch() unconditionally call rcu_tasks_qs() This commit makes rcu_note_context_switch() unconditionally invoke the rcu_tasks_qs() function, as opposed to doing so only when RCU (as opposed to RCU Tasks Trace) urgently needs a grace period to end. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
|
#
f596e2ce |
|
03-Apr-2022 |
Zqiang <qiang1.zhang@intel.com> |
rcu: Use IRQ_WORK_INIT_HARD() to avoid rcu_read_unlock() hangs When booting kernels built with both CONFIG_RCU_STRICT_GRACE_PERIOD=y and CONFIG_PREEMPT_RT=y, the rcu_read_unlock_special() function's invocation of irq_work_queue_on() the init_irq_work() causes the rcu_preempt_deferred_qs_handler() function to work execute in SCHED_FIFO irq_work kthreads. Because rcu_read_unlock_special() is invoked on each rcu_read_unlock() in such kernels, the amount of work just keeps piling up, resulting in a boot-time hang. This commit therefore avoids this hang by using IRQ_WORK_INIT_HARD() instead of init_irq_work(), but only in kernels built with both CONFIG_PREEMPT_RT=y and CONFIG_RCU_STRICT_GRACE_PERIOD=y. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
88ca472f |
|
24-Mar-2022 |
Zqiang <qiang1.zhang@intel.com> |
rcu: Check for successful spawn of ->boost_kthread_task For the spawning of the priority-boost kthreads can fail, improbable though this might seem. This commit therefore refrains from attemoting to initiate RCU priority boosting when The ->boost_kthread_task pointer is NULL. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
90d2efe7 |
|
16-Feb-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Fix rcu_preempt_deferred_qs_irqrestore() strict QS reporting Suppose we have a kernel built with both CONFIG_RCU_STRICT_GRACE_PERIOD=y and CONFIG_PREEMPT=y. Suppose further that an RCU reader from which RCU core needs a quiescent state ends in rcu_preempt_deferred_qs_irqrestore(). This function will then invoke rcu_report_qs_rdp() in order to immediately report that quiescent state. Unfortunately, it will not have cleared that reader's CPU's rcu_data structure's ->cpu_no_qs.b.norm field. As a result, rcu_report_qs_rdp() will take an early exit because it will believe that this CPU has not yet encountered a quiescent state, and there will be no reporting of the current quiescent state. This commit therefore causes rcu_preempt_deferred_qs_irqrestore() to clear the ->cpu_no_qs.b.norm field before invoking rcu_report_qs_rdp(). Kudos to Boqun Feng and Neeraj Upadhyay for helping with analysis of this issue! Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
3352911f |
|
16-Feb-2022 |
Frederic Weisbecker <frederic@kernel.org> |
rcu: Initialize boost kthread only for boot node prior SMP initialization The rcu_spawn_gp_kthread() function is called as an early initcall, which means that SMP initialization hasn't happened yet and only the boot CPU is online. Therefore, create only the boost kthread for the leaf node of the boot CPU. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
04d4e665 |
|
07-Feb-2022 |
Frederic Weisbecker <frederic@kernel.org> |
sched/isolation: Use single feature type while referring to housekeeping cpumask Refer to housekeeping APIs using single feature types instead of flags. This prevents from passing multiple isolation features at once to housekeeping interfaces, which soon won't be possible anymore as each isolation features will have their own cpumask. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Reviewed-by: Phil Auld <pauld@redhat.com> Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org
|
#
6a2c1d45 |
|
23-Jan-2022 |
Yury Norov <yury.norov@gmail.com> |
rcu: Replace cpumask_weight with cpumask_empty where appropriate In some places, RCU code calls cpumask_weight() to check if any bit of a given cpumask is set. We can do it more efficiently with cpumask_empty() because cpumask_empty() stops traversing the cpumask as soon as it finds first set bit, while cpumask_weight() counts all bits unconditionally. Signed-off-by: Yury Norov <yury.norov@gmail.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
218b957a |
|
08-Dec-2021 |
David Woodhouse <dwmw@amazon.co.uk> |
rcu: Add mutex for rcu boost kthread spawning and affinity setting As we handle parallel CPU bringup, we will need to take care to avoid spawning multiple boost threads, or race conditions when setting their affinity. Spotted by Paul McKenney. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
5ae0f1b5 |
|
10-Dec-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Create and use an rcu_rdp_cpu_online() The pattern "rdp->grpmask & rcu_rnp_online_cpus(rnp)" occurs frequently in RCU code in order to determine whether rdp->cpu is online from an RCU perspective. This commit therefore creates an rcu_rdp_cpu_online() function to replace it. [ paulmck: Apply kernel test robot unused-variable feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
c9515875 |
|
24-Jan-2022 |
Zqiang <qiang1.zhang@intel.com> |
rcu: Add per-CPU rcuc task dumps to RCU CPU stall warnings When the rcutree.use_softirq kernel boot parameter is set to zero, all RCU_SOFTIRQ processing is carried out by the per-CPU rcuc kthreads. If these kthreads are being starved, quiescent states will not be reported, which in turn means that the grace period will not end, which can in turn trigger RCU CPU stall warnings. This commit therefore dumps stack traces of stalled CPUs' rcuc kthreads, which can help identify what is preventing those kthreads from running. Suggested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Reviewed-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
10c53578 |
|
21-Jan-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Don't deboost before reporting expedited quiescent state Currently rcu_preempt_deferred_qs_irqrestore() releases rnp->boost_mtx before reporting the expedited quiescent state. Under heavy real-time load, this can result in this function being preempted before the quiescent state is reported, which can in turn prevent the expedited grace period from completing. Tim Murray reports that the resulting expedited grace periods can take hundreds of milliseconds and even more than one second, when they should normally complete in less than a millisecond. This was fine given that there were no particular response-time constraints for synchronize_rcu_expedited(), as it was designed for throughput rather than latency. However, some users now need sub-100-millisecond response-time constratints. This patch therefore follows Neeraj's suggestion (seconded by Tim and by Uladzislau Rezki) of simply reversing the two operations. Reported-by: Tim Murray <timmurray@google.com> Reported-by: Joel Fernandes <joelaf@google.com> Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Tested-by: Tim Murray <timmurray@google.com> Cc: Todd Kjos <tkjos@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: <stable@vger.kernel.org> # 5.4.x Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
eae9f147 |
|
12-Dec-2021 |
Neeraj Upadhyay <quic_neeraju@quicinc.com> |
rcu: Remove unused rcu_state.boost Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
790da248 |
|
29-Sep-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make idle entry report expedited quiescent states In non-preemptible kernels, an unfortunately timed expedited grace period can result in the rcu_exp_handler() IPI handler setting the rcu_data structure's cpu_no_qs.b.exp field just as the target CPU enters idle. There are situations in which this field will not be checked until after that CPU exits idle. The resulting grace-period latency does not qualify as "expedited". This commit therefore checks this field upon non-preemptible idle entry in the rcu_preempt_deferred_qs() function. It also qualifies the rcu_core() preempt_count() check with IS_ENABLED(CONFIG_PREEMPT_COUNT) to prevent false-positive quiescent states from count-free kernels. Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
6120b72e |
|
16-Sep-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu: Remove rcu_data.exp_deferred_qs and convert to rcu_data.cpu no_qs.b.exp Having two fields for the same purpose with subtle differences on different RCU flavours is confusing, especially when both fields always exist on both RCU flavours. Fortunately, it is now safe for preemptible RCU to rely on the rcu_data structure's ->cpu_no_qs.b.exp field, just like non-preemptible RCU. This commit therefore removes the ad-hoc ->exp_deferred_qs field. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
6e16b0f7 |
|
16-Sep-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu: Move rcu_data.cpu_no_qs.b.exp reset to rcu_export_exp_rdp() On non-preemptible RCU, move clearing of the rcu_data structure's ->cpu_no_qs.b.exp filed to the actual expedited quiescent state report function, matching hw preemptible RCU handles the ->exp_deferred_qs field. This prepares for removing ->exp_deferred_qs in favor of ->cpu_no_qs.b.exp for both preemptible and non-preemptible RCU. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
a4382659 |
|
16-Sep-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu: Ignore rdp.cpu_no_qs.b.exp on preemptible RCU's rcu_qs() Preemptible RCU does not use the rcu_data structure's ->cpu_no_qs.b.exp, instead using a separate ->exp_deferred_qs field to record the need for an expedited quiescent state. In fact ->cpu_no_qs.b.exp should never be set in preemptible RCU because preemptible RCU's expedited grace periods use other mechanisms to record quiescent states. This commit therefore removes the implicit rcu_qs() reference to ->cpu_no_qs.b.exp in favor of a direct reference to ->cpu_no_qs.b.norm. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
c2cf0767 |
|
14-Nov-2021 |
Zqiang <qiang.zhang1211@gmail.com> |
rcu: Avoid running boost kthreads on isolated CPUs When the boost kthreads are created on systems with nohz_full CPUs, the cpus_allowed_ptr is set to housekeeping_cpumask(HK_FLAG_KTHREAD). However, when the rcu_boost_kthread_setaffinity() is called, the original affinity will be changed and these kthreads can subsequently run on nohz_full CPUs. This commit makes rcu_boost_kthread_setaffinity() restrict these boost kthreads to housekeeping CPUs. Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
17ea3718 |
|
23-Oct-2021 |
Zhouyi Zhou <zhouzhouyi@gmail.com> |
rcu: Improve tree_plugin.h comments and add code cleanups This commit cleans up some comments and code in kernel/rcu/tree_plugin.h. Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
2407a64f |
|
27-Sep-2021 |
Changbin Du <changbin.du@intel.com> |
rcu: in_irq() cleanup This commit replaces the obsolete and ambiguous macro in_irq() with its shiny new in_hardirq() equivalent. Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
bc849e91 |
|
27-Sep-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move rcu_needs_cpu() to tree.c Now that RCU_FAST_NO_HZ is no more, there is but one implementation of the rcu_needs_cpu() function. This commit therefore moves this function from kernel/rcu/tree_plugin.c to kernel/rcu/tree.c. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
e2c73a68 |
|
27-Sep-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove the RCU_FAST_NO_HZ Kconfig option All of the uses of CONFIG_RCU_FAST_NO_HZ=y that I have seen involve systems with RCU callbacks offloaded. In this situation, all that this Kconfig option does is slow down idle entry/exit with an additional allways-taken early exit. If this is the only use case, then this Kconfig option nothing but an attractive nuisance that needs to go away. This commit therefore removes the RCU_FAST_NO_HZ Kconfig option. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
7663ad9a |
|
28-Sep-2021 |
Peter Zijlstra <peterz@infradead.org> |
rcu: Always inline rcu_dynticks_task*_{enter,exit}() RCU managed to grow a few noinstr violations: vmlinux.o: warning: objtool: rcu_dynticks_eqs_enter()+0x0: call to rcu_dynticks_task_trace_enter() leaves .noinstr.text section vmlinux.o: warning: objtool: rcu_dynticks_eqs_exit()+0xe: call to rcu_dynticks_task_trace_exit() leaves .noinstr.text section Fix them by adding __always_inline to the relevant trivial functions. Also replace the noinstr with __always_inline for the existing rcu_dynticks_task_*() functions since noinstr would force noinline them, even when empty, which seems silly. Fixes: 7d0c9c50c5a1 ("rcu-tasks: Avoid IPIing userspace/idle tasks if kernel is so built") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
925da92b |
|
26-Aug-2021 |
Waiman Long <longman@redhat.com> |
rcu: Avoid unneeded function call in rcu_read_unlock() Since commit aa40c138cc8f3 ("rcu: Report QS for outermost PREEMPT=n rcu_read_unlock() for strict GPs") the function rcu_read_unlock_strict() is invoked by the inlined rcu_read_unlock() function. However, rcu_read_unlock_strict() is an empty function in production kernels, which are built with CONFIG_RCU_STRICT_GRACE_PERIOD=n. There is a mention of rcu_read_unlock_strict() in the BPF verifier, but this is in a deny-list, meaning that BPF does not care whether rcu_read_unlock_strict() is ever called. This commit therefore provides a slight performance improvement by hoisting the check of CONFIG_RCU_STRICT_GRACE_PERIOD from rcu_read_unlock_strict() into rcu_read_unlock(), thus avoiding the pointless call to an empty function. Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
830e6acc |
|
15-Aug-2021 |
Peter Zijlstra <peterz@infradead.org> |
locking/rtmutex: Split out the inner parts of 'struct rtmutex' RT builds substitutions for rwsem, mutex, spinlock and rwlock around rtmutexes. Split the inner working out so each lock substitution can use them with the appropriate lockdep annotations. This avoids having an extra unused lockdep map in the wrapped rtmutex. No functional change. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.784739994@linutronix.de
|
#
521c89b3 |
|
19-Jul-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Print human-readable message for schedule() in RCU reader The WARN_ON_ONCE() invocation within the CONFIG_PREEMPT=y version of rcu_note_context_switch() triggers when there is a voluntary context switch in an RCU read-side critical section, but there is quite a gap between the output of that WARN_ON_ONCE() and this RCU-usage error. This commit therefore converts the WARN_ON_ONCE() to a WARN_ONCE() that explicitly describes the problem in its message. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
5fcb3a5f |
|
20-May-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Mark accesses to ->rcu_read_lock_nesting KCSAN flags accesses to ->rcu_read_lock_nesting as data races, but in the past, the overhead of marked accesses was excessive. However, that was long ago, and much has changed since then, both in terms of hardware and of compilers. Here is data taken on an eight-core laptop using Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz with a kernel built using gcc version 9.3.0, with all data in nanoseconds. Unmarked accesses (status quo), measured by three refscale runs: Minimum reader duration: 3.286 2.851 3.395 Median reader duration: 3.698 3.531 3.4695 Maximum reader duration: 4.481 5.215 5.157 Marked accesses, also measured by three refscale runs: Minimum reader duration: 3.501 3.677 3.580 Median reader duration: 4.053 3.723 3.895 Maximum reader duration: 7.307 4.999 5.511 This focused microbenhmark shows only sub-nanosecond differences which are unlikely to be visible at the system level. This commit therefore marks data-racing accesses to ->rcu_read_lock_nesting. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
fed31a4d |
|
12-Jul-2021 |
Zhouyi Zhou <zhouzhouyi@gmail.com> |
rcu: Fix macro name CONFIG_TASKS_RCU_TRACE This commit fixes several typos where CONFIG_TASKS_RCU_TRACE should instead be CONFIG_TASKS_TRACE_RCU. Among other things, these typos could cause CONFIG_TASKS_TRACE_RCU_READ_MB=y kernels to suffer from memory-ordering bugs that could result in false-positive quiescent states and too-short grace periods. Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
dfcb2754 |
|
18-May-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Start moving nocb code to its own plugin file The kernel/rcu/tree_plugin.h file contains not only the plugins for preemptible RCU, but also many other features including rcu_nocbs callback offloading. This offloading has become large and complex, so it is time to put it in its own file. This commit starts that process. Suggested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> [ paulmck: Rename to tree_nocb.h, add Frederic as author. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
a616aec9 |
|
22-Mar-2021 |
Ingo Molnar <mingo@kernel.org> |
rcu: Fix various typos in comments Fix ~12 single-word typos in RCU code comments. [ paulmck: Apply feedback from Randy Dunlap. ] Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
e75bcd48 |
|
22-Feb-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Unify timers Now that ->nocb_timer and ->nocb_bypass_timer have become quite similar, this commit merges them together. A new RCU_NOCB_WAKE_BYPASS wake level is introduced. As a result, timers perform all kinds of deferred wake ups but other deferred wakeup callsites only handle non-bypass wakeups in order not to wake up rcuo too early. The timer also unconditionally executes a full barrier so as to order timer_pending() and callback enqueue although the path performing RCU_NOCB_WAKE_FORCE that makes use of it is debatable. It should also test against the rdp leader instead of the current rdp. This unconditional full barrier shouldn't bring visible overhead since these timers almost never fire. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
87090516 |
|
22-Feb-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Prepare for fine-grained deferred wakeup Tuning the deferred wakeup level must be done from a safe wakeup point. Currently those sites are: * ->nocb_timer * user/idle/guest entry * CPU down * softirq/rcuc All of these sites perform the wake up for both RCU_NOCB_WAKE and RCU_NOCB_WAKE_FORCE. In order to merge ->nocb_timer and ->nocb_bypass_timer together, we plan to add a new RCU_NOCB_WAKE_BYPASS that really should be deferred until a timer fires so that we don't wake up the NOCB-gp kthread too early. To prepare for that, this commit specifies the per-callsite wakeup level/limit. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> [ paulmck: Fix non-NOCB rcu_nocb_need_deferred_wakeup() definition. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
f9fc166b |
|
22-Feb-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Only cancel nocb timer if not polling This commit refrains deleting the ->nocb_timer if rcu_nocb is polling because it should not ever have been queued in the polling case. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
3b2348e2 |
|
22-Feb-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Delete bypass_timer upon nocb_gp wakeup A NOCB-gp wake p can safely delete the ->nocb_bypass_timer because nocb_gp_wait() will recheck again the bypass state and rearm the bypass timer if necessary. This commit therefore deletes this timer. Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
b6e2c4ed |
|
22-Feb-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup When waking up in nocb_gp_wait(), there is no need to keep the nocb_timer around because this function will traverse the whole rdp list. Any update performed before the timer was armed will now be visible after the ->nocb_gp_lock acquire. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
552cac80 |
|
22-Feb-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Allow de-offloading rdp leader The only thing that prevented an rdp leader from being de-offloaded was the nocb_bypass_timer that used to lock the nocb_lock of the rdp leader. If an rdp gets de-offloaded, it will subtlely ignore rcu_nocb_lock() calls and do its job in the timer unsafely. Worse yet: If it gets re-offloaded in the middle of the timer, rcu_nocb_unlock() would try to unlock, leaving it imbalanced. Now that the nocb_bypass_timer doesn't use the nocb_lock anymore, de-offloading the rdp leader is now safe. This commit therefore allows the rdp leader to be de-offloaded. Reported-by: Paul E. McKenney <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
c7ef7500 |
|
22-Feb-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Directly call __wake_nocb_gp() from bypass timer The bypass timer calls __call_rcu_nocb_wake() instead of directly calling __wake_nocb_gp(). The only difference here is that rdp->qlen_last_fqs_check gets overridden. But resetting the deferred force quiescent state base shouldn't be relevant for that timer. In fact the bypass queue in question can be for any rdp from the group and not necessarily the rdp leader on which the bypass timer is attached. This commit therefore calls __wake_nocb_gp() directly. This way we don't even need to lock the ->nocb_lock. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
3ef5a1c3 |
|
05-Apr-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make RCU priority boosting work on single-CPU rcu_node structures When any CPU comes online, it checks to see if an RCU-boost kthread has already been created for that CPU's leaf rcu_node structure, and if not, it creates one. Unfortunately, it also verifies that this leaf rcu_node structure actually has at least one online CPU, and if not, it declines to create the kthread. Although this behavior makes sense during early boot, especially on systems that claim far more CPUs than they actually have, it makes no sense for the first CPU to come online for a given rcu_node structure. There is no point in checking because we know there is a CPU on its way in. The problem is that timing differences can cause this incoming CPU to not yet be reflected in the various bit masks even at rcutree_online_cpu() time, and there is no chance at rcutree_prepare_cpu() time. Plus it would be better to create the RCU-boost kthread at rcutree_prepare_cpu() to handle the case where the CPU is involved in an RCU priority inversion very shortly after it comes online. This commit therefore moves the checking to rcu_prepare_kthreads(), which is called only at early boot, when the check is appropriate. In addition, it makes rcutree_prepare_cpu() invoke rcu_spawn_one_boost_kthread(), which no longer does any checking for online CPUs. With this change, RCU priority boosting tests now pass for short rcutorture runs, even with single-CPU leaf rcu_node structures. Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Scott Wood <swood@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
396eba65 |
|
06-Apr-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add quiescent states and boost states to show_rcu_gp_kthreads() output This commit adds each rcu_node structure's ->qsmask and "bBEG" output indicating whether: (1) There is a boost kthread, (2) A reader needs to be (or is in the process of being) boosted, (3) A reader is blocking an expedited grace period, and (4) A reader is blocking a normal grace period. This helps diagnose RCU priority boosting failures. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
d76e0926 |
|
22-Feb-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Use the rcuog CPU's ->nocb_timer Currently each CPU has its own ->nocb_timer queued when the nocb_gp wakeup must be deferred. This approach has many drawbacks, compared to a solution based on a single timer per NOCB group: * There are a lot of timers to maintain. * The per-rdp ->nocb_lock must be held to queue and cancel the timer and this lock can already be heavily contended. * One timer firing doesn't cancel the other timers in the same group: - These other timers can thus cause spurious wakeups - Each rdp that queued a timer must lock both ->nocb_lock and then ->nocb_gp_lock upon exit from the kernel to idle/user/guest mode. * We can't cancel all of them if we detect an unflushed bypass in nocb_gp_wait(). In fact currently we only ever cancel the ->nocb_timer of the leader group. * The leader group's nocb_timer is cancelled without locking ->nocb_lock in nocb_gp_wait(). This currently appears to be safe but is an accident waiting to happen. * Since the timer acquires ->nocb_lock, it requires extra care in the NOCB (de-)offloading process, requiring that it be either enabled or disabled and then flushed. This commit instead uses the rcuog kthread's CPU's ->nocb_timer instead. It is protected by nocb_gp_lock, which is _way_ less contended and remains so even after this change. As a matter of fact, the nocb_timer almost never fires and the deferred wakeup is mostly carried out upon idle/user/guest entry. Now the early check performed at this point in do_nocb_deferred_wakeup() is done on rdp_gp->nocb_defer_wakeup, which is of course racy. However, this raciness is harmless because we only need the guarantee that the timer is queued if we were the last one to queue it. Any other situation (another CPU has queued it and we either see it or not) is fine. This solves all the issues listed above. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
4c9c3809 |
|
17-Mar-2021 |
Rolf Eike Beer <eb@emlix.com> |
rcu: Fix typo in comment: kthead -> kthread Signed-off-by: Rolf Eike Beer <eb@emlix.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
a6814a79 |
|
20-Apr-2021 |
Yury Norov <yury.norov@gmail.com> |
rcu/tree_plugin: Don't handle the case of 'all' CPU range The 'all' semantics is now supported by the bitmap_parselist() so we can drop supporting it as a special case in RCU code. Since 'all' is properly supported in core bitmap code, also drop legacy comment in RCU for it. This patch does not make any functional changes for existing users. Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
b03fbd4f |
|
11-Jun-2021 |
Peter Zijlstra <peterz@infradead.org> |
sched: Introduce task_is_running() Replace a bunch of 'p->state == TASK_RUNNING' with a new helper: task_is_running(p). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210611082838.222401495@infradead.org
|
#
e02691b7 |
|
22-Feb-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Move trace_rcu_nocb_wake() calls outside nocb_lock when possible Those tracing calls don't need to be under ->nocb_lock. This commit therefore moves them outside of that lock. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
76d00b49 |
|
22-Feb-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Disable bypass when CPU isn't completely offloaded Currently, the bypass is flushed at the very last moment in the deoffloading procedure. However, this approach leads to a larger state space than would be preferred. This commit therefore disables the bypass at soon as the deoffloading procedure begins, then flushes it. This guarantees that the bypass remains empty and thus out of the way of the deoffloading procedure. Symmetrically, this commit waits to enable the bypass until the offloading procedure has completed. Reported-by: Paul E. McKenney <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
b2fcf210 |
|
22-Feb-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Fix missed nocb_timer requeue This sequence of events can lead to a failure to requeue a CPU's ->nocb_timer: 1. There are no callbacks queued for any CPU covered by CPU 0-2's ->nocb_gp_kthread. Note that ->nocb_gp_kthread is associated with CPU 0. 2. CPU 1 enqueues its first callback with interrupts disabled, and thus must defer awakening its ->nocb_gp_kthread. It therefore queues its rcu_data structure's ->nocb_timer. At this point, CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE. 3. CPU 2, which shares the same ->nocb_gp_kthread, also enqueues a callback, but with interrupts enabled, allowing it to directly awaken the ->nocb_gp_kthread. 4. The newly awakened ->nocb_gp_kthread associates both CPU 1's and CPU 2's callbacks with a future grace period and arranges for that grace period to be started. 5. This ->nocb_gp_kthread goes to sleep waiting for the end of this future grace period. 6. This grace period elapses before the CPU 1's timer fires. This is normally improbably given that the timer is set for only one jiffy, but timers can be delayed. Besides, it is possible that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y. 7. The grace period ends, so rcu_gp_kthread awakens the ->nocb_gp_kthread, which in turn awakens both CPU 1's and CPU 2's ->nocb_cb_kthread. Then ->nocb_gb_kthread sleeps waiting for more newly queued callbacks. 8. CPU 1's ->nocb_cb_kthread invokes its callback, then sleeps waiting for more invocable callbacks. 9. Note that neither kthread updated any ->nocb_timer state, so CPU 1's ->nocb_defer_wakeup is still set to RCU_NOCB_WAKE. 10. CPU 1 enqueues its second callback, this time with interrupts enabled so it can wake directly ->nocb_gp_kthread. It does so with calling wake_nocb_gp() which also cancels the pending timer that got queued in step 2. But that doesn't reset CPU 1's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE. So CPU 1's ->nocb_defer_wakeup and its ->nocb_timer are now desynchronized. 11. ->nocb_gp_kthread associates the callback queued in 10 with a new grace period, arranges for that grace period to start and sleeps waiting for it to complete. 12. The grace period ends, rcu_gp_kthread awakens ->nocb_gp_kthread, which in turn wakes up CPU 1's ->nocb_cb_kthread which then invokes the callback queued in 10. 13. CPU 1 enqueues its third callback, this time with interrupts disabled so it must queue a timer for a deferred wakeup. However the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE which incorrectly indicates that a timer is already queued. Instead, CPU 1's ->nocb_timer was cancelled in 10. CPU 1 therefore fails to queue the ->nocb_timer. 14. CPU 1 has its pending callback and it may go unnoticed until some other CPU ever wakes up ->nocb_gp_kthread or CPU 1 ever calls an explicit deferred wakeup, for example, during idle entry. This commit fixes this bug by resetting rdp->nocb_defer_wakeup everytime we delete the ->nocb_timer. It is quite possible that there is a similar scenario involving ->nocb_bypass_timer and ->nocb_defer_wakeup. However, despite some effort from several people, a failure scenario has not yet been located. However, that by no means guarantees that no such scenario exists. Finding a failure scenario is left as an exercise for the reader, and the "Fixes:" tag below relates to ->nocb_bypass_timer instead of ->nocb_timer. Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing) Cc: <stable@vger.kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
9640dcab |
|
24-Feb-2021 |
Jiapeng Chong <jiapeng.chong@linux.alibaba.com> |
rcu: Make nocb_nobypass_lim_per_jiffy static RCU triggerse the following sparse warning: kernel/rcu/tree_plugin.h:1497:5: warning: symbol 'nocb_nobypass_lim_per_jiffy' was not declared. Should it be static? This commit therefore makes this variable static. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Reported-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
7e937220 |
|
26-Feb-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add explicit barrier() to __rcu_read_unlock() Because preemptible RCU's __rcu_read_unlock() is an external function, the rough equivalent of an implicit barrier() is inserted by the compiler. Except that there is a direct call to __rcu_read_unlock() in that same file, and compilers are getting to the point where they might choose to inline the fastpath of the __rcu_read_unlock() function. This commit therefore adds an explicit barrier() to the very beginning of __rcu_read_unlock(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
7308e024 |
|
27-Jan-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make rcu_read_unlock_special() expedite strict grace periods In kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y, every grace period is an expedited grace period. However, rcu_read_unlock_special() does not treat them that way, instead allowing the deferred quiescent state to be reported whenever. This commit therefore adds a check of this Kconfig option that causes rcu_read_unlock_special() to treat all grace periods as expedited for CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
39bbfc62 |
|
14-Jan-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Expedite deboost in case of deferred quiescent state Historically, a task that has been subjected to RCU priority boosting is deboosted at rcu_read_unlock() time. However, with the advent of deferred quiescent states, if the outermost rcu_read_unlock() was invoked with either bottom halves, interrupts, or preemption disabled, the deboosting will be delayed for some time. During this time, a low-priority process might be incorrectly running at a high real-time priority level. Fortunately, rcu_read_unlock_special() already provides mechanisms for forcing a minimal deferral of quiescent states, at least for kernels built with CONFIG_IRQ_WORK=y. These mechanisms are currently used when expedited grace periods are pending that might be blocked by the current task. This commit therefore causes those mechanisms to also be used in cases where the current task has been or might soon be subjected to RCU priority boosting. Note that this applies to all kernels built with CONFIG_RCU_BOOST=y, regardless of whether or not they are also built with CONFIG_PREEMPT_RT=y. This approach assumes that kernels build for use with aggressive real-time applications are built with CONFIG_IRQ_WORK=y. It is likely to be far simpler to enable CONFIG_IRQ_WORK=y than to implement a fast-deboosting scheme that works correctly in its absence. While in the area, alphabetize the rcu_preempt_deferred_qs_handler() function's local variables. Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Scott Wood <swood@redhat.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
55adc3e1 |
|
28-Jan-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Rename nocb_gp_update_state to nocb_gp_update_state_deoffloading The name nocb_gp_update_state() is unenlightening, so this commit changes it to nocb_gp_update_state_deoffloading(). This function now does what its name says, updates state and returns true if the CPU corresponding to the specified rcu_data structure is in the process of being de-offloaded. Reported-by: Paul E. McKenney <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
8a682b39 |
|
28-Jan-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Avoid confusing double write of rdp->nocb_cb_sleep The nocb_cb_wait() function first sets the rdp->nocb_cb_sleep flag to true by after invoking the callbacks, and then sets it back to false if it finds more callbacks that are ready to invoke. This is confusing and will become unsafe if this flag is ever read locklessly. This commit therefore writes it only once, based on the state after both callback invocation and checking. Reported-by: Paul E. McKenney <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
64305db2 |
|
28-Jan-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Forbid NOCB toggling on offline CPUs It makes no sense to de-offload an offline CPU because that CPU will never invoke any remaining callbacks. It also makes little sense to offload an offline CPU because any pending RCU callbacks were migrated when that CPU went offline. Yes, it is in theory possible to use a number of tricks to permit offloading and deoffloading offline CPUs in certain cases, but in practice it is far better to have the simple and deterministic rule "Toggling the offload state of an offline CPU is forbidden". For but one example, consider that an offloaded offline CPU might have millions of callbacks queued. Best to just say "no". This commit therefore forbids toggling of the offloaded state of offline CPUs. Reported-by: Paul E. McKenney <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
5de2e5bb |
|
28-Jan-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Comment the reason behind BH disablement on batch processing This commit explains why softirqs need to be disabled while invoking callbacks, even when callback processing has been offloaded. After all, invoking callbacks concurrently is one thing, but concurrently invoking the same callback is quite another. Reported-by: Boqun Feng <boqun.feng@gmail.com> Reported-by: Paul E. McKenney <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
3820b513 |
|
11-Nov-2020 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Detect unsafe checks for offloaded rdp Provide CONFIG_PROVE_RCU sanity checks to ensure we are always reading the offloaded state of an rdp in a safe and stable way and prevent from its value to be changed under us. We must either hold the barrier mutex, the cpu-hotplug lock (read or write) or the nocb lock. Local non-preemptible reads are also safe. NOCB kthreads and timers have their own means of synchronization against the offloaded state updaters. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
3e70df91 |
|
21-Feb-2021 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
rcu: deprecate "all" option to rcu_nocbs= With the core bitmap support now accepting "N" as a placeholder for the end of the bitmap, "all" can be represented as "0-N" and has the advantage of not being specific to RCU (or any other subsystem). So deprecate the use of "all" by removing documentation references to it. The support itself needs to remain for now, since we don't know how many people out there are using it currently, but since it is in an __init area anyway, it isn't worth losing sleep over. Cc: Yury Norov <yury.norov@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
4ae7dc97 |
|
31-Jan-2021 |
Frederic Weisbecker <frederic@kernel.org> |
entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point Following the idle loop model, cleanly check for pending rcuog wakeup before the last rescheduling point upon resuming to guest mode. This way we can avoid to do it from rcu_user_enter() with the last resort self-IPI hack that enforces rescheduling. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210131230548.32970-6-frederic@kernel.org
|
#
f8bb5cae |
|
31-Jan-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Trigger self-IPI on late deferred wake up before user resume Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP kthread (rcuog) to be serviced. Unfortunately the call to rcu_user_enter() is already past the last rescheduling opportunity before we resume to userspace or to guest mode. We may escape there with the woken task ignored. The ultimate resort to fix every callsites is to trigger a self-IPI (nohz_full depends on arch to implement arch_irq_work_raise()) that will trigger a reschedule on IRQ tail or guest exit. Eventually every site that want a saner treatment will need to carefully place a call to rcu_nocb_flush_deferred_wakeup() before the last explicit need_resched() check upon resume. Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf) Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210131230548.32970-4-frederic@kernel.org
|
#
43789ef3 |
|
31-Jan-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Perform deferred wake up before last idle's need_resched() check Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP kthread (rcuog) to be serviced. Usually a local wake up happening while running the idle task is handled in one of the need_resched() checks carefully placed within the idle loop that can break to the scheduler. Unfortunately the call to rcu_idle_enter() is already beyond the last generic need_resched() check and we may halt the CPU with a resched request unhandled, leaving the task hanging. Fix this with splitting the rcuog wakeup handling from rcu_idle_enter() and place it before the last generic need_resched() check in the idle loop. It is then assumed that no call to call_rcu() will be performed after that in the idle loop until the CPU is put in low power mode. Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf) Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210131230548.32970-3-frederic@kernel.org
|
#
f759081e |
|
21-Dec-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Code-style nits in callback-offloading toggling This commit addresses a few code-style nits in callback-offloading toggling, including one that predates this toggling. Cc: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
3d0cef50 |
|
18-Dec-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Add nocb CB kthread list to show_rcu_nocb_state() output This commit improves debuggability by indicating laying out the order in which rcuoc kthreads appear in the ->nocb_next_cb_rdp list. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
34169061 |
|
18-Dec-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Add grace period and task state to show_rcu_nocb_state() output This commit improves debuggability by indicating which grace period each batch of nocb callbacks is waiting on and by showing the task state and last CPU for reach nocb kthread. [ paulmck: Handle !SMP CB offloading per kernel test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
b9ced9e1 |
|
13-Nov-2020 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Set SEGCBLIST_SOFTIRQ_ONLY at the very last stage of de-offloading This commit sets SEGCBLIST_SOFTIRQ_ONLY once toggling is otherwise fully complete, allowing further RCU callback manipulation to be carried out locklessly and locally. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Inspired-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
314202f8 |
|
13-Nov-2020 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Flush bypass before setting SEGCBLIST_SOFTIRQ_ONLY This commit flushes the bypass queue and sets state to avoid its being refilled before switching to the final de-offloaded state. To avoid refilling, this commit sets SEGCBLIST_SOFTIRQ_ONLY before re-enabling IRQs. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Inspired-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
69cdea87 |
|
13-Nov-2020 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Shutdown nocb timer on de-offloading This commit ensures that the nocb timer is shut down before reaching the final de-offloaded state. The key goal is to prevent the timer handler from manipulating the callbacks without the protection of the nocb locks. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Inspired-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
254e11ef |
|
13-Nov-2020 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Re-offload support To re-offload the callback processing off of a CPU, it is necessary to clear SEGCBLIST_SOFTIRQ_ONLY, set SEGCBLIST_OFFLOADED, and then notify both the CB and GP kthreads so that they both set their own bit flag and start processing the callbacks remotely. The re-offloading worker is then notified that it can stop the RCU_SOFTIRQ handler (or rcuc kthread, as the case may be) from processing the callbacks locally. Ordering must be carefully enforced so that the callbacks that used to be processed locally without locking will have the same ordering properties when they are invoked by the nocb CB and GP kthreads. This commit makes this change. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Inspired-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> [ paulmck: Export rcu_nocb_cpu_offload(). ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
5bb39dc9 |
|
13-Nov-2020 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: De-offloading GP kthread To de-offload callback processing back onto a CPU, it is necessary to clear SEGCBLIST_OFFLOAD and notify the nocb GP kthread, which will then clear its own bit flag and ignore this CPU until further notice. Whichever of the nocb CB and nocb GP kthreads is last to clear its own bit notifies the de-offloading worker kthread. Once notified, this worker kthread can proceed safe in the knowledge that the nocb CB and GP kthreads will no longer be manipulating this CPU's RCU callback list. This commit makes this change. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Inspired-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
ef005345 |
|
13-Nov-2020 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Don't deoffload an offline CPU with pending work Offloaded CPUs do not migrate their callbacks, instead relying on their rcuo kthread to invoke them. But if the CPU is offline, it will be running neither its RCU_SOFTIRQ handler nor its rcuc kthread. This means that de-offloading an offline CPU that still has pending callbacks will strand those callbacks. This commit therefore refuses to toggle offline CPUs having pending callbacks. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
d97b0781 |
|
13-Nov-2020 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: De-offloading CB kthread To de-offload callback processing back onto a CPU, it is necessary to clear SEGCBLIST_OFFLOAD and notify the nocb CB kthread, which will then clear its own bit flag and go to sleep to stop handling callbacks. This commit makes that change. It will also be necessary to notify the nocb GP kthread in this same way, which is the subject of a follow-on commit. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Inspired-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> [ paulmck: Add export per kernel test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
a649d25d |
|
19-Nov-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add lockdep_assert_irqs_disabled() to rcu_sched_clock_irq() and callees This commit adds a number of lockdep_assert_irqs_disabled() calls to rcu_sched_clock_irq() and a number of the functions that it calls. The point of this is to help track down a situation where lockdep appears to be insisting that interrupts are enabled within these functions, which should only ever be invoked from the scheduling-clock interrupt handler. Link: https://lore.kernel.org/lkml/20201111133813.GA81547@elver.google.com/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
bfb3aa73 |
|
30-Oct-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Do not report strict GPs for outgoing CPUs An outgoing CPU is marked offline in a stop-machine handler and most of that CPU's services stop at that point, including IRQ work queues. However, that CPU must take another pass through the scheduler and through a number of CPU-hotplug notifiers, many of which contain RCU readers. In the past, these readers were not a problem because the outgoing CPU has interrupts disabled, so that rcu_read_unlock_special() would not be invoked, and thus RCU would never attempt to queue IRQ work on the outgoing CPU. This changed with the advent of the CONFIG_RCU_STRICT_GRACE_PERIOD Kconfig option, in which rcu_read_unlock_special() is invoked upon exit from almost all RCU read-side critical sections. Worse yet, because interrupts are disabled, rcu_read_unlock_special() cannot immediately report a quiescent state and will therefore attempt to defer this reporting, for example, by queueing IRQ work. Which fails with a splat because the CPU is already marked as being offline. But it turns out that there is no need to report this quiescent state because rcu_report_dead() will do this job shortly after the outgoing CPU makes its final dive into the idle loop. This commit therefore makes rcu_read_unlock_special() refrain from queuing IRQ work onto outgoing CPUs. Fixes: 44bad5b3cca2 ("rcu: Do full report for .need_qs for strict GPs") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Jann Horn <jannh@google.com>
|
#
cfeac397 |
|
20-Aug-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp() The "cpu" parameter to rcu_report_qs_rdp() is not used, with rdp->cpu being used instead. Furtheremore, every call to rcu_report_qs_rdp() invokes it on rdp->cpu. This commit therefore removes this unused "cpu" parameter and converts a check of rdp->cpu against smp_processor_id() to a WARN_ON_ONCE(). Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
aa40c138 |
|
10-Aug-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Report QS for outermost PREEMPT=n rcu_read_unlock() for strict GPs The CONFIG_PREEMPT=n instance of rcu_read_unlock is even more aggressively than that of CONFIG_PREEMPT=y in deferring reporting quiescent states to the RCU core. This is just what is wanted in normal use because it reduces overhead, but the resulting delay is not what is wanted for kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y. This commit therefore adds an rcu_read_unlock_strict() function that checks for exceptional conditions, and reports the newly started quiescent state if it is safe to do so, also doing a spin-delay if requested via rcutree.rcu_unlock_delay. This commit also adds a call to rcu_read_unlock_strict() from the CONFIG_PREEMPT=n instance of __rcu_read_unlock(). [ paulmck: Fixed bug located by kernel test robot <lkp@intel.com> ] Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
3d29aaf1 |
|
07-Aug-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Provide optional RCU-reader exit delay for strict GPs The goal of this series is to increase the probability of tools like KASAN detecting that an RCU-protected pointer was used outside of its RCU read-side critical section. Thus far, the approach has been to make grace periods and callback processing happen faster. Another approach is to delay the pointer leaker. This commit therefore allows a delay to be applied to exit from RCU read-side critical sections. This slowdown is specified by a new rcutree.rcu_unlock_delay kernel boot parameter that specifies this delay in microseconds, defaulting to zero. Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
44bad5b3 |
|
06-Aug-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Do full report for .need_qs for strict GPs The rcu_preempt_deferred_qs_irqrestore() function is invoked at the end of an RCU read-side critical section (for example, directly from rcu_read_unlock()) and, if .need_qs is set, invokes rcu_qs() to report the new quiescent state. This works, except that rcu_qs() only updates per-CPU state, leaving reporting of the actual quiescent state to a later call to rcu_report_qs_rdp(), for example from within a later RCU_SOFTIRQ instance. Although this approach is exactly what you want if you are more concerned about efficiency than about short grace periods, in CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels, short grace periods are the name of the game. This commit therefore makes rcu_preempt_deferred_qs_irqrestore() directly invoke rcu_report_qs_rdp() in CONFIG_RCU_STRICT_GRACE_PERIOD=y, thus shortening grace periods. Historical note: To the best of my knowledge, causing rcu_read_unlock() to directly report a quiescent state first appeared in Jim Houston's and Joe Korty's JRCU. This is the second instance of a Linux-kernel RCU feature being inspired by JRCU, the first being RCU callback offloading (as in the RCU_NOCB_CPU Kconfig option). Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
f19920e4 |
|
06-Aug-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Always set .need_qs from __rcu_read_lock() for strict GPs The ->rcu_read_unlock_special.b.need_qs field in the task_struct structure indicates that the RCU core needs a quiscent state from the corresponding task. The __rcu_read_unlock() function checks this (via an eventual call to rcu_preempt_deferred_qs_irqrestore()), and if set reports a quiscent state immediately upon exit from the outermost RCU read-side critical section. Currently, this flag is only set when the scheduling-clock interrupt decides that the current RCU grace period is too old, as in about one full second too old. But if the kernel has been built with CONFIG_RCU_STRICT_GRACE_PERIOD=y, we clearly do not want to wait that long. This commit therefore sets the .need_qs field immediately at the start of the RCU read-side critical section from within __rcu_read_lock() in order to unconditionally enlist help from __rcu_read_unlock(). But note the additional check for rcu_state.gp_kthread, which prevents attempts to awaken RCU's grace-period kthread during early boot before there is a scheduler. Leaving off this check results in early boot hangs. So early that there is no console output. Thus, this additional check fails until such time as RCU's grace-period kthread has been created, avoiding these empty-console hangs. Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
8cbd0e38 |
|
05-Aug-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add Kconfig option for strict RCU grace periods People running automated tests have asked for a way to make RCU minimize grace-period duration in order to increase the probability of KASAN detecting a pointer being improperly leaked from an RCU read-side critical section, for example, like this: rcu_read_lock(); p = rcu_dereference(gp); do_something_with(p); // OK rcu_read_unlock(); do_something_else_with(p); // BUG!!! The rcupdate.rcu_expedited boot parameter is a start in this direction, given that it makes calls to synchronize_rcu() instead invoke the faster (and more wasteful) synchronize_rcu_expedited(). However, this does nothing to shorten RCU grace periods that are instead initiated by call_rcu(), and RCU pointer-leak bugs can involve call_rcu() just as surely as they can synchronize_rcu(). This commit therefore adds a RCU_STRICT_GRACE_PERIOD Kconfig option that will be used to shorten normal (non-expedited) RCU grace periods. This commit also dumps out a message when this option is in effect. Later commits will actually shorten grace periods. Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
4569c5ee |
|
05-Aug-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Add a warning for non-GP kthread running GP code This commit increases RCU's ability to defend itself by emitting a warning if one of the nocb CB kthreads invokes the GP kthread's wait function. This warning augments a similar check that is carried out at the end of rcutorture testing and when RCU CPU stall warnings are emitted. The problem with those checks is that the miscreants have long since departed and disposed of any and all evidence. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
2130c6b4 |
|
22-Jun-2020 |
Paul E. McKenney <paulmck@kernel.org> |
nocb: Remove show_rcu_nocb_state() false positive printout The rcu_data structure's ->nocb_timer field is used to defer wakeups of the corresponding no-CBs CPU's grace-period kthread ("rcuog*"), and that structure's ->nocb_defer_wakeup field is used to track such deferral. This means that the show_rcu_nocb_state() printing an error when those fields are set for a CPU not corresponding to a no-CBs grace-period kthread is erroneous. This commit therefore switches the check from ->nocb_timer to ->nocb_bypass_timer and removes the check of ->nocb_defer_wakeup. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
e082c7b3 |
|
22-Jun-2020 |
Paul E. McKenney <paulmck@kernel.org> |
nocb: Clarify RCU nocb CPU error message A message of the form "rcu: !!! lDTs ." can be tracked down, but doing so is not trivial. This commit therefore eases this process by adding text so that this error message now reads as follows: "rcu: nocb GP activity on CB-only CPU!!! lDTs ." Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
f5ca3464 |
|
07-May-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: No-CBs-related sleeps to idle priority This commit converts the schedule_timeout_interruptible() call used by RCU's no-CBs grace-period kthreads to schedule_timeout_idle(). This conversion avoids polluting the load-average with RCU-related sleeping. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
a9352f72 |
|
07-May-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Priority-boost-related sleeps to idle priority This commit converts the long-standing schedule_timeout_interruptible() call used by RCU's priority-boosting kthreads to schedule_timeout_idle(). This conversion avoids polluting the load-average with RCU-related sleeping. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
ff5c4f5c |
|
13-Mar-2020 |
Thomas Gleixner <tglx@linutronix.de> |
rcu/tree: Mark the idle relevant functions noinstr These functions are invoked from context tracking and other places in the low level entry code. Move them into the .noinstr.text section to exclude them from instrumentation. Mark the places which are safe to invoke traceable functions with instrumentation_begin/end() so objtool won't complain. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lkml.kernel.org/r/20200505134100.575356107@linutronix.de
|
#
7d0c9c50 |
|
19-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Avoid IPIing userspace/idle tasks if kernel is so built Systems running CPU-bound real-time task do not want IPIs sent to CPUs executing nohz_full userspace tasks. Battery-powered systems don't want IPIs sent to idle CPUs in low-power mode. Unfortunately, RCU tasks trace can and will send such IPIs in some cases. Both of these situations occur only when the target CPU is in RCU dyntick-idle mode, in other words, when RCU is not watching the target CPU. This suggests that CPUs in dyntick-idle mode should use memory barriers in outermost invocations of rcu_read_lock_trace() and rcu_read_unlock_trace(), which would allow the RCU tasks trace grace period to directly read out the target CPU's read-side state. One challenge is that RCU tasks trace is not targeting a specific CPU, but rather a task. And that task could switch from one CPU to another at any time. This commit therefore uses try_invoke_on_locked_down_task() and checks for task_curr() in trc_inspect_reader_notrunning(). When this condition holds, the target task is running and cannot move. If CONFIG_TASKS_TRACE_RCU_READ_MB=y, the new rcu_dynticks_zero_in_eqs() function can be used to check if the specified integer (in this case, t->trc_reader_nesting) is zero while the target CPU remains in that same dyntick-idle sojourn. If so, the target task is in a quiescent state. If not, trc_read_check_handler() must indicate failure so that the grace-period kthread can take appropriate action or retry after an appropriate delay, as the case may be. With this change, given CONFIG_TASKS_TRACE_RCU_READ_MB=y, if a given CPU remains idle or a given task continues executing in nohz_full mode, the RCU tasks trace grace-period kthread will detect this without the need to send an IPI. Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
43766c3e |
|
16-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Make RCU Tasks Trace make use of RCU scheduler hooks This commit makes the calls to rcu_tasks_qs() detect and report quiescent states for RCU tasks trace. If the task is in a quiescent state and if ->trc_reader_checked is not yet set, the task sets its own ->trc_reader_checked. This will cause the grace-period kthread to remove it from the holdout list if it still remains there. [ paulmck: Fix conditional compilation per kbuild test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
66777e58 |
|
16-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Use context-switch hook for PREEMPT=y kernels Currently, the PREEMPT=y version of rcu_note_context_switch() does not invoke rcu_tasks_qs(), and we need it to in order to keep RCU Tasks Trace's IPIs down to a dull roar. This commit therefore enables this hook. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
5f5fa7ea |
|
15-Feb-2020 |
Lai Jiangshan <laijs@linux.alibaba.com> |
rcu: Don't use negative nesting depth in __rcu_read_unlock() Now that RCU flavors have been consolidated, an RCU-preempt rcu_read_unlock() in an interrupt or softirq handler cannot possibly end the RCU read-side critical section. Consider the old vulnerability involving rcu_read_unlock() being invoked within such a handler that interrupted an __rcu_read_unlock_special(), in which a wakeup might be invoked with a scheduler lock held. Because rcu_read_unlock_special() no longer does wakeups in such situations, it is no longer necessary for __rcu_read_unlock() to set the nesting level negative. This commit therefore removes this recursion-protection code from __rcu_read_unlock(). [ paulmck: Let rcu_exp_handler() continue to call rcu_report_exp_rdp(). ] [ paulmck: Adjust other checks given no more negative nesting. ] Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
f0bdf6d4 |
|
15-Feb-2020 |
Lai Jiangshan <laijs@linux.alibaba.com> |
rcu: Remove unused ->rcu_read_unlock_special.b.deferred_qs field The ->rcu_read_unlock_special.b.deferred_qs field is set to true in rcu_read_unlock_special() but never set to false. This is not particularly useful, so this commit removes this field. The only possible justification for this field is to ease debugging of RCU deferred quiscent states, but the combination of the other ->rcu_read_unlock_special fields plus ->rcu_blocked_node and of course ->rcu_read_lock_nesting should cover debugging needs. And if this last proves incorrect, this patch can always be reverted, along with the required setting of ->rcu_read_unlock_special.b.deferred_qs to false in rcu_preempt_deferred_qs_irqrestore(). Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
07b4a930 |
|
15-Feb-2020 |
Lai Jiangshan <laijs@linux.alibaba.com> |
rcu: Don't set nesting depth negative in rcu_preempt_deferred_qs() Now that RCU flavors have been consolidated, an RCU-preempt rcu_read_unlock() in an interrupt or softirq handler cannot possibly end the RCU read-side critical section. Consider the old vulnerability involving rcu_preempt_deferred_qs() being invoked within such a handler that interrupted an extended RCU read-side critical section, in which a wakeup might be invoked with a scheduler lock held. Because rcu_read_unlock_special() no longer does wakeups in such situations, it is no longer necessary for rcu_preempt_deferred_qs() to set the nesting level negative. This commit therefore removes this recursion-protection code from rcu_preempt_deferred_qs(). [ paulmck: Fix typo in commit log per Steve Rostedt. ] Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
e4453d8a |
|
15-Feb-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make rcu_read_unlock_special() safe for rq/pi locks The scheduler is currently required to hold rq/pi locks across the entire RCU read-side critical section or not at all. This is inconvenient and leaves traps for the unwary, including the author of this commit. But now that excessively long grace periods enable scheduling-clock interrupts for holdout nohz_full CPUs, the nohz_full rescue logic in rcu_read_unlock_special() can be dispensed with. In other words, the rcu_read_unlock_special() function can refrain from doing wakeups unless such wakeups are guaranteed safe. This commit therefore avoids unsafe wakeups, freeing the scheduler to hold rq/pi locks across rcu_read_unlock() even if the corresponding RCU read-side critical section might have been preempted. This commit also updates RCU's requirements documentation. This commit is inspired by a patch from Lai Jiangshan: https://lore.kernel.org/lkml/20191102124559.1135-2-laijs@linux.alibaba.com This commit is further intended to be a step towards his goal of permitting the inlining of RCU-preempt's rcu_read_lock() and rcu_read_unlock(). Cc: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
e2f3ccfa |
|
10-Apr-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Convert rcu_nohz_full_cpu() ULONG_CMP_LT() to time_before() This commit converts the ULONG_CMP_LT() in rcu_nohz_full_cpu() to time_before() to reflect the fact that it is comparing a timestamp to the jiffies counter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
7b241311 |
|
10-Apr-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Convert rcu_initiate_boost() ULONG_CMP_GE() to time_after() This commit converts the ULONG_CMP_GE() in rcu_initiate_boost() to time_after() to reflect the fact that it is comparing a timestamp to the jiffies counter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
5822b812 |
|
04-Jan-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add WRITE_ONCE() to rcu_node ->boost_tasks The rcu_node structure's ->boost_tasks field is read locklessly, so this commit adds the WRITE_ONCE() to an update in order to provide proper documentation and READ_ONCE()/WRITE_ONCE() pairing. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
065a6db1 |
|
03-Jan-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add READ_ONCE and data_race() to rcu_node ->boost_tasks The rcu_node structure's ->boost_tasks field is read locklessly, so this commit adds the READ_ONCE() to one load in order to avoid destructive compiler optimizations. The other load is from a diagnostic print, so data_race() suffices. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
314eeb43 |
|
03-Jan-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add *_ONCE() and data_race() to rcu_node ->exp_tasks plus locking There are lockless loads from the rcu_node structure's ->exp_tasks field, so this commit causes all stores to use WRITE_ONCE() and all lockless loads to use READ_ONCE() or data_race(), with the latter for debug prints. This code also did a unprotected traversal of the linked list pointed into by ->exp_tasks, so this commit also acquires the rcu_node structure's ->lock to properly protect this traversal. This list was traversed unprotected only when printing an RCU CPU stall warning for an expedited grace period, so the odds of seeing this in production are not all that high. This data race was reported by KCSAN. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
aa96a93b |
|
12-Dec-2019 |
Colin Ian King <colin.king@canonical.com> |
rcu: Fix spelling mistake "leval" -> "level" This commit fixes a spelling mistake in a pr_info() message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
8c14263d |
|
07-Nov-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: React to callback overload by boosting RCU readers RCU priority boosting currently is not applied until the grace period is at least 250 milliseconds old (or the number of milliseconds specified by the CONFIG_RCU_BOOST_DELAY Kconfig option). Although this has worked well, it can result in OOM under conditions of RCU callback flooding. One can argue that the real-time systems using RCU priority boosting should carefully avoid RCU callback flooding, but one can just as well argue that an OOM is a rather obnoxious error message. This commit therefore disables the RCU priority boosting delay when there are excessive numbers of callbacks queued. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
b2b00ddf |
|
30-Oct-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: React to callback overload by aggressively seeking quiescent states In default configutions, RCU currently waits at least 100 milliseconds before asking cond_resched() and/or resched_rcu() for help seeking quiescent states to end a grace period. But 100 milliseconds can be one good long time during an RCU callback flood, for example, as can happen when user processes repeatedly open and close files in a tight loop. These 100-millisecond gaps in successive grace periods during a callback flood can result in excessive numbers of callbacks piling up, unnecessarily increasing memory footprint. This commit therefore asks cond_resched() and/or resched_rcu() for help as early as the first FQS scan when at least one of the CPUs has more than 20,000 callbacks queued, a number that can be changed using the new rcutree.qovld kernel boot parameter. An auxiliary qovld_calc variable is used to avoid acquisition of locks that have not yet been initialized. Early tests indicate that this reduces the RCU-callback memory footprint during rcutorture floods by from 50% to 4x, depending on configuration. Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reported-by: Tejun Heo <tj@kernel.org> [ paulmck: Fix bug located by Qian Cai. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Dexuan Cui <decui@microsoft.com> Tested-by: Qian Cai <cai@lca.pw>
|
#
3d05031a |
|
04-Feb-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make nocb_gp_wait() double-check unexpected-callback warning Currently, nocb_gp_wait() unconditionally complains if there is a callback not already associated with a grace period. This assumes that either there was no such callback initially on the one hand, or that the rcu_advance_cbs() function assigned all such callbacks to a grace period on the other. However, in theory there are some situations that would prevent rcu_advance_cbs() from assigning all of the callbacks. This commit therefore checks for unassociated callbacks immediately after rcu_advance_cbs() returns, while the corresponding rcu_node structure's ->lock is still held. If there are unassociated callbacks at that point, the subsequent WARN_ON_ONCE() is disabled. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
13817dd5 |
|
04-Feb-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Tighten rcu_lockdep_assert_cblist_protected() check The ->nocb_lock lockdep assertion is currently guarded by cpu_online(), which is incorrect for no-CBs CPUs, whose callback lists must be protected by ->nocb_lock regardless of whether or not the corresponding CPU is online. This situation could result in failure to detect bugs resulting from failing to hold ->nocb_lock for offline CPUs. This commit therefore removes the cpu_online() guard. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
92c0b889 |
|
29-Jan-2020 |
Jules Irenge <jbi.octave@gmail.com> |
rcu/nocb: Add missing annotation for rcu_nocb_bypass_unlock() Sparse reports warning at rcu_nocb_bypass_unlock() warning: context imbalance in rcu_nocb_bypass_unlock() - unexpected unlock The root cause is a missing annotation of rcu_nocb_bypass_unlock() which causes the warning. This commit therefore adds the missing __releases(&rdp->nocb_bypass_lock) annotation. Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Boqun Feng <boqun.feng@gmail.com>
|
#
9ced4548 |
|
20-Jan-2020 |
Jules Irenge <jbi.octave@gmail.com> |
rcu: Add missing annotation for rcu_nocb_bypass_lock() Sparse reports warning at rcu_nocb_bypass_lock() |warning: context imbalance in rcu_nocb_bypass_lock() - wrong count at exit To fix this, this commit adds an __acquires(&rdp->nocb_bypass_lock). Given that rcu_nocb_bypass_lock() does actually call raw_spin_lock() when raw_spin_trylock() fails, this not only fixes the warning but also improves on the readability of the code. Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
3ca3b0e2 |
|
08-Jan-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add *_ONCE() to rcu_node ->boost_kthread_status The rcu_node structure's ->boost_kthread_status field is accessed locklessly, so this commit causes all updates to use WRITE_ONCE() and all reads to use READ_ONCE(). This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
8ff37290 |
|
04-Jan-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add *_ONCE() for grace-period progress indicators The various RCU structures' ->gp_seq, ->gp_seq_needed, ->gp_req_activity, and ->gp_activity fields are read locklessly, so they must be updated with WRITE_ONCE() and, when read locklessly, with READ_ONCE(). This commit makes these changes. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
77339e61 |
|
15-Nov-2019 |
Lai Jiangshan <laijs@linux.alibaba.com> |
rcu: Provide wrappers for uses of ->rcu_read_lock_nesting This commit provides wrapper functions for uses of ->rcu_read_lock_nesting to improve readability and to ease future changes to support inlining of __rcu_read_lock() and __rcu_read_unlock(). Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
c51f83c3 |
|
04-Nov-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Use READ_ONCE() for ->expmask in rcu_read_unlock_special() The rcu_node structure's ->expmask field is updated only when holding the ->lock, but is also accessed locklessly. This means that all ->expmask updates must use WRITE_ONCE() and all reads carried out without holding ->lock must use READ_ONCE(). This commit therefore changes the lockless ->expmask read in rcu_read_unlock_special() to use READ_ONCE(). Reported-by: syzbot+99f4ddade3c22ab0cf23@syzkaller.appspotmail.com Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Marco Elver <elver@google.com>
|
#
3717e1e9 |
|
01-Nov-2019 |
Lai Jiangshan <laijs@linux.alibaba.com> |
rcu: Clear ->rcu_read_unlock_special only once In rcu_preempt_deferred_qs_irqrestore(), ->rcu_read_unlock_special is cleared one piece at a time. Given that the "if" statements in this function use the copy in "special", this commit removes the clearing of the individual pieces in favor of clearing ->rcu_read_unlock_special in one go just after it has been determined to be non-zero. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
2eeba583 |
|
01-Nov-2019 |
Lai Jiangshan <laijs@linux.alibaba.com> |
rcu: Clear .exp_hint only when deferred quiescent state has been reported Currently, the .exp_hint flag is cleared in rcu_read_unlock_special(), which works, but which can also prevent subsequent rcu_read_unlock() calls from helping expedite the quiescent state needed by an ongoing expedited RCU grace period. This commit therefore defers clearing of .exp_hint from rcu_read_unlock_special() to rcu_preempt_deferred_qs_irqrestore(), thus ensuring that intervening calls to rcu_read_unlock() have a chance to help end the expedited grace period. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
77a40f97 |
|
29-Aug-2019 |
Joel Fernandes (Google) <joel@joelfernandes.org> |
rcu: Remove kfree_rcu() special casing and lazy-callback handling This commit removes kfree_rcu() special-casing and the lazy-callback handling from Tree RCU. It moves some of this special casing to Tiny RCU, the removal of which will be the subject of later commits. This results in a nice negative delta. Suggested-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> [ paulmck: Add slab.h #include, thanks to kbuild test robot <lkp@intel.com>. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
90326f05 |
|
15-Oct-2019 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
rcu: Use CONFIG_PREEMPTION where appropriate The config option `CONFIG_PREEMPT' is used for the preemption model "Low-Latency Desktop". The config option `CONFIG_PREEMPTION' is enabled when kernel preemption is enabled which is true for the preemption model `CONFIG_PREEMPT' and `CONFIG_PREEMPT_RT'. Use `CONFIG_PREEMPTION' if it applies to both preemption models and not just to `CONFIG_PREEMPT'. Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: rcu@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
03bd2983 |
|
10-Oct-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Use lockdep rather than comment to enforce lock held The rcu_preempt_check_blocked_tasks() function has a comment that states that the rcu_node structure's ->lock must be held, which might be informative, but which carries little weight if not read. This commit therefore removes this comment in favor of raw_lockdep_assert_held_rcu_node(), which will complain quite visibly if the required lock is not held. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
6935c398 |
|
09-Oct-2019 |
Eric Dumazet <edumazet@google.com> |
rcu: Avoid data-race in rcu_gp_fqs_check_wake() The rcu_gp_fqs_check_wake() function uses rcu_preempt_blocked_readers_cgp() to read ->gp_tasks while other cpus might overwrite this field. We need READ_ONCE()/WRITE_ONCE() pairs to avoid compiler tricks and KCSAN splats like the following : BUG: KCSAN: data-race in rcu_gp_fqs_check_wake / rcu_preempt_deferred_qs_irqrestore write to 0xffffffff85a7f190 of 8 bytes by task 7317 on cpu 0: rcu_preempt_deferred_qs_irqrestore+0x43d/0x580 kernel/rcu/tree_plugin.h:507 rcu_read_unlock_special+0xec/0x370 kernel/rcu/tree_plugin.h:659 __rcu_read_unlock+0xcf/0xe0 kernel/rcu/tree_plugin.h:394 rcu_read_unlock include/linux/rcupdate.h:645 [inline] __ip_queue_xmit+0x3b0/0xa40 net/ipv4/ip_output.c:533 ip_queue_xmit+0x45/0x60 include/net/ip.h:236 __tcp_transmit_skb+0xdeb/0x1cd0 net/ipv4/tcp_output.c:1158 __tcp_send_ack+0x246/0x300 net/ipv4/tcp_output.c:3685 tcp_send_ack+0x34/0x40 net/ipv4/tcp_output.c:3691 tcp_cleanup_rbuf+0x130/0x360 net/ipv4/tcp.c:1575 tcp_recvmsg+0x633/0x1a30 net/ipv4/tcp.c:2179 inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838 sock_recvmsg_nosec net/socket.c:871 [inline] sock_recvmsg net/socket.c:889 [inline] sock_recvmsg+0x92/0xb0 net/socket.c:885 sock_read_iter+0x15f/0x1e0 net/socket.c:967 call_read_iter include/linux/fs.h:1864 [inline] new_sync_read+0x389/0x4f0 fs/read_write.c:414 read to 0xffffffff85a7f190 of 8 bytes by task 10 on cpu 1: rcu_gp_fqs_check_wake kernel/rcu/tree.c:1556 [inline] rcu_gp_fqs_check_wake+0x93/0xd0 kernel/rcu/tree.c:1546 rcu_gp_fqs_loop+0x36c/0x580 kernel/rcu/tree.c:1611 rcu_gp_kthread+0x143/0x220 kernel/rcu/tree.c:1768 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 10 Comm: rcu_preempt Not tainted 5.3.0+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> [ paulmck: Added another READ_ONCE() for RCU CPU stall warnings. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
610dea36 |
|
04-Oct-2019 |
Stefan Reiter <stefan@pimaker.at> |
rcu/nocb: Fix dump_tree hierarchy print always active Commit 18cd8c93e69e ("rcu/nocb: Print gp/cb kthread hierarchy if dump_tree") added print statements to rcu_organize_nocb_kthreads for debugging, but incorrectly guarded them, causing the function to always spew out its message. This patch fixes it by guarding both pr_alert statements with dump_tree, while also changing the second pr_alert to a pr_cont, to print the hierarchy in a single line (assuming that's how it was supposed to work). Fixes: 18cd8c93e69e ("rcu/nocb: Print gp/cb kthread hierarchy if dump_tree") Signed-off-by: Stefan Reiter <stefan@pimaker.at> [ paulmck: Make single-nocbs-CPU GP kthreads look less erroneous. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
6c7d7dbf |
|
27-Nov-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Rename sync_rcu_preempt_exp_done() to sync_rcu_exp_done() Now that the RCU flavors have been consolidated, there is one common function for checking to see if an expedited RCU grace period has completed, namely sync_rcu_preempt_exp_done(). Because this function is no longer specific to RCU-preempt, this commit removes the "_preempt" from its name. This commit also changes sync_rcu_preempt_exp_done_unlocked() to sync_rcu_exp_done_unlocked() for the same reason. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
b8889c9c |
|
23-Sep-2019 |
Dan Carpenter <dan.carpenter@oracle.com> |
rcu: Fix uninitialized variable in nocb_gp_wait() We never set this to false. This probably doesn't affect most people's runtime because GCC will automatically initialize it to false at certain common optimization levels. But that behavior is related to a bug in GCC and obviously should not be relied on. Fixes: 5d6742b37727 ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
f48fe4c5 |
|
16-Jul-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Don't wake no-CBs GP kthread if timer posted under overload When under overload conditions, __call_rcu_nocb_wake() will wake the no-CBs GP kthread any time the no-CBs CB kthread is asleep or there are no ready-to-invoke callbacks, but only after a timer delay. If the no-CBs GP kthread has a ->nocb_bypass_timer pending, the deferred wakeup from __call_rcu_nocb_wake() is redundant. This commit therefore makes __call_rcu_nocb_wake() avoid posting the redundant deferred wakeup if ->nocb_bypass_timer is pending. This requires adding a bit of ordering of timer actions. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
296181d7 |
|
15-Jul-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Reduce __call_rcu_nocb_wake() leaf rcu_node ->lock contention Currently, __call_rcu_nocb_wake() advances callbacks each time that it detects excessive numbers of callbacks, though only if it succeeds in conditionally acquiring its leaf rcu_node structure's ->lock. Despite the conditional acquisition of ->lock, this does increase contention. This commit therefore avoids advancing callbacks unless there are callbacks in ->cblist whose grace period has completed and advancing has not yet been done during this jiffy. Note that this decision does not take the presence of new callbacks into account. That is because on this code path, there will always be at least one new callback, namely the one we just enqueued. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
1d5a81c1 |
|
15-Jul-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Reduce nocb_cb_wait() leaf rcu_node ->lock contention Currently, nocb_cb_wait() advances callbacks on each pass through its loop, though only if it succeeds in conditionally acquiring its leaf rcu_node structure's ->lock. Despite the conditional acquisition of ->lock, this does increase contention. This commit therefore avoids advancing callbacks unless there are callbacks in ->cblist whose grace period has completed. Note that nocb_cb_wait() doesn't worry about callbacks that have not yet been assigned a grace period. The idea is that the only reason for nocb_cb_wait() to advance callbacks is to allow it to continue invoking callbacks. Time will tell whether this is the correct choice. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
273f0340 |
|
09-Jul-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Avoid synchronous wakeup in __call_rcu_nocb_wake() When callbacks are in full flow, the common case is waiting for a grace period, and this grace period will normally take a few jiffies to complete. It therefore isn't all that helpful for __call_rcu_nocb_wake() to do a synchronous wakeup in this case. This commit therefore turns this into a timer-based deferred wakeup of the no-CBs grace-period kthread. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
f7a81b12 |
|
25-Jun-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed This commit causes locking, sleeping, and callback state to be printed for no-CBs CPUs when the rcutorture writer is delayed sufficiently for rcutorture to complain. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
6aacd88d |
|
13-Jul-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: EXP Check use and usefulness of ->nocb_lock_contended Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
d1b222c6 |
|
02-Jul-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Add bypass callback queueing Use of the rcu_data structure's segmented ->cblist for no-CBs CPUs takes advantage of unrelated grace periods, thus reducing the memory footprint in the face of floods of call_rcu() invocations. However, the ->cblist field is a more-complex rcu_segcblist structure which must be protected via locking. Even though there are only three entities which can acquire this lock (the CPU invoking call_rcu(), the no-CBs grace-period kthread, and the no-CBs callbacks kthread), the contention on this lock is excessive under heavy stress. This commit therefore greatly reduces contention by provisioning an rcu_cblist structure field named ->nocb_bypass within the rcu_data structure. Each no-CBs CPU is permitted only a limited number of enqueues onto the ->cblist per jiffy, controlled by a new nocb_nobypass_lim_per_jiffy kernel boot parameter that defaults to about 16 enqueues per millisecond (16 * 1000 / HZ). When that limit is exceeded, the CPU instead enqueues onto the new ->nocb_bypass. The ->nocb_bypass is flushed into the ->cblist every jiffy or when the number of callbacks on ->nocb_bypass exceeds qhimark, whichever happens first. During call_rcu() floods, this flushing is carried out by the CPU during the course of its call_rcu() invocations. However, a CPU could simply stop invoking call_rcu() at any time. The no-CBs grace-period kthread therefore carries out less-aggressive flushing (every few jiffies or when the number of callbacks on ->nocb_bypass exceeds (2 * qhimark), whichever comes first). This means that the no-CBs grace-period kthread cannot be permitted to do unbounded waits while there are callbacks on ->nocb_bypass. A ->nocb_bypass_timer is used to provide the needed wakeups. [ paulmck: Apply Coverity feedback reported by Colin Ian King. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
faca5c25 |
|
26-Jun-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Unconditionally advance and wake for excessive CBs When there are excessive numbers of callbacks, and when either the corresponding no-CBs callback kthread is asleep or there is no more ready-to-invoke callbacks, and when least one callback is pending, __call_rcu_nocb_wake() will advance the callbacks, but refrain from awakening the corresponding no-CBs grace-period kthread. However, because rcu_advance_cbs_nowake() is used, it is possible (if a bit unlikely) that the needed advancement could not happen due to a grace period not being in progress. Plus there will always be at least one pending callback due to one having just now been enqueued. This commit therefore attempts to advance callbacks and awakens the no-CBs grace-period kthread when there are excessive numbers of callbacks posted and when the no-CBs callback kthread is not in a position to do anything helpful. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
4fd8c5f1 |
|
02-Jun-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Reduce ->nocb_lock contention with separate ->nocb_gp_lock The sleep/wakeup of the no-CBs grace-period kthreads is synchronized using the ->nocb_lock of the first CPU corresponding to that kthread. This commit provides a separate ->nocb_gp_lock for this purpose, thus reducing contention on ->nocb_lock. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
523bddd5 |
|
01-Jun-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Reduce contention at no-CBs invocation-done time Currently, nocb_cb_wait() unconditionally acquires the leaf rcu_node ->lock to advance callbacks when done invoking the previous batch. It does this while holding ->nocb_lock, which means that contention on the leaf rcu_node ->lock visits itself on the ->nocb_lock. This commit therefore makes this lock acquisition conditional, forgoing callback advancement when the leaf rcu_node ->lock is not immediately available. (In this case, the no-CBs grace-period kthread will eventually do any needed callback advancement.) Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
6608c3a0 |
|
01-Jun-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Reduce contention at no-CBs registry-time CB advancement Currently, __call_rcu_nocb_wake() conditionally acquires the leaf rcu_node structure's ->lock, and only afterwards does rcu_advance_cbs_nowake() check to see if it is possible to advance callbacks without potentially needing to awaken the grace-period kthread. Given that the no-awaken check can be done locklessly, this commit reverses the order, so that rcu_advance_cbs_nowake() is invoked without holding the leaf rcu_node structure's ->lock and rcu_advance_cbs_nowake() checks the grace-period state before conditionally acquiring that lock, thus reducing the number of needless acquistions of the leaf rcu_node structure's ->lock. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
9fcb09bd |
|
01-Jun-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Round down for number of no-CBs grace-period kthreads Currently, when the square root of the number of CPUs is rounded down by int_sqrt(), this round-down is applied to the number of callback kthreads per grace-period kthreads. This makes almost no difference for large systems, but results in oddities such as three no-CBs grace-period kthreads for a five-CPU system, which is a bit excessive. This commit therefore causes the round-down to apply to the number of no-CBs grace-period kthreads, so that systems with from four to eight CPUs have only two no-CBs grace period kthreads. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
81c0b3d7 |
|
28-May-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Avoid ->nocb_lock capture by corresponding CPU A given rcu_data structure's ->nocb_lock can be acquired very frequently by the corresponding CPU and occasionally by the corresponding no-CBs grace-period and callbacks kthreads. In particular, these two kthreads will have frequent gaps between ->nocb_lock acquisitions that are roughly a grace period in duration. This means that any excessive ->nocb_lock contention will be due to the CPU's acquisitions, and this in turn enables a very naive contention-avoidance strategy to be quite effective. This commit therefore modifies rcu_nocb_lock() to first attempt a raw_spin_trylock(), and to atomically increment a separate ->nocb_lock_contended across a raw_spin_lock(). This new ->nocb_lock_contended field is checked in __call_rcu_nocb_wake() when interrupts are enabled, with a spin-wait for contending acquisitions to complete, thus allowing the kthreads a chance to acquire the lock. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
7f36ef82 |
|
28-May-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Avoid needless wakeups of no-CBs grace-period kthread Currently, the code provides an extra wakeup for the no-CBs grace-period kthread if one of its CPUs is generating excessive numbers of callbacks. But satisfying though it is to wake something up when things are going south, unless the thing being awakened can actually help solve the problem, that extra wakeup does nothing but consume additional CPU time, which is exactly what you don't want during a call_rcu() flood. This commit therefore avoids doing anything if the corresponding no-CBs callback kthread is going full tilt. Otherwise, if advancing callbacks immediately might help and if the leaf rcu_node structure's lock is immediately available, this commit invokes a new variant of rcu_advance_cbs() that advances callbacks only if doing so won't require awakening the grace-period kthread (not to be confused with any of the no-CBs grace-period kthreads). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
ce0a825e |
|
23-May-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Make __call_rcu_nocb_wake() safe for many callbacks It might be hard to imagine having more than two billion callbacks queued on a single CPU's ->cblist, but someone will do it sometime. This commit therefore makes __call_rcu_nocb_wake() handle this situation by upgrading local variable "len" from "int" to "long". Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
383e1332 |
|
23-May-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Never downgrade ->nocb_defer_wakeup in wake_nocb_gp_defer() Currently, wake_nocb_gp_defer() simply stores whatever waketype was passed in, which can result in a RCU_NOCB_WAKE_FORCE being downgraded to RCU_NOCB_WAKE, which could in turn delay callback processing. This commit therefore adds a check so that wake_nocb_gp_defer() only updates ->nocb_defer_wakeup when the update increases the forcefulness, thus avoiding downgrades. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
aeeacd9d |
|
23-May-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Enable re-awakening under high callback load The __call_rcu_nocb_wake() function and its predecessors set ->qlen_last_fqs_check to zero for the first callback and to LONG_MAX / 2 for forced reawakenings. The former can result in a too-quick reawakening when there are many callbacks ready to invoke and the latter prevents a second reawakening. This commit therefore sets ->qlen_last_fqs_check to the current number of callbacks in both cases. While in the area, this commit also moves both assignments under ->nocb_lock. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
0bd55c69 |
|
12-Aug-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nohz: Turn off tick for offloaded CPUs Historically, no-CBs CPUs allowed the scheduler-clock tick to be unconditionally disabled on any transition to idle or nohz_full userspace execution (see the rcu_needs_cpu() implementations). Unfortunately, the checks used by rcu_needs_cpu() are defeated now that no-CBs CPUs use ->cblist, which might make users of battery-powered devices rather unhappy. This commit therefore adds explicit rcu_segcblist_is_offloaded() checks to return to the historical energy-efficient semantics. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
969974e5 |
|
22-May-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Suppress uninitialized false-positive in nocb_gp_wait() Some compilers complain that wait_gp_seq might be used uninitialized in nocb_gp_wait(). This cannot actually happen because when wait_gp_seq is uninitialized, needwait_gp must be false, which prevents wait_gp_seq from being used. But this analysis is apparently beyond some compilers, so this commit adds a bogus initialization of wait_gp_seq for the sole purpose of suppressing the false-positive warning. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
2a777de7 |
|
21-May-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Remove obsolete nocb_cb_tail and nocb_cb_head fields Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
c035280f |
|
21-May-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Remove obsolete nocb_q_count and nocb_q_count_lazy fields This commit removes the obsolete nocb_q_count and nocb_q_count_lazy fields, also removing rcu_get_n_cbs_nocb_cpu(), adjusting rcu_get_n_cbs_cpu(), and making rcutree_migrate_callbacks() once again disable the ->cblist fields of offline CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
e7f4c5b3 |
|
21-May-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Remove obsolete nocb_head and nocb_tail fields Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
5d6742b3 |
|
15-May-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Use rcu_segcblist for no-CBs CPUs Currently the RCU callbacks for no-CBs CPUs are queued on a series of ad-hoc linked lists, which means that these callbacks cannot benefit from "drive-by" grace periods, thus suffering needless delays prior to invocation. In addition, the no-CBs grace-period kthreads first wait for callbacks to appear and later wait for a new grace period, which means that callbacks appearing during a grace-period wait can be delayed. These delays increase memory footprint, and could even result in an out-of-memory condition. This commit therefore enqueues RCU callbacks from no-CBs CPUs on the rcu_segcblist structure that is already used by non-no-CBs CPUs. It also restructures the no-CBs grace-period kthread to be checking for incoming callbacks while waiting for grace periods. Also, instead of waiting for a new grace period, it waits for the closest grace period that will cause some of the callbacks to be safe to invoke. All of these changes reduce callback latency and thus the number of outstanding callbacks, in turn reducing the probability of an out-of-memory condition. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
e83e73f5 |
|
14-May-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Leave ->cblist enabled for no-CBs CPUs As a first step towards making no-CBs CPUs use the ->cblist, this commit leaves the ->cblist enabled for these CPUs. The main reason to make no-CBs CPUs use ->cblist is to take advantage of callback numbering, which will reduce the effects of missed grace periods which in turn will reduce forward-progress problems for no-CBs CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
ca5c8258 |
|
16-Apr-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Remove deferred wakeup checks for extended quiescent states The idea behind the checks for extended quiescent states at the end of __call_rcu_nocb() is to handle cases where call_rcu() is invoked directly from within an extended quiescent state, for example, from the idle loop. However, this will result in a timer-mediated deferred wakeup, which will cause the needed wakeup to happen within a jiffy or thereabouts. There should be no forward-progress concerns, and if there are, the proper response is to exit the extended quiescent state while executing the endless blast of call_rcu() invocations, for example, using RCU_NONIDLE(). Given the more realistic case of an isolated call_rcu() invocation, there should be no problem. This commit therefore removes the checks for invoking call_rcu() within an extended quiescent state for on no-CBs CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
ce5215c1 |
|
12-Apr-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Use separate flag to indicate offloaded ->cblist RCU callback processing currently uses rcu_is_nocb_cpu() to determine whether or not the current CPU's callbacks are to be offloaded. This works, but it is not so good for cache locality. Plus use of ->cblist for offloaded callbacks will greatly increase the frequency of these checks. This commit therefore adds a ->offloaded flag to the rcu_segcblist structure to provide a more flexible and cache-friendly means of checking for callback offloading. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
1bb5f9b9 |
|
12-Apr-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Use separate flag to indicate disabled ->cblist NULLing the RCU_NEXT_TAIL pointer was a clever way to save a byte, but forward-progress considerations would require that this pointer be both NULL and non-NULL, which, absent a quantum-computer port of the Linux kernel, simply won't happen. This commit therefore creates as separate ->enabled flag to replace the current NULL checks. [ paulmck: Add include files per 0day test robot and -next. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
18cd8c93 |
|
01-Jun-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Print gp/cb kthread hierarchy if dump_tree This commit causes the no-CBs grace-period/callback hierarchy to be printed to the console when the dump_tree kernel boot parameter is set. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
f7c612b0 |
|
02-Apr-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Rename rcu_nocb_leader_stride kernel boot parameter This commit changes the name of the rcu_nocb_leader_stride kernel boot parameter to rcu_nocb_gp_stride in order to account for the new distinction between callback and grace-period no-CBs kthreads. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
f7c9a9b6 |
|
01-Apr-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Rename and document no-CB CB kthread sleep trace event The nocb_cb_wait() function traces a "FollowerSleep" trace_rcu_nocb_wake() event, which never was documented and is now misleading. This commit therefore changes "FollowerSleep" to "CBSleep", documents this, and updates the documentation for "Sleep" as well. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
0bdc33da |
|
31-Mar-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Rename rcu_organize_nocb_kthreads() local variable This commit renames rdp_leader to rdp_gp in order to account for the new distinction between callback and grace-period no-CBs kthreads. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
0d52a665 |
|
31-Mar-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Rename wake_nocb_leader_defer() to wake_nocb_gp_defer() This commit adjusts naming to account for the new distinction between callback and grace-period no-CBs kthreads. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
5f675ba6 |
|
31-Mar-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Rename __wake_nocb_leader() to __wake_nocb_gp() This commit adjusts naming to account for the new distinction between callback and grace-period no-CBs kthreads. While in the area, it also updates local variables. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
5d62c08c |
|
31-Mar-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Rename wake_nocb_leader() to wake_nocb_gp() This commit adjusts naming to account for the new distinction between callback and grace-period no-CBs kthreads. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
9fa471a8 |
|
31-Mar-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Rename nocb_follower_wait() to nocb_cb_wait() This commit adjusts naming to account for the new distinction between callback and grace-period no-CBs kthreads. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
12f54c3a |
|
29-Mar-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Provide separate no-CBs grace-period kthreads Currently, there is one no-CBs rcuo kthread per CPU, and these kthreads are divided into groups. The first rcuo kthread to come online in a given group is that group's leader, and the leader both waits for grace periods and invokes its CPU's callbacks. The non-leader rcuo kthreads only invoke callbacks. This works well in the real-time/embedded environments for which it was intended because such environments tend not to generate all that many callbacks. However, given huge floods of callbacks, it is possible for the leader kthread to be stuck invoking callbacks while its followers wait helplessly while their callbacks pile up. This is a good recipe for an OOM, and rcutorture's new callback-flood capability does generate such OOMs. One strategy would be to wait until such OOMs start happening in production, but similar OOMs have in fact happened starting in 2018. It would therefore be wise to take a more proactive approach. This commit therefore features per-CPU rcuo kthreads that do nothing but invoke callbacks. Instead of having one of these kthreads act as leader, each group has a separate rcog kthread that handles grace periods for its group. Because these rcuog kthreads do not invoke callbacks, callback floods on one CPU no longer block callbacks from reaching the rcuc callback-invocation kthreads on other CPUs. This change does introduce additional kthreads, however: 1. The number of additional kthreads is about the square root of the number of CPUs, so that a 4096-CPU system would have only about 64 additional kthreads. Note that recent changes decreased the number of rcuo kthreads by a factor of two (CONFIG_PREEMPT=n) or even three (CONFIG_PREEMPT=y), so this still represents a significant improvement on most systems. 2. The leading "rcuo" of the rcuog kthreads should allow existing scripting to affinity these additional kthreads as needed, the same as for the rcuop and rcuos kthreads. (There are no longer any rcuob kthreads.) 3. A state-machine approach was considered and rejected. Although this would allow the rcuo kthreads to continue their dual leader/follower roles, it complicates callback invocation and makes it more difficult to consolidate rcuo callback invocation with existing softirq callback invocation. The introduction of rcuog kthreads should thus be acceptable. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
6484fe54 |
|
28-Mar-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Update comments to prepare for forward-progress work This commit simply rewords comments to prepare for leader nocb kthreads doing only grace-period work and callback shuffling. This will mean the addition of replacement kthreads to invoke callbacks. The "leader" and "follower" thus become less meaningful, so the commit changes no-CB comments with these strings to "GP" and "CB", respectively. (Give or take the usual grammatical transformations.) Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
58bf6f77 |
|
28-Mar-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Rename rcu_data fields to prepare for forward-progress work This commit simply renames rcu_data fields to prepare for leader nocb kthreads doing only grace-period work and callback shuffling. This will mean the addition of replacement kthreads to invoke callbacks. The "leader" and "follower" thus become less meaningful, so the commit changes no-CB fields with these strings to "gp" and "cb", respectively. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
3545832f |
|
30-Jun-2019 |
Byungchul Park <byungchul.park@lge.com> |
rcu: Change return type of rcu_spawn_one_boost_kthread() The return value of rcu_spawn_one_boost_kthread() is not used any longer. This commit therefore changes its return type from int to void, and removes the cast to void from its callers. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
1f3ebc82 |
|
04-Jun-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Restore barrier() to rcu_read_lock() and rcu_read_unlock() Commit bb73c52bad36 ("rcu: Don't disable preemption for Tiny and Tree RCU readers") removed the barrier() calls from rcu_read_lock() and rcu_write_lock() in CONFIG_PREEMPT=n&&CONFIG_PREEMPT_COUNT=n kernels. Within RCU, this commit was OK, but it failed to account for things like get_user() that can pagefault and that can be reordered by the compiler. Lack of the barrier() calls in rcu_read_lock() and rcu_read_unlock() can cause these page faults to migrate into RCU read-side critical sections, which in CONFIG_PREEMPT=n kernels could result in too-short grace periods and arbitrary misbehavior. Please see commit 386afc91144b ("spinlocks and preemption points need to be at least compiler barriers") and Linus's commit 66be4e66a7f4 ("rcu: locking and unlocking need to always be at least barriers"), this last of which restores the barrier() call to both rcu_read_lock() and rcu_read_unlock(). This commit removes barrier() calls that are no longer needed given that the addition of them in Linus's commit noted above. The combination of this commit and Linus's commit effectively reverts commit bb73c52bad36 ("rcu: Don't disable preemption for Tiny and Tree RCU readers"). Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> [ paulmck: Fix embarrassing typo located by Alan Stern. ]
|
#
cb4dbbfa |
|
30-Jun-2019 |
Joel Fernandes (Google) <joel@joelfernandes.org> |
rcu: Simplify rcu_note_context_switch exit from critical section Because __rcu_read_unlock() can be preempted just before the call to rcu_read_unlock_special(), it is possible for a task to be preempted just before it would have fully exited its RCU read-side critical section. This would result in a needless extension of that critical section until that task was resumed, which might in turn result in a needlessly long grace period, needless RCU priority boosting, and needless force-quiescent-state actions. Therefore, rcu_note_context_switch() invokes __rcu_read_unlock() followed by rcu_preempt_deferred_qs() when it detects this situation. This action by rcu_note_context_switch() ends the RCU read-side critical section immediately. Of course, once the task resumes, it will invoke rcu_read_unlock_special() redundantly. This is harmless because the fact that a preemption happened means that interrupts, preemption, and softirqs cannot have been disabled, so there would be no deferred quiescent state. While ->rcu_read_lock_nesting remains less than zero, none of the ->rcu_read_unlock_special.b bits can be set, and they were all zeroed by the call to rcu_note_context_switch() at task-preemption time. Therefore, setting ->rcu_read_unlock_special.b.exp_hint to false has no effect. Therefore, the extra call to rcu_preempt_deferred_qs_irqrestore() would return immediately. With one possible exception, which is if an expedited grace period started just as the task was being resumed, which could leave ->exp_deferred_qs set. This will cause rcu_preempt_deferred_qs_irqrestore() to invoke rcu_report_exp_rdp(), reporting the quiescent state, just as it should. (Such an expedited grace period won't affect the preemption code path due to interrupts having already been disabled.) But when rcu_note_context_switch() invokes __rcu_read_unlock(), it is doing so with preemption disabled, hence __rcu_read_unlock() will unconditionally defer the quiescent state, only to immediately invoke rcu_preempt_deferred_qs(), thus immediately reporting the deferred quiescent state. It turns out to be safe (and faster) to instead just invoke rcu_preempt_deferred_qs() without the __rcu_read_unlock() middleman. Because this is the invocation during the preemption (as opposed to the invocation just after the resume), at least one of the bits in ->rcu_read_unlock_special.b must be set and ->rcu_read_lock_nesting must be negative. This means that rcu_preempt_need_deferred_qs() must return true, avoiding the early exit from rcu_preempt_deferred_qs(). Thus, rcu_preempt_deferred_qs_irqrestore() will be invoked immediately, as required. This commit therefore simplifies the CONFIG_PREEMPT=y version of rcu_note_context_switch() by removing the "else if" branch of its "if" statement. This change means that all callers that would have invoked rcu_read_unlock_special() followed by rcu_preempt_deferred_qs() will now simply invoke rcu_preempt_deferred_qs(), thus avoiding the rcu_read_unlock_special() middleman when __rcu_read_unlock() is preempted. Cc: rcu@vger.kernel.org Cc: kernel-team@android.com Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
87446b48 |
|
28-Jun-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make rcu_read_unlock_special() checks match raise_softirq_irqoff() Threaded interrupts provide additional interesting interactions between RCU and raise_softirq() that can result in self-deadlocks in v5.0-2 of the Linux kernel. These self-deadlocks can be provoked in susceptible kernels within a few minutes using the following rcutorture command on an 8-CPU system: tools/testing/selftests/rcutorture/bin/kvm.sh --duration 5 --configs "TREE03" --bootargs "threadirqs" Although post-v5.2 RCU commits have at least greatly reduced the probability of these self-deadlocks, this was entirely by accident. Although this sort of accident should be rowdily celebrated on those rare occasions when it does occur, such celebrations should be quickly followed by a principled patch, which is what this patch purports to be. The key point behind this patch is that when in_interrupt() returns true, __raise_softirq_irqoff() will never attempt a wakeup. Therefore, if in_interrupt(), calls to raise_softirq*() are both safe and extremely cheap. This commit therefore replaces the in_irq() calls in the "if" statement in rcu_read_unlock_special() with in_interrupt() and simplifies the "if" condition to the following: if (irqs_were_disabled && use_softirq && (in_interrupt() || (exp && !t->rcu_read_unlock_special.b.deferred_qs))) { raise_softirq_irqoff(RCU_SOFTIRQ); } else { /* Appeal to the scheduler. */ } The rationale behind the "if" condition is as follows: 1. irqs_were_disabled: If interrupts are enabled, we should instead appeal to the scheduler so as to let the upcoming irq_enable()/local_bh_enable() do the rescheduling for us. 2. use_softirq: If this kernel isn't using softirq, then raise_softirq_irqoff() will be unhelpful. 3. a. in_interrupt(): If this returns true, the subsequent call to raise_softirq_irqoff() is guaranteed not to do a wakeup, so that call will be both very cheap and quite safe. b. Otherwise, if !in_interrupt() the raise_softirq_irqoff() might do a wakeup, which is expensive and, in some contexts, unsafe. i. The "exp" (an expedited RCU grace period is being blocked) says that the wakeup is worthwhile, and: ii. The !.deferred_qs says that scheduler locks cannot be held, so the wakeup will be safe. Backporting this requires considerable care, so no auto-backport, please! Fixes: 05f415715ce45 ("rcu: Speed up expedited GPs when interrupting RCU reader") Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
d143b3d1 |
|
22-Jun-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Simplify rcu_read_unlock_special() deferred wakeups In !use_softirq runs, we clearly cannot rely on raise_softirq() and its lightweight bit setting, so we must instead do some form of wakeup. In the absence of a self-IPI when interrupts are disabled, these wakeups can be delayed until the next interrupt occurs. This means that calling invoke_rcu_core() doesn't actually do any expediting. In this case, it is better to take the "else" clause, which sets the current CPU's resched bits and, if there is an expedited grace period in flight, uses IRQ-work to force the needed self-IPI. This commit therefore removes the "else if" clause that calls invoke_rcu_core(). Reported-by: Scott Wood <swood@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
cd6d17b4 |
|
29-Mar-2019 |
Neeraj Upadhyay <neeraju@codeaurora.org> |
rcu: Dump specified number of blocked tasks The dump_blkd_tasks() function dumps at most 10 blocked tasks, ignoring the value of the ncheck parameter. This commit therefore substitutes the value of ncheck for the hard-coded value of 10. Because all callers currently pass 10 as the number, this patch does not change behavior, but it is clearly an accident waiting to happen. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
1bb33644 |
|
27-Mar-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Rename rcu_data's ->deferred_qs to ->exp_deferred_qs The rcu_data structure's ->deferred_qs field is used to indicate that the current CPU is blocking an expedited grace period (perhaps a future one). Given that it is used only for expedited grace periods, its current name is misleading, so this commit renames it to ->exp_deferred_qs. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
0864f057 |
|
04-Apr-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Use irq_work to get scheduler's attention in clean context When rcu_read_unlock_special() is invoked with interrupts disabled, is either not in an interrupt handler or is not using RCU_SOFTIRQ, is not the first RCU read-side critical section in the chain, and either there is an expedited grace period in flight or this is a NO_HZ_FULL kernel, the end of the grace period can be unduly delayed. The reason for this is that it is not safe to do wakeups in this situation. This commit fixes this problem by using the irq_work subsystem to force a later interrupt handler in a clean environment. Because set_tsk_need_resched(current) and set_preempt_need_resched() are invoked prior to this, the scheduler will force a context switch upon return from this interrupt (though perhaps at the end of any interrupted preempt-disable or BH-disable region of code), which will invoke rcu_note_context_switch() (again in a clean environment), which will in turn give RCU the chance to report the deferred quiescent state. Of course, by then this task might be within another RCU read-side critical section. But that will be detected at that time and reporting will be further deferred to the outermost rcu_read_unlock(). See rcu_preempt_need_deferred_qs() and rcu_preempt_deferred_qs() for more details on the checking. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
385b599e |
|
01-Apr-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Allow rcu_read_unlock_special() to raise_softirq() if in_irq() When running in an interrupt handler, raise_softirq() and raise_softirq_irqoff() have extremely low overhead: They simply set a bit in a per-CPU mask, which is checked upon exit from that interrupt handler. Therefore, if rcu_read_unlock_special() is invoked within an interrupt handler and RCU_SOFTIRQ is in use, this commit make use of raise_softirq_irqoff() even if there is no expedited grace period in flight and even if this is not a nohz_full CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
25102de6 |
|
01-Apr-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Only do rcu_read_unlock_special() wakeups if expedited Currently, rcu_read_unlock_special() will do wakeups whenever it is safe to do so. However, wakeups are expensive, and they are only really needed when the just-ended RCU read-side critical section is blocking an expedited grace period (in which case speed is of the essence) or on a nohz_full CPU (where it might be a good long time before an interrupt arrives). This commit therefore checks for these conditions, and does the expensive wakeups only if doing so would be useful. Note it can be rather expensive to determine whether or not the current task (as opposed to the current CPU) is blocking the current expedited grace period. Doing so requires traversing the ->blkd_tasks list, which can be quite long. This commit therefore cheats: If the current task is on a given ->blkd_tasks list, and some task on that list is blocking the current expedited grace period, the code assumes that the current task is blocking that expedited grace period. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
23634ebc |
|
24-Mar-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Check for wakeup-safe conditions in rcu_read_unlock_special() When RCU core processing is offloaded from RCU_SOFTIRQ to the rcuc kthreads, a full and unconditional wakeup is required to initiate RCU core processing. In contrast, when RCU core processing is carried out by RCU_SOFTIRQ, a raise_softirq() suffices. Of course, there are situations where raise_softirq() does a full wakeup, but these do not occur with normal usage of rcu_read_unlock(). The reason that full wakeups can be problematic is that the scheduler sometimes invokes rcu_read_unlock() with its pi or rq locks held, which can of course result in deadlock in CONFIG_PREEMPT=y kernels when rcu_read_unlock() invokes the scheduler. Scheduler invocations can happen in the following situations: (1) The just-ended reader has been subjected to RCU priority boosting, in which case rcu_read_unlock() must deboost, (2) Interrupts were disabled across the call to rcu_read_unlock(), so the quiescent state must be deferred, requiring a wakeup of the rcuc kthread corresponding to the current CPU. Now, the scheduler may hold one of its locks across rcu_read_unlock() only if preemption has been disabled across the entire RCU read-side critical section, which in the days prior to RCU flavor consolidation meant that rcu_read_unlock() never needed to do wakeups. However, this is no longer the case for any but the first rcu_read_unlock() following a condition (e.g., preempted RCU reader) requiring special rcu_read_unlock() attention. For example, an RCU read-side critical section might be preempted, but preemption might be disabled across the rcu_read_unlock(). The rcu_read_unlock() must defer the quiescent state, and therefore leaves the task queued on its leaf rcu_node structure. If a scheduler interrupt occurs, the scheduler might well invoke rcu_read_unlock() with one of its locks held. However, the preempted task is still queued, so rcu_read_unlock() will attempt to defer the quiescent state once more. When RCU core processing is carried out by RCU_SOFTIRQ, this works just fine: The raise_softirq() function simply sets a bit in a per-CPU mask and the RCU core processing will be undertaken upon return from interrupt. Not so when RCU core processing is carried out by the rcuc kthread: In this case, the required wakeup can result in deadlock. The initial solution to this problem was to use set_tsk_need_resched() and set_preempt_need_resched() to force a future context switch, which allows rcu_preempt_note_context_switch() to report the deferred quiescent state to RCU's core processing. Unfortunately for expedited grace periods, there can be a significant delay between the call for a context switch and the actual context switch. This commit therefore introduces a ->deferred_qs flag to the task_struct structure's rcu_special structure. This flag is initially false, and is set to true by the first call to rcu_read_unlock() requiring special attention, then finally reset back to false when the quiescent state is finally reported. Then rcu_read_unlock() attempts full wakeups only when ->deferred_qs is false, that is, on the first rcu_read_unlock() requiring special attention. Note that a chain of RCU readers linked by some other sort of reader may find that a later rcu_read_unlock() is once again able to do a full wakeup, courtesy of an intervening preemption: rcu_read_lock(); /* preempted */ local_irq_disable(); rcu_read_unlock(); /* Can do full wakeup, sets ->deferred_qs. */ rcu_read_lock(); local_irq_enable(); preempt_disable() rcu_read_unlock(); /* Cannot do full wakeup, ->deferred_qs set. */ rcu_read_lock(); preempt_enable(); /* preempted, >deferred_qs reset. */ local_irq_disable(); rcu_read_unlock(); /* Can again do full wakeup, sets ->deferred_qs. */ Such linked RCU readers do not yet seem to appear in the Linux kernel, and it is probably best if they don't. However, RCU needs to handle them, and some variations on this theme could make even raise_softirq() unsafe due to the possibility of its doing a full wakeup. This commit therefore also avoids invoking raise_softirq() when the ->deferred_qs set flag is set. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
#
48d07c04 |
|
20-Mar-2019 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
rcu: Enable elimination of Tree-RCU softirq processing Some workloads need to change kthread priority for RCU core processing without affecting other softirq work. This commit therefore introduces the rcutree.use_softirq kernel boot parameter, which moves the RCU core work from softirq to a per-CPU SCHED_OTHER kthread named rcuc. Use of SCHED_OTHER approach avoids the scalability problems that appeared with the earlier attempt to move RCU core processing to from softirq to kthreads. That said, kernels built with RCU_BOOST=y will run the rcuc kthreads at the RCU-boosting priority. Note that rcutree.use_softirq=0 must be specified to move RCU core processing to the rcuc kthreads: rcutree.use_softirq=1 is the default. Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ paulmck: Adjust for invoke_rcu_callbacks() only ever being invoked from RCU core processing, in contrast to softirq->rcuc transition in old mainline RCU priority boosting. ] [ paulmck: Avoid wakeups when scheduler might have invoked rcu_read_unlock() while holding rq or pi locks, also possibly fixing a pre-existing latent bug involving raise_softirq()-induced wakeups. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
59b73a27 |
|
11-Jan-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move FAST_NO_HZ stall-warning code to tree_stall.h This commit further consolidates the stall-warning code by moving print_cpu_stall_info() and its helper functions along with zero_cpu_stall_ticks() to kernel/rcu/tree_stall.h. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
40e69ac7 |
|
11-Jan-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Inline RCU stall-warning info helper functions The print_cpu_stall_info_begin() and print_cpu_stall_info_end() print a single character each onto the console, and are a holdover from a time when RCU CPU stall warning messages could be abbreviated using a long-gone Kconfig option. This commit therefore adds these single characters to already-printed strings in the calling functions, and then eliminates both print_cpu_stall_info_begin() and print_cpu_stall_info_end(). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
d87cda50 |
|
11-Jan-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move rcu_print_task_exp_stall() to tree_exp.h Because expedited CPU stall warnings are contained within the kernel/rcu/tree_exp.h file, rcu_print_task_exp_stall() should live there too. This commit carries out the required code motion. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
3fc3d170 |
|
11-Jan-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move RCU CPU stall-warning code out of tree_plugin.h The RCU CPU stall-warning code for normal grace periods is currently scattered across two files, due to earlier Tiny RCU support for RCU CPU stall warnings and for old Kconfig options that have long since been retired. Given that it is hard for the lead RCU maintainer to find relevant stall-warning code, it would be good to consolidate it. This commit continues this process by moving stall-warning code from kernel/rcu/tree_plugin.c to a new kernel/rcu/tree_stall.h file. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
add0d37b |
|
26-Mar-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Correct READ_ONCE()/WRITE_ONCE() for ->rcu_read_unlock_special The task_struct structure's ->rcu_read_unlock_special field is only ever read or written by the owning task, but it is accessed both at process and interrupt levels. It may therefore be accessed using plain reads and writes while interrupts are disabled, but must be accessed using READ_ONCE() and WRITE_ONCE() or better otherwise. This commit makes a few adjustments to align with this discipline. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
a2badefa |
|
21-Mar-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Eliminate redundant NULL-pointer check Because rcu_wake_cond() checks for a null task_struct pointer, there is no need for its callers to do so. This commit eliminates the redundant check. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
497e4260 |
|
06-Mar-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Report error for bad rcu_nocbs= parameter values This commit prints a console message when cpulist_parse() reports a bad list of CPUs, and sets all CPUs' bits in that case. The reason for setting all CPUs' bits is that this is the safe(r) choice for real-time workloads, which would normally be the ones using the rcu_nocbs= kernel boot parameter. Either way, later RCU console log messages list the actual set of CPUs whose RCU callbacks will be offloaded. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
da8739f2 |
|
05-Mar-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Allow rcu_nocbs= to specify all CPUs Currently, the rcu_nocbs= kernel boot parameter requires that a specific list of CPUs be specified, and has no way to say "all of them". As noted by user RavFX in a comment to Phoronix topic 1002538, this is an inconvenient side effect of the removal of the RCU_NOCB_CPU_ALL Kconfig option. This commit therefore enables the rcu_nocbs= kernel boot parameter to be given the string "all", as in "rcu_nocbs=all" to specify that all CPUs on the system are to have their RCU callbacks offloaded. Another approach would be to make cpulist_parse() check for "all", but there are uses of cpulist_parse() that do other checking, which could conflict with an "all". This commit therefore focuses on the specific use of cpulist_parse() in rcu_nocb_setup(). Just a note to other people who would like changes to Linux-kernel RCU: If you send your requests to me directly, they might get fixed somewhat faster. RavFX's comment was posted on January 22, 2018 and I first saw it on March 5, 2019. And the only reason that I found it -at- -all- was that I was looking for projects using RCU, and my search engine showed me that Phoronix comment quite by accident. Your choice, though! ;-) Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
884157ce |
|
11-Feb-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make exit_rcu() handle non-preempted RCU readers The purpose of exit_rcu() is to handle cases where buggy code causes a task to exit within an RCU read-side critical section. It currently does that in the case where said RCU read-side critical section was preempted at least once, but fails to handle cases where preemption did not occur. This case needs to be handled because otherwise the final context switch away from the exiting task will incorrectly behave as if task exit were instead a preemption of an RCU read-side critical section, and will therefore queue the exiting task. The exiting task will have exited, and thus won't ever execute rcu_read_unlock(), which means that it will remain queued forever, blocking all subsequent grace periods, and eventually resulting in OOM. Although this is arguably better than letting grace periods proceed and having a later rcu_read_unlock() access the now-freed task structure that once belonged to the exiting tasks, it would obviously be better to correctly handle this case. This commit therefore sets ->rcu_read_lock_nesting to 1 in that case, so that the subsequence call to __rcu_read_unlock() causes the exiting task to exit its dangling RCU read-side critical section. Note that deferred quiescent states need not be considered. The reason is that removing the task from the ->blkd_tasks[] list in the call to rcu_preempt_deferred_qs() handles the per-task component of any deferred quiescent state, and all other components of any deferred quiescent state are associated with the CPU, which isn't going anywhere until some later CPU-hotplug operation, which will report any remaining deferred quiescent states from within the rcu_report_dead() function. Note also that negative values of ->rcu_read_lock_nesting need not be considered. First, these won't show up in exit_rcu() unless there is a serious bug in RCU, and second, setting ->rcu_read_lock_nesting sets the state so that the RCU read-side critical section will be exited normally. Again, this code has no effect unless there has been some prior bug that prevents a task from leaving an RCU read-side critical section before exiting. Furthermore, there have been no reports of the bug fixed by this commit appearing in production. This commit is therefore absolutely -not- recommended for backporting to -stable. Reported-by: ABHISHEK DUBEY <dabhishek@iisc.ac.in> Reported-by: BHARATH Y MOURYA <bharathm@iisc.ac.in> Reported-by: Aravinda Prasad <aravinda@iisc.ac.in> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Tested-by: ABHISHEK DUBEY <dabhishek@iisc.ac.in>
|
#
22e40925 |
|
17-Jan-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/tree: Convert to SPDX license identifier Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> [ paulmck: Update .h file SPDX comment format per Joe Perches. ] Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
|
#
c98cac60 |
|
21-Nov-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Rename rcu_check_callbacks() to rcu_sched_clock_irq() The name rcu_check_callbacks() arguably made sense back in the early 2000s when RCU was quite a bit simpler than it is today, but it has become quite misleading, especially with the advent of dyntick-idle and NO_HZ_FULL. The rcu_check_callbacks() function is RCU's hook into the scheduling-clock interrupt, and is now but one of many ways that callbacks get promoted to invocable state. This commit therefore changes the name to rcu_sched_clock_irq(), which is the same number of characters and clearly indicates this function's relation to the rest of the Linux kernel. In addition, for the sake of consistency, rcu_flavor_check_callbacks() is also renamed to rcu_flavor_sched_clock_irq(). While in the area, the header comments for both functions are reworked. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
a9fefdb2 |
|
03-Dec-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Update NOCB comments This commit updates a few obsolete comments in the RCU callback-offload code. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
f7e972ee |
|
30-Nov-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move rcu_cpu_has_work to rcu_data structure Given that RCU has a perfectly good per-CPU rcu_data structure, most per-CPU quantities should be stored there. This commit therefore moves the rcu_cpu_has_work per-CPU variable to the rcu_data structure. This also makes this variable unconditionally present, which should be acceptable given the memory reduction due to the RCU flavor consolidation and also due to simplifications this will enable. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
8b4d0f48 |
|
30-Nov-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove unused rcu_cpu_kthread_loops per-CPU variable The rcu_cpu_kthread_loops variable used to provide debugfs information, but is no longer used. This commit therefore removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
6ffdde28 |
|
30-Nov-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move rcu_cpu_kthread_status to rcu_data structure Given that RCU has a perfectly good per-CPU rcu_data structure, most per-CPU quantities should be stored there. This commit therefore moves the rcu_cpu_kthread_status per-CPU variable to the rcu_data structure. This also makes this variable unconditionally present, which should be acceptable given the memory reduction due to the RCU flavor consolidation and also due to simplifications this will enable. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
37f62d7c |
|
30-Nov-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move rcu_cpu_kthread_task to rcu_data structure Given that RCU has a perfectly good per-CPU rcu_data structure, most per-CPU quantities should be stored there. This commit therefore moves the rcu_cpu_kthread_task per-CPU variable to the rcu_data structure. This also makes this variable unconditionally present, which should be acceptable given the memory reduction due to the RCU flavor consolidation and also due to simplifications this will enable. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
260e1e4f |
|
29-Nov-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Discard separate per-CPU callback counts Back when there were multiple flavors of RCU, it was necessary to separately count lazy and non-lazy callbacks for each CPU. These counts were used in CONFIG_RCU_FAST_NO_HZ kernels to determine how long a newly idle CPU should be allowed to sleep before handling its RCU callbacks. But now that there is only one flavor, the callback counts for a given CPU's sole rcu_data structure are the counts for that CPU. This commit therefore removes the rcu_data structure's ->nonlazy_posted and ->nonlazy_posted_snap fields, the rcu_idle_count_callbacks_posted() and rcu_cpu_has_callbacks() functions, repurposes the rcu_data structure's ->all_lazy field to record the laziness state at the beginning of the latest idle sojourn, and modifies CONFIG_RCU_FAST_NO_HZ RCU CPU stall warnings accordingly. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
e5bc3af7 |
|
29-Nov-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Consolidate PREEMPT and !PREEMPT synchronize_rcu() Now that rcu_blocking_is_gp() makes the correct immediate-return decision for both PREEMPT and !PREEMPT, a single implementation of synchronize_rcu() will work correctly under both configurations. This commit therefore eliminates a few lines of code by consolidating the two implementations of synchronize_rcu(). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
c46f497a |
|
28-Nov-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Inline rcu_kthread_do_work() into its sole remaining caller The rcu_kthread_do_work() function has a single-line body and only one remaining caller. This commit therefore saves a few lines of code by inlining rcu_kthread_do_work() into its sole remaining caller. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
ad368d15 |
|
27-Nov-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Rename and comment changes due to only one rcuo kthread per CPU Given RCU flavor consolidation, the name rcu_spawn_all_nocb_kthreads() is quite misleading. It no longer ever creates more than one kthread, and it does so only for the specified CPU. This commit therefore changes this name to the more descriptive rcu_spawn_cpu_nocb_kthread(), and also fixes up a similar issue in its header comment while in the area. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
903ee83d |
|
02-Oct-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings The RCU CPU stall warnings print an estimate of the total number of RCU callbacks queued in the system, but this estimate leaves out the callbacks queued for nocbs CPUs. This commit therefore introduces rcu_get_n_cbs_cpu(), which gives an accurate callback estimate for both nocbs and normal CPUs, and uses this new function as needed. This commit also introduces a rcu_get_n_cbs_nocb_cpu() helper function that returns the number of callbacks for nocbs CPUs or zero otherwise, and also uses this function in place of direct access to ->nocb_q_count while in the area (fewer characters, you see). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
5ab7ab83 |
|
21-Sep-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcutorture: Affinity forward-progress test to avoid housekeeping CPUs This commit affinities the forward-progress tests to avoid hogging a housekeeping CPU on the theory that the offloaded callbacks will be running on those housekeeping CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> [ paulmck: Fix NULL-pointer issue located by kbuild test robot. ] Tested-by: Rong Chen <rong.a.chen@intel.com>
|
#
5f1a6ef3 |
|
29-Oct-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Avoid signed integer overflow in rcu_preempt_deferred_qs() Subtracting INT_MIN can be interpreted as unconditional signed integer overflow, which according to the C standard is undefined behavior. Therefore, kernel build arguments notwithstanding, it would be good to future-proof the code. This commit therefore substitutes INT_MAX for INT_MIN in order to avoid undefined behavior. While in the neighborhood, this commit also creates some meaningful names for INT_MAX and friends in order to improve readability, as suggested by Joel Fernandes. Reported-by: Ran Rozenstein <ranro@mellanox.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
117f683c |
|
05-Nov-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Replace this_cpu_ptr() with __this_cpu_read() Because __this_cpu_read() can be lighter weight than equivalent uses of this_cpu_ptr(), this commit replaces the latter with the former. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
05f41571 |
|
16-Oct-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Speed up expedited GPs when interrupting RCU reader In PREEMPT kernels, an expedited grace period might send an IPI to a CPU that is executing an RCU read-side critical section. In that case, it would be nice if the rcu_read_unlock() directly interacted with the RCU core code to immediately report the quiescent state. And this does happen in the case where the reader has been preempted. But it would also be a nice performance optimization if immediate reporting also happened in the preemption-free case. This commit therefore adds an ->exp_hint field to the task_struct structure's ->rcu_read_unlock_special field. The IPI handler sets this hint when it has interrupted an RCU read-side critical section, and this causes the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(), which, if preemption is enabled, reports the quiescent state immediately. If preemption is disabled, then the report is required to be deferred until preemption (or bottom halves or interrupts or whatever) is re-enabled. Because this is a hint, it does nothing for more complicated cases. For example, if the IPI interrupts an RCU reader, but interrupts are disabled across the rcu_read_unlock(), but another rcu_read_lock() is executed before interrupts are re-enabled, the hint will already have been cleared. If you do crazy things like this, reporting will be deferred until some later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar. Reported-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
|
#
9213784b |
|
22-Oct-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Eliminate BUG_ON() for kernel/rcu/tree_plugin.h The tree_plugin.h file has a number of calls to BUG_ON(), which panics the kernel, which is not a good strategy for devices (like embedded) that don't have a way to capture console output. This commit therefore converts these BUG_ON() calls to WARN_ON_ONCE() and WARN_ONCE(). Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> [ paulmck: Fix typo: s/rcuo/rcub/. ]
|
#
dc5a4f29 |
|
03-Aug-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Switch ->dynticks to rcu_data structure, remove rcu_dynticks This commit move ->dynticks from the rcu_dynticks structure to the rcu_data structure, replacing the field of the same name. It also updates the code to access ->dynticks from the rcu_data structure and to use the rcu_data structure rather than following to now-gone ->dynticks field to the now-gone rcu_dynticks structure. While in the area, this commit also fixes up comments. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
4c5273bf |
|
03-Aug-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Switch dyntick nesting counters to rcu_data structure This commit removes ->dynticks_nesting and ->dynticks_nmi_nesting from the rcu_dynticks structure and updates the code to access them from the rcu_data structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
2dba13f0 |
|
03-Aug-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Switch urgent quiescent-state requests to rcu_data structure This commit removes ->rcu_need_heavy_qs and ->rcu_urgent_qs from the rcu_dynticks structure and updates the code to access them from the rcu_data structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
c458a89e |
|
03-Aug-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Switch lazy counts to rcu_data structure This commit removes ->all_lazy, ->nonlazy_posted and ->nonlazy_posted_snap from the rcu_dynticks structure and updates the code to access them from the rcu_data structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
5998a75a |
|
03-Aug-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Switch last accelerate/advance to rcu_data structure This commit removes ->last_accelerate and ->last_advance_all from the rcu_dynticks structure and updates the code to access them from the rcu_data structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
0fd79e75 |
|
03-Aug-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Switch ->tick_nohz_enabled_snap to rcu_data structure This commit removes ->tick_nohz_enabled_snap from the rcu_dynticks structure and updates the code to access it from the rcu_data structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
fced9c8c |
|
26-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Avoid resched_cpu() when rescheduling the current CPU The resched_cpu() interface is quite handy, but it does acquire the specified CPU's runqueue lock, which does not come for free. This commit therefore substitutes the following when directing resched_cpu() at the current CPU: set_tsk_need_resched(current); set_preempt_need_resched(); Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org>
|
#
d3052109 |
|
25-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: More aggressively enlist scheduler aid for nohz_full CPUs Because nohz_full CPUs can leave the scheduler-clock interrupt disabled even when in kernel mode, RCU cannot rely on rcu_check_callbacks() to enlist the scheduler's aid in extracting a quiescent state from such CPUs. This commit therefore more aggressively uses resched_cpu() on nohz_full CPUs that fail to pass through a quiescent state in a timely manner. By default, the resched_cpu() beating starts 300 milliseconds into the quiescent state. While in the neighborhood, add a ->last_fqs_resched field to the rcu_data structure in order to rate-limit resched_cpu() calls from the RCU grace-period kthread. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
c06aed0e |
|
25-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Compute jiffies_till_sched_qs from other kernel parameters The jiffies_till_sched_qs value used to determine how old a grace period must be before RCU enlists the help of the scheduler to force a quiescent state on the holdout CPU. Currently, this defaults to HZ/10 regardless of system size and may be set only at boot time. This can be a problem for very large systems, because if the values of the jiffies_till_first_fqs and jiffies_till_next_fqs kernel parameters are left at their defaults, they are calculated to increase as the number of CPUs actually configured on the system increases. Thus, on a sufficiently large system, RCU would enlist the help of the scheduler before the grace-period kthread had a chance to scan for idle CPUs, which wastes CPU time. This commit therefore allows jiffies_till_sched_qs to be set, if desired, but if left as default, computes is as jiffies_till_first_fqs plus twice jiffies_till_next_fqs, thus allowing three force-quiescent-state scans for idle CPUs. This scales with the number of CPUs, providing sensible default values. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
7e28c5af |
|
11-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Eliminate ->rcu_qs_ctr from the rcu_dynticks structure The ->rcu_qs_ctr counter was intended to allow providing a lightweight report of a quiescent state to all RCU flavors. But now that there is only one flavor of RCU in any one running kernel, there is no point in having this feature. This commit therefore removes the ->rcu_qs_ctr field from the rcu_dynticks structure and the ->rcu_qs_ctr_snap field from the rcu_data structure. This results in the "rqc" option to the rcu_fqs trace event no longer being used, so this commit also removes the "rqc" description from the header comment. While in the neighborhood, this commit also causes the forward-progress request .rcu_need_heavy_qs be set one jiffies_till_sched_qs interval later in the grace period than the first setting of .rcu_urgent_qs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
dd46a788 |
|
10-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Inline _rcu_barrier() into its sole remaining caller Because rcu_barrier() is a one-line wrapper function for _rcu_barrier() and because nothing else calls _rcu_barrier(), this commit inlines _rcu_barrier() into rcu_barrier(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
395a2f09 |
|
10-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Define rcu_all_qs() only in !PREEMPT builds Now that rcu_all_qs() is used only in !PREEMPT builds, move it to tree_plugin.h so that it is defined only in those builds. This in turn means that rcu_momentary_dyntick_idle() is only used in !PREEMPT builds, but it is simply marked __maybe_unused in order to keep it near the rest of the dyntick-idle code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
0ae86a27 |
|
07-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Clean up flavor-related definitions and comments in tree_plugin.h Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
4e95020c |
|
05-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Inline increment_cpu_stall_ticks() into its sole caller Consolidation of the RCU flavors into one makes increment_cpu_stall_ticks() a trivial one-line function with only one caller. This commit therefore inlines it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
b97d23c5 |
|
04-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove for_each_rcu_flavor() flavor-traversal macro Now that there is only ever a single flavor of RCU in a given kernel build, there isn't a whole lot of point in having a flavor-traversal macro. This commit therefore removes it and converts calls to it to straightline code, inlining trivial functions as appropriate. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
564a9ae6 |
|
04-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove last non-flavor-traversal rsp local variable from tree_plugin.h This commit removes the last non-flavor-traversal rsp local variable from kernel/rcu/tree_plugin.h in favor of &rcu_state. The flavor-traversal locals will be removed with the removal of flavor traversal. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
88d1bead |
|
04-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rcu_data structure's ->rsp field Now that there is only one rcu_state structure, there is no need for the rcu_data structure to indicate which it corresponds to. This commit therefore removes the rcu_data structure's ->rsp field, replacing all remaining uses of it with &rcu_state. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
aedf4ba9 |
|
04-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rsp parameter from rcu_node tree accessor macros There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's rcu_node tree's accessor macros. This commit therefore removes the rsp parameter from those macros in kernel/rcu/rcu.h, and removes some now-unused rsp local variables while in the area. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
63d4c8c9 |
|
03-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rsp parameter from expedited grace-period functions There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from the code in kernel/rcu/tree_exp.h, and removes all of the rsp local variables while in the area. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
4580b054 |
|
03-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rsp parameter from no-CBs CPU functions There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_nocb_cpu_needs_barrier(), rcu_spawn_one_nocb_kthread(), rcu_organize_nocb_kthreads(), rcu_nocb_cpu_needs_barrier(), and rcu_nohz_full_cpu(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
b21ebed9 |
|
03-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rsp parameter from print_cpu_stall_info() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from print_cpu_stall_info(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
6dbfdc14 |
|
03-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rsp parameter from rcu_spawn_one_boost_kthread() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_spawn_one_boost_kthread(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
81ab59a3 |
|
03-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rsp parameter from dump_blkd_tasks() and friend There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from dump_blkd_tasks() and rcu_preempt_blocked_readers_cgp(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
a2887cd8 |
|
03-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rsp parameter from rcu_print_detail_task_stall() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_print_detail_task_stall(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
5bb5d09c |
|
03-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rsp parameter from rcu_do_batch() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_do_batch(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
15cabdff |
|
03-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rsp parameter from note_gp_changes() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from note_gp_changes(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
02f50142 |
|
03-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rsp parameter from rcu_accelerate_cbs() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_accelerate_cbs(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
532c00c9 |
|
03-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rsp parameter from rcu_gp_kthread_wake() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_gp_kthread_wake(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
336a4f6c |
|
03-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rsp parameter from rcu_get_root() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_get_root(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
de8e8730 |
|
03-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rsp parameter from rcu_gp_in_progress() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_gp_in_progress(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
139ad4da |
|
03-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rsp parameter from rcu_report_unblock_qs_rnp() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_report_unblock_qs_rnp(), which is particularly appropriate in this case given that this parameter is no longer used. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
2280ee5a |
|
03-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rcu_data_p pointer to default rcu_data structure The rcu_data_p pointer references the default set of per-CPU rcu_data structures, that is, those that call_rcu() uses, as opposed to call_rcu_bh() and sometimes call_rcu_sched(). But there is now only one set of per-CPU rcu_data structures, so that one set is by definition the default, which means that the rcu_data_p pointer no longer serves any useful purpose. This commit therefore removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
16fc9c60 |
|
03-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rcu_state_p pointer to default rcu_state structure The rcu_state_p pointer references the default rcu_state structure, that is, the one that call_rcu() uses, as opposed to call_rcu_bh() and sometimes call_rcu_sched(). But there is now only one rcu_state structure, so that one structure is by definition the default, which means that the rcu_state_p pointer no longer serves any useful purpose. This commit therefore removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
da1df50d |
|
03-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove rcu_state structure's ->rda field The rcu_state structure's ->rda field was used to find the per-CPU rcu_data structures corresponding to that rcu_state structure. But now there is only one rcu_state structure (creatively named "rcu_state") and one set of per-CPU rcu_data structures (creatively named "rcu_data"). Therefore, uses of the ->rda field can always be replaced by "rcu_data, and this commit makes that change and removes the ->rda field. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
45975c7d |
|
02-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds Now that RCU-preempt knows about preemption disabling, its implementation of synchronize_rcu() works for synchronize_sched(), and likewise for the other RCU-sched update-side API members. This commit therefore confines the RCU-sched update-side code to CONFIG_PREEMPT=n builds, and defines RCU-sched's update-side API members in terms of those of RCU-preempt. This means that any given build of the Linux kernel has only one update-side flavor of RCU, namely RCU-preempt for CONFIG_PREEMPT=y builds and RCU-sched for CONFIG_PREEMPT=n builds. This in turn means that kernels built with CONFIG_RCU_NOCB_CPU=y have only one rcuo kthread per CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com>
|
#
2bbfc25b |
|
02-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Drop "wake" parameter from rcu_report_exp_rdp() The rcu_report_exp_rdp() function is always invoked with its "wake" argument set to "true", so this commit drops this parameter. The only potential call site that would use "false" is in the code driving the expedited grace period, and that code uses rcu_report_exp_cpu_mult() instead, which therefore retains its "wake" parameter. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
65cfe358 |
|
01-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Define RCU-bh update API in terms of RCU Now that the main RCU API knows about softirq disabling and softirq's quiescent states, the RCU-bh update code can be dispensed with. This commit therefore removes the RCU-bh update-side implementation and defines RCU-bh's update-side API in terms of that of either RCU-preempt or RCU-sched, depending on the setting of the CONFIG_PREEMPT Kconfig option. In kernels built with CONFIG_RCU_NOCB_CPU=y this has the knock-on effect of reducing by one the number of rcuo kthreads per CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
ba1c64c2 |
|
30-Jun-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Report expedited grace periods at context-switch time This commit reduces the latency of expedited RCU grace periods by reporting a quiescent state for the CPU at context-switch time. In CONFIG_PREEMPT=y kernels, if the outgoing task is still within an RCU read-side critical section (and thus still blocking some grace period, perhaps including this expedited grace period), then that task will already have been placed on one of the leaf rcu_node structures' ->blkd_tasks list. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
d28139c4 |
|
28-Jun-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Apply RCU-bh QSes to RCU-sched and RCU-preempt when safe One necessary step towards consolidating the three flavors of RCU is to make sure that the resulting consolidated "one flavor to rule them all" correctly handles networking denial-of-service attacks. One thing that allows RCU-bh to do so is that __do_softirq() invokes rcu_bh_qs() every so often, and so something similar has to happen for consolidated RCU. This must be done carefully. For example, if a preemption-disabled region of code takes an interrupt which does softirq processing before returning, consolidated RCU must ignore the resulting rcu_bh_qs() invocations -- preemption is still disabled, and that means an RCU reader for the consolidated flavor. This commit therefore creates a new rcu_softirq_qs() that is called only from the ksoftirqd task, thus avoiding the interrupted-a-preempted-region problem. This new rcu_softirq_qs() function invokes rcu_sched_qs(), rcu_preempt_qs(), and rcu_preempt_deferred_qs(). The latter call handles any deferred quiescent states. Note that __do_softirq() still invokes rcu_bh_qs(). It will continue to do so until a later stage of cleanup when the RCU-bh flavor is removed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Fix !SMP issue located by kbuild test robot. ]
|
#
fcc878e4 |
|
28-Jun-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove now-unused ->b.exp_need_qs field from the rcu_special union The ->b.exp_need_qs field is now set only to false, so this commit removes it. The job this field used to do is now done by the rcu_data structure's ->deferred_qs field, which is a consequence of a better split between task-based (the rcu_node structure's ->exp_tasks field) and CPU-based (the aforementioned rcu_data structure's ->deferred_qs field) tracking of quiescent states for RCU-preempt expedited grace periods. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
27c744e3 |
|
27-Jun-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Allow processing deferred QSes for exiting RCU-preempt readers If an RCU-preempt read-side critical section is exiting, that is, ->rcu_read_lock_nesting is negative, then it is a good time to look at the possibility of reporting deferred quiescent states. This commit therefore updates the checks in rcu_preempt_need_deferred_qs() to allow exiting critical sections to report deferred quiescent states. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
3e310098 |
|
21-Jun-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Defer reporting RCU-preempt quiescent states when disabled This commit defers reporting of RCU-preempt quiescent states at rcu_read_unlock_special() time when any of interrupts, softirq, or preemption are disabled. These deferred quiescent states are reported at a later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug offline operation. Of course, if another RCU read-side critical section has started in the meantime, the reporting of the quiescent state will be further deferred. This also means that disabling preemption, interrupts, and/or softirqs will act as an RCU-preempt read-side critical section. This is enforced by checking preempt_count() as needed. Some special cases must be handled on an ad-hoc basis, for example, context switch is a quiescent state even though both the scheduler and do_exit() disable preemption. In these cases, additional calls to rcu_preempt_deferred_qs() override the preemption disabling. Similar logic overrides disabled interrupts in rcu_preempt_check_callbacks() because in this case the quiescent state happened just before the corresponding scheduling-clock interrupt. In theory, this change lifts a long-standing restriction that required that if interrupts were disabled across a call to rcu_read_unlock() that the matching rcu_read_lock() also be contained within that interrupts-disabled region of code. Because the reporting of the corresponding RCU-preempt quiescent state is now deferred until after interrupts have been enabled, it is no longer possible for this situation to result in deadlocks involving the scheduler's runqueue and priority-inheritance locks. This may allow some code simplification that might reduce interrupt latency a bit. Unfortunately, in practice this would also defer deboosting a low-priority task that had been subjected to RCU priority boosting, so real-time-response considerations might well force this restriction to remain in place. Because RCU-preempt grace periods are now blocked not only by RCU read-side critical sections, but also by disabling of interrupts, preemption, and softirqs, it will be possible to eliminate RCU-bh and RCU-sched in favor of RCU-preempt in CONFIG_PREEMPT=y kernels. This may require some additional plumbing to provide the network denial-of-service guarantees that have been traditionally provided by RCU-bh. Once these are in place, CONFIG_PREEMPT=n kernels will be able to fold RCU-bh into RCU-sched. This would mean that all kernels would have but one flavor of RCU, which would open the door to significant code cleanup. Moving to a single flavor of RCU would also have the beneficial effect of reducing the NOCB kthreads by at least a factor of two. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Apply rcu_read_unlock_special() preempt_count() feedback from Joel Fernandes. ] [ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in response to bug reports from kbuild test robot. ] [ paulmck: Fix bug located by kbuild test robot involving recursion via rcu_preempt_deferred_qs(). ]
|
#
89b4cd4b |
|
11-Jun-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Print stall-warning NMI dyntick state in hexadecimal The ->dynticks_nmi_nesting field records the nesting depth of both interrupt and NMI handlers. Because the kernel can enter interrupts and never leave them (and vice versa) and because NMIs can interrupt manipulation of the ->dynticks_nmi_nesting field, the values in this field must be both chosen and maniupated very carefully. As a result, although the value is zero when the corresponding CPU is executing neither an interrupt nor an NMI handler, it is 4,611,686,018,427,387,906 on 64-bit systems when there is a single level of interrupt/NMI handling in progress. This number is difficult to remember and interpret, so this commit switches the output to hexadecimal, resulting in the much nicer 0x4000000000000002. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
ab6b8214 |
|
17-May-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove unused local variable "cpu" One danger of using __maybe_unused is that the compiler doesn't yell at you when you remove the last reference, witness rcu_bind_gp_kthread() and its local variable "cpu". This commit removes this local variable. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
164ba3fc |
|
16-May-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove unused rcu_kick_nohz_cpu() function The rcu_kick_nohz_cpu() function is no longer used, and the functionality it used to provide is now provided by a call to resched_cpu() in the force-quiescent-state function rcu_implicit_dynticks_qs(). This commit therefore removes rcu_kick_nohz_cpu(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
c7037ff5 |
|
16-May-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Clarify and correct the rcu_preempt_qs() header comment The rcu_preempt_qs() function only applies to the CPU, not the task. A task really is allowed to invoke this function while in an RCU-preempt read-side critical section, but only if it has first added itself to some leaf rcu_node structure's ->blkd_tasks list. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
15651201 |
|
16-May-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Mark task as .need_qs less aggressively If any scheduling-clock interrupt interrupts an RCU-preempt read-side critical section, the interrupted task's ->rcu_read_unlock_special.b.need_qs field is set. This causes the outermost rcu_read_unlock() to incur the extra overhead of calling into rcu_read_unlock_special(). This commit reduces that overhead by setting ->rcu_read_unlock_special.b.need_qs only if the grace period has been in effect for more than one second. Why one second? Because this is comfortably smaller than the minimum RCU CPU stall-warning timeout of three seconds, but long enough that the .need_qs marking should happen quite rarely. And if your RCU read-side critical section has run on-CPU for a full second, it is not unreasonable to invest some CPU time in ending the grace period quickly. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
a7538352 |
|
14-May-2018 |
Joe Perches <joe@perches.com> |
rcu: Use pr_fmt to prefix "rcu: " to logging output This commit also adjusts some whitespace while in the area. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Revert string-breaking %s as requested by Andy Shevchenko. ]
|
#
3949fa9b |
|
08-May-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make rcu_read_unlock_special() static Because rcu_read_unlock_special() is no longer used outside of kernel/rcu/tree_plugin.h, this commit makes it static. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
57738942 |
|
08-May-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add CPU online/offline state to dump_blkd_tasks() Interactions between CPU-hotplug operations and grace-period initialization can result in dump_blkd_tasks(). One of the first debugging actions in this case is to search back in dmesg to work out which of the affected rcu_node structure's CPUs are online and to determine the last CPU-hotplug operation affecting any of those CPUs. This can be laborious and error-prone, especially when console output is lost. This commit therefore causes dump_blkd_tasks() to dump the state of the affected rcu_node structure's CPUs and the last grace period during which the last offline and online operation affected each of these CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
ff3cee39 |
|
08-May-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add up-tree information to dump_blkd_tasks() diagnostics This commit updates dump_blkd_tasks() to print out quiescent-state bitmasks for the rcu_node structures further up the tree. This information helps debugging of interactions between CPU-hotplug operations and RCU grace-period initialization. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
1f3e5f51 |
|
03-May-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add RCU-preempt check for waiting on newly onlined CPU RCU should only be waiting on CPUs that were online at the time that the current grace period started. Failure to abide by this rule can result in confusing splats during grace-period cleanup and initialization. This commit therefore adds a check to RCU-preempt's preempted-task queuing that checks for waiting on newly onlined CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
0b107d24 |
|
08-May-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Suppress false-positive splats from mid-init task resume Consider the following sequence of events in a PREEMPT=y kernel: 1. All CPUs corresponding to a given leaf rcu_node structure are offline. 2. The first phase of the rcu_gp_init() function's grace-period initialization runs, and sets that rcu_node structure's ->qsmaskinit to zero, as it should. 3. One of the CPUs corresponding to that rcu_node structure comes back online. Note that because this CPU came online after the grace period started, this grace period can safely ignore this newly onlined CPU. 4. A task running on the newly onlined CPU enters an RCU-preempt read-side critical section, and is then preempted. Because the corresponding rcu_node structure's ->qsmask is zero, rcu_preempt_ctxt_queue() leaves the rcu_node structure's ->gp_tasks field NULL, as it should. 5. The rcu_gp_init() function continues running the second phase of grace-period initialization. The ->qsmask field of the parent of the aforementioned leaf rcu_node structure is set to not expect a quiescent state from the leaf, as is only right and proper. However, when rcu_gp_init() reaches the leaf, it invokes rcu_preempt_check_blocked_tasks(), which sees that the leaf's ->blkd_tasks list is non-empty, and therefore sets the leaf's ->gp_tasks field to reference the first task on that list. 6. The grace period ends before the preempted task resumes, which is perfectly fine, given that this grace period was under no obligation to wait for that task to exit its late-starting RCU-preempt read-side critical section. Unfortunately, the leaf's ->gp_tasks field is non-NULL, so rcu_gp_cleanup() splats. After all, it appears to rcu_gp_cleanup() that the grace period failed to wait for a task that was supposed to be blocking that grace period. This commit avoids this false-positive splat by adding a check of both ->qsmaskinit and ->wait_blkd_tasks to rcu_preempt_check_blocked_tasks(). If both ->qsmaskinit and ->wait_blkd_tasks are zero, then the task must have entered its RCU-preempt read-side critical section late (after all, the CPU that it is running on was not online at that time), which means that the upper-level rcu_node structure won't be waiting for anything on the leaf anyway. If ->wait_blkd_tasks is non-zero, then there is at least one task on ths rcu_node structure's ->blkd_tasks list whose RCU read-side critical section predates the current grace period. If ->qsmaskinit is non-zero, there is at least one CPU that was online at the start of the current grace period. Thus, if both are zero, there is nothing to wait for. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
77cfc7bf |
|
01-May-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Fix typo and add additional debug This commit fixes a typo and adds some additional debugging to the message emitted when a task blocking the current grace period is listed as blocking it when either that grace period ends or the next grace period begins. This commit also reformats the console message for readability. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
ff3bb6f4 |
|
01-May-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove ->gpnum and ->completed Now that everything has been converted to use ->gp_seq instead of ->gpnum and ->completed, this commit removes ->gpnum and ->completed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
db023296 |
|
01-May-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Convert rcu_quiescent_state_report tracepoint to ->gp_seq This commit makes the rcu_quiescent_state_report tracepoint use ->gp_seq instead of ->gpnum. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
865aa1e0 |
|
01-May-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Convert rcu_unlock_preempted_task tracepoint to ->gp_seq This commit makes the rcu_unlock_preempted_task tracepoint use ->gp_seq instead of ->gpnum. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
598ce094 |
|
01-May-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Convert rcu_preempt_task tracepoint to ->gp_seq This commit makes the rcu_preempt_task tracepoint use ->gp_seq instead of ->gpnum. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
477351f7 |
|
01-May-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Convert rcu_grace_period tracepoint to gp_seq This commit makes the rcu_grace_period tracepoint use gp_seq instead of ->gpnum or ->completed. It also introduces a "cpuofl-bgp" string to less obscurely indicate when a CPU has gone offline while a grace period is waiting on it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
ab5e869c |
|
01-May-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make rcu_nocb_wait_gp() check if GP already requested This commit makes rcu_nocb_wait_gp() check rdp->gp_seq_needed to see if the current CPU already knows about the needed grace period having already been requested. If so, it avoids acquiring the corresponding leaf rcu_node structure's ->lock, thus decreasing contention. This optimization is intended for cases where either multiple leader rcuo kthreads are running on the same CPU or these kthreads are running on a non-offloaded (e.g., housekeeping) CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Move lock release past "if" as suggested by Joel Fernandes. ] [ paulmck: Fix caching of furthest-future requested grace period. ]
|
#
471f87c3 |
|
30-Apr-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make RCU CPU stall warnings use ->gp_seq This commit makes the RCU CPU stall-warning code in print_other_cpu_stall(), print_cpu_stall(), and check_cpu_stall() use ->gp_seq instead of ->gpnum and ->completed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
29365e56 |
|
30-Apr-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Convert grace-period requests to ->gp_seq This commit converts the grace-period request code paths from ->completed and ->gpnum to ->gp_seq. The need_future_gp_element() macro encapsulates the shift operation required to use ->gp_seq as an index to the ->need_future_gp[] array. The rcu_cbs_completed() function is removed in favor of the rcu_seq_snap() function. The rcu_start_this_gp() gets some temporary consistency checks and uses rcu_seq_done(), rcu_seq_current(), rcu_seq_state(), and rcu_gp_in_progress() in place of the earlier open-coded comparisons of ->gpnum and ->completed. The rcu_future_gp_cleanup() function replaces use of ->completed with ->gp_seq. The rcu_accelerate_cbs() function replaces a call to rcu_cbs_completed() with one to rcu_seq_snap(). The rcu_advance_cbs() function replaces an access to >completed with one to ->gp_seq and adds some temporary warnings. The rcu_nocb_wait_gp() function replaces a call to rcu_cbs_completed() with one to rcu_seq_snap() and an open-coded comparison with rcu_seq_done(). The temporary warnings will be removed when the various ->gpnum and ->completed fields are removed. Their purpose is to locate code who might still be using ->gpnum and ->completed. (Much easier that way than trying to trace down the causes of too-short grace periods and grace-period hangs!) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
d43a5d32 |
|
28-Apr-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Convert ->completedqs to ->gp_seq This commit switches the quiescent-state no-backtracking checks from ->gpnum and ->completed to ->gp_seq. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
8aa670cd |
|
28-Apr-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Convert ->rcu_iw_gpnum to ->gp_seq This commit switches the interrupt-disabled detection mechanism to ->gp_seq. This mechanism is used as part of RCU CPU stall warnings, and detects cases where the stall is due to a CPU having interrupts disabled. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
e0da2374 |
|
27-Apr-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move rcu_nocb_gp_get() to ->gp_seq This commit makes rcu_try_advance_all_cbs() use ->gp_seq. It uses rcu_seq_ctr() in order to shift away the state bits, so that the low-order bits of the result may safely be used to index ->nocb_gp_wq[]. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
03c8cb76 |
|
27-Apr-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move rcu_try_advance_all_cbs() to ->gp_seq This commit makes rcu_try_advance_all_cbs() use ->gp_seq, with the exception of tracing, which will be converted later. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
ce11fae8 |
|
08-Mar-2018 |
Boqun Feng <boqun.feng@gmail.com> |
rcu: Use the proper lockdep annotation in dump_blkd_tasks() Sparse reported this: | kernel/rcu/tree_plugin.h:814:9: warning: incorrect type in argument 1 (different modifiers) | kernel/rcu/tree_plugin.h:814:9: expected struct lockdep_map const *lock | kernel/rcu/tree_plugin.h:814:9: got struct lockdep_map [noderef] *<noident> This is caused by using vanilla lockdep annotations on rcu_node::lock, and that requires accessing ->lock of rcu_node directly. However we need to keep rcu_node::lock __private to avoid breaking its extra ordering guarantee. And we have a dedicated lockdep annotation for rcu_node::lock, so use it. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
4bc8d555 |
|
27-Nov-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add debugging info to assertion The WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp()) in rcu_gp_cleanup() triggers (inexplicably, of course) every so often. This commit therefore extracts more information. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
b3dae109 |
|
12-Jun-2018 |
Peter Zijlstra <peterz@infradead.org> |
sched/swait: Rename to exclusive Since swait basically implemented exclusive waits only, make sure the API reflects that. $ git grep -l -e "\<swake_up\>" -e "\<swait_event[^ (]*" -e "\<prepare_to_swait\>" | while read file; do sed -i -e 's/\<swake_up\>/&_one/g' -e 's/\<swait_event[^ (]*/&_exclusive/g' -e 's/\<prepare_to_swait\>/&_exclusive/g' $file; done With a few manual touch-ups. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: bigeasy@linutronix.de Cc: oleg@redhat.com Cc: paulmck@linux.vnet.ibm.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180612083909.261946548@infradead.org
|
#
41e80595 |
|
12-Apr-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make rcu_start_future_gp() caller select grace period The rcu_accelerate_cbs() function selects a grace-period target, which it uses to have rcu_segcblist_accelerate() assign numbers to recently queued callbacks. Then it invokes rcu_start_future_gp(), which selects a grace-period target again, which is a bit pointless. This commit therefore changes rcu_start_future_gp() to take the grace-period target as a parameter, thus avoiding double selection. This commit also changes the name of rcu_start_future_gp() to rcu_start_this_gp() to reflect this change in functionality, and also makes a similar change to the name of trace_rcu_future_gp(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
|
#
fb31340f |
|
12-Apr-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make rcu_gp_cleanup() more accurately predict need for new GP Currently, rcu_gp_cleanup() scans the rcu_node tree in order to reset state to reflect the end of the grace period. It also checks to see whether a new grace period is needed, but in a number of cases, rather than directly cause the new grace period to be immediately started, it instead leaves the grace-period-needed state where various fail-safes can find it. This works fine, but results in higher contention on the root rcu_node structure's ->lock, which is undesirable, and contention on that lock has recently become noticeable. This commit therefore makes rcu_gp_cleanup() immediately start a new grace period if there is any need for one. It is quite possible that it will later be necessary to throttle the grace-period rate, but that can be dealt with when and if. Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
|
#
c91a8675 |
|
18-Apr-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add accessor macros for the ->need_future_gp[] array Accessors for the ->need_future_gp[] array are currently open-coded, which makes them difficult to change. To improve maintainability, this commit adds need_future_gp_mask() to compute the indexing mask from the array size, need_future_gp_element() to access the element corresponding to the specified grace-period number, and need_any_future_gp() to determine if any future grace period is needed. This commit also applies need_future_gp_element() to existing open-coded single-element accesses. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
|
#
5b4c11d5 |
|
13-Apr-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add leaf-node macros This commit adds rcu_first_leaf_node() that returns a pointer to the first leaf rcu_node structure in the specified RCU flavor and an rcu_is_leaf_node() that returns true iff the specified rcu_node structure is a leaf. This commit also uses these macros where appropriate. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
|
#
265f5f28 |
|
19-Mar-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Update rcu_bind_gp_kthread() header comment The header comment for rcu_bind_gp_kthread() refers to sysidle, which is no longer with us. However, it is still important to bind RCU's grace-period kthreads to the housekeeping CPU(s), so rather than remove rcu_bind_gp_kthread(), this commit updates the comment. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
|
#
0e5da22e |
|
19-Mar-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move __rcu_read_lock() and __rcu_read_unlock() to tree_plugin.h The __rcu_read_lock() and __rcu_read_unlock() functions were moved to kernel/rcu/update.c in order to implement tiny preemptible RCU. However, tiny preemptible RCU was removed from the kernel a long time ago, so this commit belatedly moves them back into the only remaining preemptible-RCU code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
|
#
cee43939 |
|
02-Mar-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Rename cond_resched_rcu_qs() to cond_resched_tasks_rcu_qs() Commit e31d28b6ab8f ("trace: Eliminate cond_resched_rcu_qs() in favor of cond_resched()") substituted cond_resched() for the earlier call to cond_resched_rcu_qs(). However, the new-age cond_resched() does not do anything to help RCU-tasks grace periods because (1) RCU-tasks is only enabled when CONFIG_PREEMPT=y and (2) cond_resched() is a complete no-op when preemption is enabled. This situation results in hangs when running the trace benchmarks. A number of potential fixes were discussed on LKML (https://lkml.kernel.org/r/20180224151240.0d63a059@vmware.local.home), including making cond_resched() not be a no-op; making cond_resched() not be a no-op, but only when running tracing benchmarks; reverting the aforementioned commit (which works because cond_resched_rcu_qs() does provide an RCU-tasks quiescent state; and adding a call to the scheduler/RCU rcu_note_voluntary_context_switch() function. All were deemed unsatisfactory, either due to added cond_resched() overhead or due to magic functions inviting cargo culting. This commit renames cond_resched_rcu_qs() to cond_resched_tasks_rcu_qs(), which provides a clear hint as to what this function is doing and why and where it should be used, and then replaces the call to cond_resched() with cond_resched_tasks_rcu_qs() in the trace benchmark's benchmark_event_kthread() function. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
|
#
efcd2d54 |
|
27-Feb-2018 |
Byungchul Park <byungchul.park@lge.com> |
rcu: Call wake_nocb_leader_defer() with 'FORCE' when nocb_q_count is high If an excessive number of callbacks have been queued, but the NOCB leader kthread's wakeup must be deferred, then we should wake up the leader unconditionally once it is safe to do so. This was handled correctly in commit fbce7497ee ("rcu: Parallelize and economize NOCB kthread wakeups"), but then commit 8be6e1b15c ("rcu: Use timer as backstop for NOCB deferred wakeups") passed RCU_NOCB_WAKE instead of the correct RCU_NOCB_WAKE_FORCE to wake_nocb_leader_defer(). As an interesting aside, RCU_NOCB_WAKE_FORCE is never passed to anything, which should have been taken as a hint. ;-) This commit therefore passes RCU_NOCB_WAKE_FORCE instead of RCU_NOCB_WAKE to wake_nocb_leader_defer() when a callback is queued onto a NOCB CPU that already has an excessive number of callbacks pending. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
|
#
ef126206 |
|
28-Feb-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Don't allocate rcu_nocb_mask if no one needs it Commit 44c65ff2e3b0 ("rcu: Eliminate NOCBs CPU-state Kconfig options") made allocation of rcu_nocb_mask depend only on the rcu_nocbs=, nohz_full=, or isolcpus= kernel boot parameters. However, it failed to change the initial value of rcu_init_nohz()'s local variable need_rcu_nocb_mask to false, which can result in useless allocation of an all-zero rcu_nocb_mask. This commit therefore fixes this bug by changing the initial value of need_rcu_nocb_mask to false. While we are in the area, also correct the error message that is printed when someone specifies that can-never-exist CPUs should be NOCBs CPUs. Reported-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Byungchul Park <byungchul.park@lge.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
|
#
be01b4ca |
|
25-Feb-2018 |
Byungchul Park <byungchul.park@lge.com> |
rcu: Inline rcu_preempt_do_callback() into its sole caller The rcu_preempt_do_callbacks() function was introduced in commit 09223371dea(rcu: Use softirq to address performance regression), where it was necessary to handle kernel builds both containing and not containing RCU-preempt. Since then, various changes (most notably f8b7fc6b51 ("rcu: use softirq instead of kthreads except when RCU_BOOST=y")) have resulted in this function being invoked only from rcu_kthread_do_work(), which is present only in kernels containing RCU-preempt, which in turn means that the rcu_preempt_do_callbacks() function is no longer needed. This commit therefore inlines rcu_preempt_do_callbacks() into its sole remaining caller and also removes the rcu_state_p and rcu_data_p indirection for added clarity. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> [ paulmck: Remove the rcu_state_p and rcu_data_p indirection. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
|
#
a32e01ee |
|
17-Jan-2018 |
Matthew Wilcox <willy@infradead.org> |
rcu: Use wrapper for lockdep asserts Commits c0b334c5bfa9 and ea9b0c8a26a2 introduced new sparse warnings by accessing rcu_node->lock directly and ignoring the __private marker. Introduce a new wrapper and use it. Also fix a similar problem in srcutree.c introduced by a3883df3935e. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
62df63e0 |
|
10-Jan-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove obsolete callback-invocation statistics for debugfs The debugfs interface displayed statistics on RCU callback invocation but this interface has since been removed. This commit therefore removes the no-longer-used rcu_data structure's ->n_cbs_invoked and ->n_nocbs_invoked fields along with their updates. If this information proves necessary in the future, the corresponding event traces will be added. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
bec06785 |
|
10-Jan-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove obsolete boost statistics for debugfs The debugfs interface displayed statistics on RCU priority boosting, but this interface has since been removed. This commit therefore removes the no-longer-used rcu_data structure's ->n_tasks_boosted, ->n_exp_boosts, and ->n_exp_boosts and their updates. If this information proves necessary in the future, the corresponding event traces will be added. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
3caa973b |
|
09-Jan-2018 |
Tejun Heo <tj@kernel.org> |
rcu: Call touch_nmi_watchdog() while printing stall warnings When RCU stall warning triggers, it can print out a lot of messages while holding spinlocks. If the console device is slow (e.g. an actual or IPMI serial console), it may end up triggering NMI hard lockup watchdog like the following. *** CPU printking while holding RCU spinlock PID: 4149739 TASK: ffff881a46baa880 CPU: 13 COMMAND: "CPUThreadPool8" #0 [ffff881fff945e48] crash_nmi_callback at ffffffff8103f7d0 #1 [ffff881fff945e58] nmi_handle at ffffffff81020653 #2 [ffff881fff945eb0] default_do_nmi at ffffffff81020c36 #3 [ffff881fff945ed0] do_nmi at ffffffff81020d32 #4 [ffff881fff945ef0] end_repeat_nmi at ffffffff81956a7e [exception RIP: io_serial_in+21] RIP: ffffffff81630e55 RSP: ffff881fff943b88 RFLAGS: 00000002 RAX: 000000000000ca00 RBX: ffffffff8230e188 RCX: 0000000000000000 RDX: 00000000000002fd RSI: 0000000000000005 RDI: ffffffff8230e188 RBP: ffff881fff943bb0 R8: 0000000000000000 R9: ffffffff820cb3c4 R10: 0000000000000019 R11: 0000000000002000 R12: 00000000000026e1 R13: 0000000000000020 R14: ffffffff820cd398 R15: 0000000000000035 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 --- <NMI exception stack> --- #5 [ffff881fff943b88] io_serial_in at ffffffff81630e55 #6 [ffff881fff943b90] wait_for_xmitr at ffffffff8163175c #7 [ffff881fff943bb8] serial8250_console_putchar at ffffffff816317dc #8 [ffff881fff943bd8] uart_console_write at ffffffff8162ac00 #9 [ffff881fff943c08] serial8250_console_write at ffffffff81634691 #10 [ffff881fff943c80] univ8250_console_write at ffffffff8162f7c2 #11 [ffff881fff943c90] console_unlock at ffffffff810dfc55 #12 [ffff881fff943cf0] vprintk_emit at ffffffff810dffb5 #13 [ffff881fff943d50] vprintk_default at ffffffff810e01bf #14 [ffff881fff943d60] vprintk_func at ffffffff810e1127 #15 [ffff881fff943d70] printk at ffffffff8119a8a4 #16 [ffff881fff943dd0] print_cpu_stall_info at ffffffff810eb78c #17 [ffff881fff943e88] rcu_check_callbacks at ffffffff810ef133 #18 [ffff881fff943ee8] update_process_times at ffffffff810f3497 #19 [ffff881fff943f10] tick_sched_timer at ffffffff81103037 #20 [ffff881fff943f38] __hrtimer_run_queues at ffffffff810f3f38 #21 [ffff881fff943f88] hrtimer_interrupt at ffffffff810f442b *** CPU triggering the hardlockup watchdog PID: 4149709 TASK: ffff88010f88c380 CPU: 26 COMMAND: "CPUThreadPool35" #0 [ffff883fff1059d0] machine_kexec at ffffffff8104a874 #1 [ffff883fff105a30] __crash_kexec at ffffffff811116cc #2 [ffff883fff105af0] __crash_kexec at ffffffff81111795 #3 [ffff883fff105b08] panic at ffffffff8119a6ae #4 [ffff883fff105b98] watchdog_overflow_callback at ffffffff81135dbd #5 [ffff883fff105bb0] __perf_event_overflow at ffffffff81186866 #6 [ffff883fff105be8] perf_event_overflow at ffffffff81192bc4 #7 [ffff883fff105bf8] intel_pmu_handle_irq at ffffffff8100b265 #8 [ffff883fff105df8] perf_event_nmi_handler at ffffffff8100489f #9 [ffff883fff105e58] nmi_handle at ffffffff81020653 #10 [ffff883fff105eb0] default_do_nmi at ffffffff81020b94 #11 [ffff883fff105ed0] do_nmi at ffffffff81020d32 #12 [ffff883fff105ef0] end_repeat_nmi at ffffffff81956a7e [exception RIP: queued_spin_lock_slowpath+248] RIP: ffffffff810da958 RSP: ffff883fff103e68 RFLAGS: 00000046 RAX: 0000000000000000 RBX: 0000000000000046 RCX: 00000000006d0000 RDX: ffff883fff49a950 RSI: 0000000000d10101 RDI: ffffffff81e54300 RBP: ffff883fff103e80 R8: ffff883fff11a950 R9: 0000000000000000 R10: 000000000e5873ba R11: 000000000000010f R12: ffffffff81e54300 R13: 0000000000000000 R14: ffff88010f88c380 R15: ffffffff81e54300 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #13 [ffff883fff103e68] queued_spin_lock_slowpath at ffffffff810da958 #14 [ffff883fff103e70] _raw_spin_lock_irqsave at ffffffff8195550b #15 [ffff883fff103e88] rcu_check_callbacks at ffffffff810eed18 #16 [ffff883fff103ee8] update_process_times at ffffffff810f3497 #17 [ffff883fff103f10] tick_sched_timer at ffffffff81103037 #18 [ffff883fff103f38] __hrtimer_run_queues at ffffffff810f3f38 #19 [ffff883fff103f88] hrtimer_interrupt at ffffffff810f442b --- <IRQ stack> --- Avoid spuriously triggering NMI hardlockup watchdog by touching it from the print functions. show_state_filter() shares the same problem and solution. v2: Relocate the comment to where it belongs. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
3016611e |
|
04-Dec-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Fix CPU offload boot message when no CPUs are offloaded In CONFIG_RCU_NOCB_CPU=y kernels, if the boot parameters indicate that none of the CPUs should in fact be offloaded, the following somewhat obtuse message appears: Offload RCU callbacks from CPUs: . This commit therefore makes the message at least grammatically correct in this case: Offload RCU callbacks from CPUs: (none) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
84b12b75 |
|
17-Nov-2017 |
Rakib Mullick <rakib.mullick@gmail.com> |
rcu: Remove have_rcu_nocb_mask from tree_plugin.h Currently have_rcu_nocb_mask is used to avoid double allocation of rcu_nocb_mask during boot up. Due to different representation of cpumask_var_t on different kernel config CPUMASK=y(or n) it was okay. But now we have a helper cpumask_available(), which can be utilized to check whether rcu_nocb_mask has been allocated or not without using a variable. Removing the variable also reduces vmlinux size. Unpatched version: text data bss dec hex filename 13050393 7852470 14543408 35446271 21cddff vmlinux Patched version: text data bss dec hex filename 13050390 7852438 14543408 35446236 21cdddc vmlinux Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
84585aa8 |
|
04-Oct-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Shrink ->dynticks_{nmi_,}nesting from long long to long Because the ->dynticks_nesting field now only contains the process-based nesting level instead of a value encoding both the process nesting level and the irq "nesting" level, we no longer need a long long, even on 32-bit systems. This commit therefore changes both the ->dynticks_nesting and ->dynticks_nmi_nesting fields to long. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
b04db8e1 |
|
06-Nov-2017 |
Frederic Weisbecker <frederic@kernel.org> |
rcu: Use lockdep to assert IRQs are disabled/enabled Lockdep now has an integrated IRQs disabled/enabled sanity check. Just use it instead of the ad-hoc RCU version. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: David S . Miller <davem@davemloft.net> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1509980490-4285-15-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
fd30b717 |
|
22-Oct-2017 |
Kees Cook <keescook@chromium.org> |
rcu: Convert timers to use timer_setup() In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
de201559 |
|
26-Oct-2017 |
Frederic Weisbecker <frederic@kernel.org> |
sched/isolation: Introduce housekeeping flags Before we implement isolcpus under housekeeping, we need the isolation features to be more finegrained. For example some people want NOHZ_FULL without the full scheduler isolation, others want full scheduler isolation without NOHZ_FULL. So let's cut all these isolation features piecewise, at the risk of overcutting it right now. We can still merge some flags later if they always make sense together. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1509072159-31808-9-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
78634061 |
|
26-Oct-2017 |
Frederic Weisbecker <frederic@kernel.org> |
sched/isolation: Move housekeeping related code to its own file The housekeeping code is currently tied to the NOHZ code. As we are planning to make housekeeping independent from it, start with moving the relevant code to its own file. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1509072159-31808-2-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
02a7c234 |
|
19-Sep-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Suppress lockdep false-positive ->boost_mtx complaints RCU priority boosting uses rt_mutex_init_proxy_locked() to initialize an rt_mutex structure in locked state held by some other task. When that other task releases it, lockdep complains (quite accurately, but a bit uselessly) that the other task never acquired it. This complaint can suppress other, more helpful, lockdep complaints, and in any case it is a false positive. This commit therefore switches from rt_mutex_unlock() to rt_mutex_futex_unlock(), thereby avoiding the lockdep annotations. Of course, if lockdep ever learns about rt_mutex_init_proxy_locked(), addtional adjustments will be required. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
b8869781 |
|
18-Oct-2017 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
rcu: Do not include rtmutex_common.h unconditionally This commit adjusts include files and provides definitions in preparation for suppressing lockdep false-positive ->boost_mtx complaints. Without this preparation, architectures not supporting rt_mutex will get build failures. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
9b9500da |
|
17-Aug-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make RCU CPU stall warnings check for irq-disabled CPUs One common question upon seeing an RCU CPU stall warning is "did the stalled CPUs have interrupts disabled?" However, the current stall warnings are silent on this point. This commit therefore uses irq_work to check whether stalled CPUs still respond to IPIs, and flags this state in the RCU CPU stall warning console messages. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
135bd1a2 |
|
06-Aug-2017 |
Neeraj Upadhyay <neeraju@codeaurora.org> |
rcu: Fix up pending cbs check in rcu_prepare_for_idle The pending-callbacks check in rcu_prepare_for_idle() is backwards. It should accelerate if there are pending callbacks, but the check rather uselessly accelerates only if there are no callbacks. This commit therefore inverts this check. Fixes: 15fecf89e46a ("srcu: Abstract multi-tail callback list handling") Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> # 4.12.x
|
#
9b130ad5 |
|
08-Sep-2017 |
Alexey Dobriyan <adobriyan@gmail.com> |
treewide: make "nr_cpu_ids" unsigned First, number of CPUs can't be negative number. Second, different signnnedness leads to suboptimal code in the following cases: 1) kmalloc(nr_cpu_ids * sizeof(X)); "int" has to be sign extended to size_t. 2) while (loff_t *pos < nr_cpu_ids) MOVSXD is 1 byte longed than the same MOV. Other cases exist as well. Basically compiler is told that nr_cpu_ids can't be negative which can't be deduced if it is "int". Code savings on allyesconfig kernel: -3KB add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370) function old new delta coretemp_cpu_online 450 512 +62 rcu_init_one 1234 1272 +38 pci_device_probe 374 399 +25 ... pgdat_reclaimable_pages 628 556 -72 select_fallback_rq 446 369 -77 task_numa_find_cpu 1923 1807 -116 Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2dee9404 |
|
11-Jul-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add assertions verifying blocked-tasks list This commit adds assertions verifying the consistency of the rcu_node structure's ->blkd_tasks list and its ->gp_tasks, ->exp_tasks, and ->boost_tasks pointers. In particular, the ->blkd_tasks lists must be empty except for leaf rcu_node structures. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
c5ebe66c |
|
19-Jun-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add event tracing to ->gp_tasks update at GP start There is currently event tracing to track when a task is preempted within a preemptible RCU read-side critical section, and also when that task subsequently reaches its outermost rcu_read_unlock(), but none indicating when a new grace period starts when that grace period must wait on pre-existing readers that have been been preempted at least once since the beginning of their current RCU read-side critical sections. This commit therefore adds an event trace at grace-period start in the case where there are such readers. Note that only the first reader in the list is traced. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
#
bedbb648 |
|
06-Jun-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add TPS() to event-traced strings Strings used in event tracing need to be specially handled, for example, using the TPS() macro. Without the TPS() macro, although output looks fine from within a running kernel, extracting traces from a crash dump produces garbage instead of strings. This commit therefore adds the TPS() macro to some unadorned strings that were passed to event-tracing macros. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
#
b1a2d79f |
|
26-Jun-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make NOCB CPUs migrate CBs directly from outgoing CPU RCU's CPU-hotplug callback-migration code first moves the outgoing CPU's callbacks to ->orphan_done and ->orphan_pend, and only then moves them to the NOCB callback list. This commit avoids the extra step (and simplifies the code) by moving the callbacks directly from the outgoing CPU's callback list to the NOCB callback list. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
8be6e1b1 |
|
29-Apr-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Use timer as backstop for NOCB deferred wakeups The handling of RCU's no-CBs CPUs has a maintenance headache, namely that if call_rcu() is invoked with interrupts disabled, the rcuo kthread wakeup must be defered to a point where we can be sure that scheduler locks are not held. Of course, there are a lot of code paths leading from an interrupts-disabled invocation of call_rcu(), and missing any one of these can result in excessive callback-invocation latency, and potentially even system hangs. This commit therefore uses a timer to guarantee that the wakeup will eventually occur. If one of the deferred-wakeup points kicks in, then the timer is simply cancelled. This commit also fixes up an incomplete removal of commits that were intended to plug remaining exit paths, which should have the added benefit of reducing the overhead of RCU's context-switch hooks. In addition, it simplifies leader-to-follower callback-list handoff by introducing locking. The call_rcu()-to-leader handoff continues to use atomic operations in order to maintain good real-time latency for common-case use of call_rcu(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Dan Carpenter fix for mod_timer() usage bug found by smatch. ]
|
#
44c65ff2 |
|
15-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Eliminate NOCBs CPU-state Kconfig options The CONFIG_RCU_NOCB_CPU_ALL, CONFIG_RCU_NOCB_CPU_NONE, and CONFIG_RCU_NOCB_CPU_ZERO Kconfig options are used only in testing and are redundant with the rcu_nocbs= boot parameter. This commit therefore removes these three Kconfig options and adjusts the rcutorture scripts to use the boot parameter instead. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
ae91aa0a |
|
15-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove debugfs tracing RCU's debugfs tracing used to be the only reasonable low-level debug information available, but ftrace and event tracing has since surpassed the RCU debugfs level of usefulness. This commit therefore removes RCU's debugfs tracing. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
c4a09ff7 |
|
12-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove the now-obsolete PROVE_RCU_REPEATEDLY Kconfig option The PROVE_RCU_REPEATEDLY Kconfig option was initially added due to the volume of messages from PROVE_RCU: Doing just one per boot would have required excessive numbers of boots to locate them all. However, PROVE_RCU messages are now relatively rare, so there is no longer any reason to need more than one such message per boot. This commit therefore removes the PROVE_RCU_REPEATEDLY Kconfig option. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org>
|
#
fe5ac724 |
|
11-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove nohz_full full-system-idle state machine The NO_HZ_FULL_SYSIDLE full-system-idle capability was added in 2013 by commit 0edd1b1784cb ("nohz_full: Add full-system-idle state machine"), but has not been used. This commit therefore removes it. If it turns out to be needed later, this commit can always be reverted. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
90040c9e |
|
10-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove *_SLOW_* Kconfig options The RCU_TORTURE_TEST_SLOW_PREINIT, RCU_TORTURE_TEST_SLOW_PREINIT_DELAY, RCU_TORTURE_TEST_SLOW_PREINIT_DELAY, RCU_TORTURE_TEST_SLOW_INIT, RCU_TORTURE_TEST_SLOW_INIT_DELAY, RCU_TORTURE_TEST_SLOW_CLEANUP, and RCU_TORTURE_TEST_SLOW_CLEANUP_DELAY Kconfig options are only useful for torture testing, and there are the rcutree.gp_cleanup_delay, rcutree.gp_init_delay, and rcutree.gp_preinit_delay kernel boot parameters that rcutorture can use instead. The effect of these parameters is to artificially slow down grace period initialization and cleanup in order to make some types of race conditions happen more often. This commit therefore simplifies Tree RCU a bit by removing the Kconfig options and adding the corresponding kernel parameters to rcutorture's .boot files instead. However, this commit also leaves out the kernel parameters for TREE02, TREE04, and TREE07 in order to have about the same number of tests slowed as not slowed. TREE01, TREE03, TREE05, and TREE06 are slowed, and the rest are not slowed. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
a68a2bb2 |
|
03-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move docbook comments out of rcupdate.h The include/linux/rcupdate.h file is included by more than 200 files, so shrinking it should provide some build-time benefits. This commit therefore moves several docbook comments from rcupdate.h to kernel/rcu/update.c, kernel/rcu/tree.c, and kernel/rcu/tree_plugin.h, thus reducing the number of times that the compiler has to scan these comments. This likely provides only a small benefit, but every little bit helps. This commit also fixes a malformed bulleted list noted by the 0day Test Robot. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
6b5fc3a1 |
|
28-Apr-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add memory barriers for NOCB leader wakeup Wait/wakeup operations do not guarantee ordering on their own. Instead, either locking or memory barriers are required. This commit therefore adds memory barriers to wake_nocb_leader() and nocb_leader_wait(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Krister Johansen <kjlx@templeofstupid.com> Cc: <stable@vger.kernel.org> # 4.6.x
|
#
511324e4 |
|
28-Apr-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Use RCU_NOCB_WAKE rather than RCU_NOGP_WAKE The RCU_NOGP_WAKE_NOT, RCU_NOGP_WAKE, and RCU_NOGP_WAKE_FORCE flags are used to mediate wakeups for the no-CBs CPU kthreads. The "NOGP" really doesn't make any sense, so this commit does s/NOGP/NOCB/. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
ea9b0c8a |
|
28-Apr-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add lockdep_assert_held() teeth to tree_plugin.h Comments can be helpful, but assertions carry more force. This commit therefore adds lockdep_assert_held() and RCU_LOCKDEP_WARN() calls to enforce lock-held and interrupt-disabled preconditions. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
17c7798b |
|
28-Apr-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Update rcu_bootup_announce_oddness() This commit updates rcu_bootup_announce_oddness() to check additional Kconfig options and module/boot parameters. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
59d80fd8 |
|
28-Apr-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Print out rcupdate.c non-default boot-time settings This commit adds a rcupdate_announce_bootup_oddness() function to print out non-default values of significant kernel boot parameter settings to aid in debugging. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
e28371c8 |
|
17-Apr-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove obsolete reference to synchronize_kernel() The synchronize_kernel() primitive was removed in favor of synchronize_sched() more than a decade ago, and it seems likely that rather few kernel hackers are familiar with it. Its continued presence is therefore providing more confusion than enlightenment. This commit therefore removes the reference from the synchronize_sched() header comment, and adds the corresponding information to the synchronize_rcu(0 header comment. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
5b72f964 |
|
12-Apr-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Complain if blocking in preemptible RCU read-side critical section Although preemptible RCU allows its read-side critical sections to be preempted, general blocking is forbidden. The reason for this is that excessive preemption times can be handled by CONFIG_RCU_BOOST=y, but a voluntarily blocked task doesn't care how high you boost its priority. Because preemptible RCU is a global mechanism, one ill-behaved reader hurts everyone. Hence the prohibition against general blocking in RCU-preempt read-side critical sections. Preemption yes, blocking no. This commit enforces this prohibition. There is a special exception for the -rt patchset (which they kindly volunteered to implement): It is OK to block (as opposed to merely being preempted) within an RCU-preempt read-side critical section, but only if the blocking is subject to priority inheritance. This exception permits CONFIG_RCU_BOOST=y to get -rt RCU readers out of trouble. Why doesn't this exception also apply to mainline's rt_mutex? Because of the possibility that someone does general blocking while holding an rt_mutex. Yes, the priority boosting will affect the rt_mutex, but it won't help with the task doing general blocking while holding that rt_mutex. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
933dfbd7 |
|
02-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Open-code the rcu_cblist_n_lazy_cbs() function Because the rcu_cblist_n_lazy_cbs() just samples the ->len_lazy counter, and because the rcu_cblist structure is quite straightforward, it makes sense to open-code rcu_cblist_n_lazy_cbs(p) as p->len_lazy, cutting out a level of indirection. This commit makes this change. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4b27f20b |
|
02-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Open-code the rcu_cblist_n_cbs() function Because the rcu_cblist_n_cbs() just samples the ->len counter, and because the rcu_cblist structure is quite straightforward, it makes sense to open-code rcu_cblist_n_cbs(p) as p->len, cutting out a level of indirection. This commit makes this change. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8ef0f37e |
|
02-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Open-code the rcu_cblist_empty() function Because the rcu_cblist_empty() just samples the ->head pointer, and because the rcu_cblist structure is quite straightforward, it makes sense to open-code rcu_cblist_empty(p) as !p->head, cutting out a level of indirection. This commit makes this change. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5455a7f6 |
|
25-Mar-2017 |
Nicholas Mc Guire <der.herr@hofr.at> |
rcu: Use true/false in assignment to bool This commit makes the parse_rcu_nocb_poll() function assign true (rather than the constant 1) to the bool variable rcu_nocb_poll. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
15fecf89 |
|
08-Feb-2017 |
Paul E. McKenney <paulmck@kernel.org> |
srcu: Abstract multi-tail callback list handling RCU has only one multi-tail callback list, which is implemented via the nxtlist, nxttail, nxtcompleted, qlen_lazy, and qlen fields in the rcu_data structure, and whose operations are open-code throughout the Tree RCU implementation. This has been more or less OK in the past, but upcoming callback-list optimizations in SRCU could really use a multi-tail callback list there as well. This commit therefore abstracts the multi-tail callback list handling into a new kernel/rcu/rcu_segcblist.h file, and uses this new API. The simple head-and-tail pointer callback list is also abstracted and applied everywhere except for the NOCB callback-offload lists. (Yes, the plan is to apply them there as well, but this commit is already bigger than would be good.) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
9226b10d |
|
27-Jan-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions The rcu_all_qs() and rcu_note_context_switch() do a series of checks, taking various actions to supply RCU with quiescent states, depending on the outcomes of the various checks. This is a bit much for scheduling fastpaths, so this commit creates a separate ->rcu_urgent_qs field in the rcu_dynticks structure that acts as a global guard for these checks. Thus, in the common case, rcu_all_qs() and rcu_note_context_switch() check the ->rcu_urgent_qs field, find it false, and simply return. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org>
|
#
b17b0153 |
|
08-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h> We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/debug.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
ae7e81c0 |
|
01-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <uapi/linux/sched/types.h> We are going to move scheduler ABI details to <uapi/linux/sched/types.h>, which will be used from a number of .c files. Create empty placeholder header that maps to <linux/types.h>. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
02a5c550 |
|
02-Nov-2016 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Abstract extended quiescent state determination This commit is the fourth step towards full abstraction of all accesses to the ->dynticks counter, implementing previously open-coded checks and comparisons in new rcu_dynticks_in_eqs() and rcu_dynticks_in_eqs_since() functions. This abstraction will ease changes to the ->dynticks counter operation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
9831ce3b |
|
02-Jan-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Fix comment in rcu_organize_nocb_kthreads() It used to be that the rcuo callback-offload kthreads were spawned in rcu_organize_nocb_kthreads(), and the comment before the "for" loop says as much. However, this spawning has long since moved to the CPU-hotplug code, so this commit fixes this comment. Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
52d7e48b |
|
10-Jan-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Narrow early boot window of illegal synchronous grace periods The current preemptible RCU implementation goes through three phases during bootup. In the first phase, there is only one CPU that is running with preemption disabled, so that a no-op is a synchronous grace period. In the second mid-boot phase, the scheduler is running, but RCU has not yet gotten its kthreads spawned (and, for expedited grace periods, workqueues are not yet running. During this time, any attempt to do a synchronous grace period will hang the system (or complain bitterly, depending). In the third and final phase, RCU is fully operational and everything works normally. This has been OK for some time, but there has recently been some synchronous grace periods showing up during the second mid-boot phase. This code worked "by accident" for awhile, but started failing as soon as expedited RCU grace periods switched over to workqueues in commit 8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue"). Note that the code was buggy even before this commit, as it was subject to failure on real-time systems that forced all expedited grace periods to run as normal grace periods (for example, using the rcu_normal ksysfs parameter). The callchain from the failure case is as follows: early_amd_iommu_init() |-> acpi_put_table(ivrs_base); |-> acpi_tb_put_table(table_desc); |-> acpi_tb_invalidate_table(table_desc); |-> acpi_tb_release_table(...) |-> acpi_os_unmap_memory |-> acpi_os_unmap_iomem |-> acpi_os_map_cleanup |-> synchronize_rcu_expedited The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y, which caused the code to try using workqueues before they were initialized, which did not go well. This commit therefore reworks RCU to permit synchronous grace periods to proceed during this mid-boot phase. This commit is therefore a fix to a regression introduced in v4.9, and is therefore being put forward post-merge-window in v4.10. This commit sets a flag from the existing rcu_scheduler_starting() function which causes all synchronous grace periods to take the expedited path. The expedited path now checks this flag, using the requesting task to drive the expedited grace period forward during the mid-boot phase. Finally, this flag is updated by a core_initcall() function named rcu_exp_runtime_mode(), which causes the runtime codepaths to be used. Note that this arrangement assumes that tasks are not sent POSIX signals (or anything similar) from the time that the first task is spawned through core_initcall() time. Fixes: 8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue") Reported-by: "Zheng, Lv" <lv.zheng@intel.com> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Stan Kain <stan.kain@gmail.com> Tested-by: Ivan <waffolz@hotmail.com> Tested-by: Emanuel Castelo <emanuel.castelo@gmail.com> Tested-by: Bruno Pesavento <bpesavento@infinito.it> Tested-by: Borislav Petkov <bp@suse.de> Tested-by: Frederic Bezies <fredbezies@gmail.com> Cc: <stable@vger.kernel.org> # 4.9.0-
|
#
bedc1969 |
|
15-Jun-2016 |
Ding Tianhong <dingtianhong@huawei.com> |
rcu: Fix soft lockup for rcu_nocb_kthread Carrying out the following steps results in a softlockup in the RCU callback-offload (rcuo) kthreads: 1. Connect to ixgbevf, and set the speed to 10Gb/s. 2. Use ifconfig to bring the nic up and down repeatedly. [ 317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready [ 368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15] [ 368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000 [ 368.106005] RIP: 0010:[<ffffffff81579e04>] [<ffffffff81579e04>] fib_table_lookup+0x14/0x390 [ 368.106005] RSP: 0018:ffff88061fc83ce8 EFLAGS: 00000286 [ 368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001 [ 368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00 [ 368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000 [ 368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58 [ 368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0 [ 368.106005] FS: 0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000 [ 368.106005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0 [ 368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 368.106005] Stack: [ 368.106005] 00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00 [ 368.106005] ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146 [ 368.106005] ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0 [ 368.106005] Call Trace: [ 368.106005] <IRQ> [ 368.106005] [ 368.106005] [<ffffffff815349a6>] ip_route_input_noref+0x516/0xbd0 [ 368.106005] [<ffffffff814ee146>] ? skb_release_data+0xd6/0x110 [ 368.106005] [<ffffffff814ee20a>] ? kfree_skb+0x3a/0xa0 [ 368.106005] [<ffffffff8153698f>] ip_rcv_finish+0x29f/0x350 [ 368.106005] [<ffffffff81537034>] ip_rcv+0x234/0x380 [ 368.106005] [<ffffffff814fd656>] __netif_receive_skb_core+0x676/0x870 [ 368.106005] [<ffffffff814fd868>] __netif_receive_skb+0x18/0x60 [ 368.106005] [<ffffffff814fe4de>] process_backlog+0xae/0x180 [ 368.106005] [<ffffffff814fdcb2>] net_rx_action+0x152/0x240 [ 368.106005] [<ffffffff81077b3f>] __do_softirq+0xef/0x280 [ 368.106005] [<ffffffff8161619c>] call_softirq+0x1c/0x30 [ 368.106005] <EOI> [ 368.106005] [ 368.106005] [<ffffffff81015d95>] do_softirq+0x65/0xa0 [ 368.106005] [<ffffffff81077174>] local_bh_enable+0x94/0xa0 [ 368.106005] [<ffffffff81114922>] rcu_nocb_kthread+0x232/0x370 [ 368.106005] [<ffffffff81098250>] ? wake_up_bit+0x30/0x30 [ 368.106005] [<ffffffff811146f0>] ? rcu_start_gp+0x40/0x40 [ 368.106005] [<ffffffff8109728f>] kthread+0xcf/0xe0 [ 368.106005] [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140 [ 368.106005] [<ffffffff816147d8>] ret_from_fork+0x58/0x90 [ 368.106005] [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140 ==================================cut here============================== It turns out that the rcuos callback-offload kthread is busy processing a very large quantity of RCU callbacks, and it is not reliquishing the CPU while doing so. This commit therefore adds an cond_resched_rcu_qs() within the loop to allow other tasks to run. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> [ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
bc75e999 |
|
03-Jun-2016 |
Mark Rutland <mark.rutland@arm.com> |
rcu: Correctly handle sparse possible cpus In many cases in the RCU tree code, we iterate over the set of cpus for a leaf node described by rcu_node::grplo and rcu_node::grphi, checking per-cpu data for each cpu in this range. However, if the set of possible cpus is sparse, some cpus described in this range are not possible, and thus no per-cpu region will have been allocated (or initialised) for them by the generic percpu code. Erroneous accesses to a per-cpu area for these !possible cpus may fault or may hit other data depending on the addressed generated when the erroneous per cpu offset is applied. In practice, both cases have been observed on arm64 hardware (the former being silent, but detectable with additional patches). To avoid issues resulting from this, we must iterate over the set of *possible* cpus for a given leaf node. This patch add a new helper, for_each_leaf_node_possible_cpu, to enable this. As iteration is often intertwined with rcu_node local bitmask manipulation, a new leaf_node_cpu_bit helper is added to make this simpler and more consistent. The RCU tree code is made to use both of these where appropriate. Without this patch, running reboot at a shell can result in an oops like: [ 3369.075979] Unable to handle kernel paging request at virtual address ffffff8008b21b4c [ 3369.083881] pgd = ffffffc3ecdda000 [ 3369.087270] [ffffff8008b21b4c] *pgd=00000083eca48003, *pud=00000083eca48003, *pmd=0000000000000000 [ 3369.096222] Internal error: Oops: 96000007 [#1] PREEMPT SMP [ 3369.101781] Modules linked in: [ 3369.104825] CPU: 2 PID: 1817 Comm: NetworkManager Tainted: G W 4.6.0+ #3 [ 3369.121239] task: ffffffc0fa13e000 ti: ffffffc3eb940000 task.ti: ffffffc3eb940000 [ 3369.128708] PC is at sync_rcu_exp_select_cpus+0x188/0x510 [ 3369.134094] LR is at sync_rcu_exp_select_cpus+0x104/0x510 [ 3369.139479] pc : [<ffffff80081109a8>] lr : [<ffffff8008110924>] pstate: 200001c5 [ 3369.146860] sp : ffffffc3eb9435a0 [ 3369.150162] x29: ffffffc3eb9435a0 x28: ffffff8008be4f88 [ 3369.155465] x27: ffffff8008b66c80 x26: ffffffc3eceb2600 [ 3369.160767] x25: 0000000000000001 x24: ffffff8008be4f88 [ 3369.166070] x23: ffffff8008b51c3c x22: ffffff8008b66c80 [ 3369.171371] x21: 0000000000000001 x20: ffffff8008b21b40 [ 3369.176673] x19: ffffff8008b66c80 x18: 0000000000000000 [ 3369.181975] x17: 0000007fa951a010 x16: ffffff80086a30f0 [ 3369.187278] x15: 0000007fa9505590 x14: 0000000000000000 [ 3369.192580] x13: ffffff8008b51000 x12: ffffffc3eb940000 [ 3369.197882] x11: 0000000000000006 x10: ffffff8008b51b78 [ 3369.203184] x9 : 0000000000000001 x8 : ffffff8008be4000 [ 3369.208486] x7 : ffffff8008b21b40 x6 : 0000000000001003 [ 3369.213788] x5 : 0000000000000000 x4 : ffffff8008b27280 [ 3369.219090] x3 : ffffff8008b21b4c x2 : 0000000000000001 [ 3369.224406] x1 : 0000000000000001 x0 : 0000000000000140 ... [ 3369.972257] [<ffffff80081109a8>] sync_rcu_exp_select_cpus+0x188/0x510 [ 3369.978685] [<ffffff80081128b4>] synchronize_rcu_expedited+0x64/0xa8 [ 3369.985026] [<ffffff80086b987c>] synchronize_net+0x24/0x30 [ 3369.990499] [<ffffff80086ddb54>] dev_deactivate_many+0x28c/0x298 [ 3369.996493] [<ffffff80086b6bb8>] __dev_close_many+0x60/0xd0 [ 3370.002052] [<ffffff80086b6d48>] __dev_close+0x28/0x40 [ 3370.007178] [<ffffff80086bf62c>] __dev_change_flags+0x8c/0x158 [ 3370.012999] [<ffffff80086bf718>] dev_change_flags+0x20/0x60 [ 3370.018558] [<ffffff80086cf7f0>] do_setlink+0x288/0x918 [ 3370.023771] [<ffffff80086d0798>] rtnl_newlink+0x398/0x6a8 [ 3370.029158] [<ffffff80086cee84>] rtnetlink_rcv_msg+0xe4/0x220 [ 3370.034891] [<ffffff80086e274c>] netlink_rcv_skb+0xc4/0xf8 [ 3370.040364] [<ffffff80086ced8c>] rtnetlink_rcv+0x2c/0x40 [ 3370.045663] [<ffffff80086e1fe8>] netlink_unicast+0x160/0x238 [ 3370.051309] [<ffffff80086e24b8>] netlink_sendmsg+0x2f0/0x358 [ 3370.056956] [<ffffff80086a0070>] sock_sendmsg+0x18/0x30 [ 3370.062168] [<ffffff80086a21cc>] ___sys_sendmsg+0x26c/0x280 [ 3370.067728] [<ffffff80086a30ac>] __sys_sendmsg+0x44/0x88 [ 3370.073027] [<ffffff80086a3100>] SyS_sendmsg+0x10/0x20 [ 3370.078153] [<ffffff8008085e70>] el0_svc_naked+0x24/0x28 Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Dennis Chen <dennis.chen@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
4e9a073f |
|
19-Apr-2016 |
Paul E. McKenney <paulmck@kernel.org> |
torture: Remove CONFIG_RCU_TORTURE_TEST_RUNNABLE, simplify code This commit removes CONFIG_RCU_TORTURE_TEST_RUNNABLE in favor of the already-existing rcutorture.torture_runnable kernel boot parameter. It also converts an #ifdef into IS_ENABLED(), saving a few lines of code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
40e0a6cf |
|
15-Apr-2016 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move expedited code from tree_plugin.h to tree_exp.h People have been having some difficulty finding their way around the RCU code. This commit therefore pulls some of the expedited grace-period code from tree_plugin.h to a new tree_exp.h file. This commit is strictly code movement. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
aff12cdf |
|
16-Mar-2016 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Consolidate expedited GP code into exp_funnel_lock() This commit pulls the grace-period-start counter adjustment and tracing from synchronize_rcu_expedited() and synchronize_sched_expedited() into exp_funnel_lock(), thus eliminating some code duplication. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
179e5dcd |
|
16-Mar-2016 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Consolidate expedited GP tracing into rcu_exp_gp_seq_snap() This commit moves some duplicate code from synchronize_rcu_expedited() and synchronize_sched_expedited() into rcu_exp_gp_seq_snap(). This doesn't save lines of code, but does eliminate a "tell me twice" issue. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
4ea3e85b |
|
16-Mar-2016 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Consolidate expedited GP code into rcu_exp_wait_wake() Currently, synchronize_rcu_expedited() and rcu_sched_expedited() have significant duplicate code. This commit therefore consolidates some of this code into rcu_exp_wake(), which is now renamed to rcu_exp_wait_wake() in recognition of its added responsibilities. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
f6a12f34 |
|
30-Jan-2016 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Enforce expedited-GP fairness via funnel wait queue The current mutex-based funnel-locking approach used by expedited grace periods is subject to severe unfairness. The problem arises when a few tasks, making a path from leaves to root, all wake up before other tasks do. A new task can then follow this path all the way to the root, which needlessly delays tasks whose grace period is done, but who do not happen to acquire the lock quickly enough. This commit avoids this problem by maintaining per-rcu_node wait queues, along with a per-rcu_node counter that tracks the latest grace period sought by an earlier task to visit this node. If that grace period would satisfy the current task, instead of proceeding up the tree, it waits on the current rcu_node structure using a pair of wait queues provided for that purpose. This decouples awakening of old tasks from the arrival of new tasks. If the wakeups prove to be a bottleneck, additional kthreads can be brought to bear for that purpose. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
4f415302 |
|
28-Jan-2016 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add expedited-grace-period event tracing Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
bea2de44 |
|
28-Jan-2016 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add funnel-locking tracing for expedited grace periods Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
26ece8ef |
|
28-Jan-2016 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Fix synchronize_rcu_expedited() header comment This commit brings the synchronize_rcu_expedited() function's header comment into line with the new implementation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
abedf8e2 |
|
19-Feb-2016 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
rcu: Use simple wait queues where possible in rcutree As of commit dae6e64d2bcfd ("rcu: Introduce proper blocking to no-CBs kthreads GP waits") the RCU subsystem started making use of wait queues. Here we convert all additions of RCU wait queues to use simple wait queues, since they don't need the extra overhead of the full wait queue features. Originally this was done for RT kernels[1], since we would get things like... BUG: sleeping function called from invalid context at kernel/rtmutex.c:659 in_atomic(): 1, irqs_disabled(): 1, pid: 8, name: rcu_preempt Pid: 8, comm: rcu_preempt Not tainted Call Trace: [<ffffffff8106c8d0>] __might_sleep+0xd0/0xf0 [<ffffffff817d77b4>] rt_spin_lock+0x24/0x50 [<ffffffff8106fcf6>] __wake_up+0x36/0x70 [<ffffffff810c4542>] rcu_gp_kthread+0x4d2/0x680 [<ffffffff8105f910>] ? __init_waitqueue_head+0x50/0x50 [<ffffffff810c4070>] ? rcu_gp_fqs+0x80/0x80 [<ffffffff8105eabb>] kthread+0xdb/0xe0 [<ffffffff8106b912>] ? finish_task_switch+0x52/0x100 [<ffffffff817e0754>] kernel_thread_helper+0x4/0x10 [<ffffffff8105e9e0>] ? __init_kthread_worker+0x60/0x60 [<ffffffff817e0750>] ? gs_change+0xb/0xb ...and hence simple wait queues were deployed on RT out of necessity (as simple wait uses a raw lock), but mainline might as well take advantage of the more streamline support as well. [1] This is a carry forward of work from v3.10-rt; the original conversion was by Thomas on an earlier -rt version, and Sebastian extended it to additional post-3.10 added RCU waiters; here I've added a commit log and unified the RCU changes into one, and uprev'd it to match mainline RCU. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-rt-users@vger.kernel.org Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1455871601-27484-6-git-send-email-wagi@monom.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
065bb78c |
|
19-Feb-2016 |
Daniel Wagner <daniel.wagner@bmw-carit.de> |
rcu: Do not call rcu_nocb_gp_cleanup() while holding rnp->lock rcu_nocb_gp_cleanup() is called while holding rnp->lock. Currently, this is okay because the wake_up_all() in rcu_nocb_gp_cleanup() will not enable the IRQs. lockdep is happy. By switching over using swait this is not true anymore. swake_up_all() enables the IRQs while processing the waiters. __do_softirq() can now run and will eventually call rcu_process_callbacks() which wants to grap nrp->lock. Let's move the rcu_nocb_gp_cleanup() call outside the lock before we switch over to swait. If we would hold the rnp->lock and use swait, lockdep reports following: ================================= [ INFO: inconsistent lock state ] 4.2.0-rc5-00025-g9a73ba0 #136 Not tainted --------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. rcu_preempt/8 [HC0[0]:SC0[0]:HE1:SE1] takes: (rcu_node_1){+.?...}, at: [<ffffffff811387c7>] rcu_gp_kthread+0xb97/0xeb0 {IN-SOFTIRQ-W} state was registered at: [<ffffffff81109b9f>] __lock_acquire+0xd5f/0x21e0 [<ffffffff8110be0f>] lock_acquire+0xdf/0x2b0 [<ffffffff81841cc9>] _raw_spin_lock_irqsave+0x59/0xa0 [<ffffffff81136991>] rcu_process_callbacks+0x141/0x3c0 [<ffffffff810b1a9d>] __do_softirq+0x14d/0x670 [<ffffffff810b2214>] irq_exit+0x104/0x110 [<ffffffff81844e96>] smp_apic_timer_interrupt+0x46/0x60 [<ffffffff81842e70>] apic_timer_interrupt+0x70/0x80 [<ffffffff810dba66>] rq_attach_root+0xa6/0x100 [<ffffffff810dbc2d>] cpu_attach_domain+0x16d/0x650 [<ffffffff810e4b42>] build_sched_domains+0x942/0xb00 [<ffffffff821777c2>] sched_init_smp+0x509/0x5c1 [<ffffffff821551e3>] kernel_init_freeable+0x172/0x28f [<ffffffff8182cdce>] kernel_init+0xe/0xe0 [<ffffffff8184231f>] ret_from_fork+0x3f/0x70 irq event stamp: 76 hardirqs last enabled at (75): [<ffffffff81841330>] _raw_spin_unlock_irq+0x30/0x60 hardirqs last disabled at (76): [<ffffffff8184116f>] _raw_spin_lock_irq+0x1f/0x90 softirqs last enabled at (0): [<ffffffff810a8df2>] copy_process.part.26+0x602/0x1cf0 softirqs last disabled at (0): [< (null)>] (null) other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(rcu_node_1); <Interrupt> lock(rcu_node_1); *** DEADLOCK *** 1 lock held by rcu_preempt/8: #0: (rcu_node_1){+.?...}, at: [<ffffffff811387c7>] rcu_gp_kthread+0xb97/0xeb0 stack backtrace: CPU: 0 PID: 8 Comm: rcu_preempt Not tainted 4.2.0-rc5-00025-g9a73ba0 #136 Hardware name: Dell Inc. PowerEdge R820/066N7P, BIOS 2.0.20 01/16/2014 0000000000000000 000000006d7e67d8 ffff881fb081fbd8 ffffffff818379e0 0000000000000000 ffff881fb0812a00 ffff881fb081fc38 ffffffff8110813b 0000000000000000 0000000000000001 ffff881f00000001 ffffffff8102fa4f Call Trace: [<ffffffff818379e0>] dump_stack+0x4f/0x7b [<ffffffff8110813b>] print_usage_bug+0x1db/0x1e0 [<ffffffff8102fa4f>] ? save_stack_trace+0x2f/0x50 [<ffffffff811087ad>] mark_lock+0x66d/0x6e0 [<ffffffff81107790>] ? check_usage_forwards+0x150/0x150 [<ffffffff81108898>] mark_held_locks+0x78/0xa0 [<ffffffff81841330>] ? _raw_spin_unlock_irq+0x30/0x60 [<ffffffff81108a28>] trace_hardirqs_on_caller+0x168/0x220 [<ffffffff81108aed>] trace_hardirqs_on+0xd/0x10 [<ffffffff81841330>] _raw_spin_unlock_irq+0x30/0x60 [<ffffffff810fd1c7>] swake_up_all+0xb7/0xe0 [<ffffffff811386e1>] rcu_gp_kthread+0xab1/0xeb0 [<ffffffff811089bf>] ? trace_hardirqs_on_caller+0xff/0x220 [<ffffffff81841341>] ? _raw_spin_unlock_irq+0x41/0x60 [<ffffffff81137c30>] ? rcu_barrier+0x20/0x20 [<ffffffff810d2014>] kthread+0x104/0x120 [<ffffffff81841330>] ? _raw_spin_unlock_irq+0x30/0x60 [<ffffffff810d1f10>] ? kthread_create_on_node+0x260/0x260 [<ffffffff8184231f>] ret_from_fork+0x3f/0x70 [<ffffffff810d1f10>] ? kthread_create_on_node+0x260/0x260 Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-rt-users@vger.kernel.org Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1455871601-27484-5-git-send-email-wagi@monom.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
67c583a7 |
|
28-Dec-2015 |
Boqun Feng <boqun.feng@gmail.com> |
RCU: Privatize rcu_node::lock In patch: "rcu: Add transitivity to remaining rcu_node ->lock acquisitions" All locking operations on rcu_node::lock are replaced with the wrappers because of the need of transitivity, which indicates we should never write code using LOCK primitives alone(i.e. without a proper barrier following) on rcu_node::lock outside those wrappers. We could detect this kind of misuses on rcu_node::lock in the future by adding __private modifier on rcu_node::lock. To privatize rcu_node::lock, unlock wrappers are also needed. Replacing spinlock unlocks with these wrappers not only privatizes rcu_node::lock but also makes it easier to figure out critical sections of rcu_node. This patch adds __private modifier to rcu_node::lock and makes every access to it wrapped by ACCESS_PRIVATE(). Besides, unlock wrappers are added and raw_spin_unlock(&rnp->lock) and its friends are replaced with those wrappers. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
1914aab5 |
|
26-Dec-2015 |
Chen Gang <chengang@emindsoft.com.cn> |
rcu: Remove useless rcu_data_p when !PREEMPT_RCU The related warning from gcc 6.0: In file included from kernel/rcu/tree.c:4630:0: kernel/rcu/tree_plugin.h:810:40: warning: ‘rcu_data_p’ defined but not used [-Wunused-const-variable] static struct rcu_data __percpu *const rcu_data_p = &rcu_sched_data; ^~~~~~~~~~ Also remove always redundant rcu_data_p in tree.c. Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
a87f203e |
|
20-Oct-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Eliminate unused rcu_init_one() argument Now that the rcu_state structure's ->rda field is compile-time initialized, there is no need to pass the per-CPU rcu_data structure into rcu_init_one(). This commit therefore eliminates this now-unused parameter. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
46a5d164 |
|
07-Oct-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Stop disabling interrupts in scheduler fastpaths We need the scheduler's fastpaths to be, well, fast, and unnecessarily disabling and re-enabling interrupts is not necessarily consistent with this goal. Especially given that there are regions of the scheduler that already have interrupts disabled. This commit therefore moves the call to rcu_note_context_switch() to one of the interrupts-disabled regions of the scheduler, and removes the now-redundant disabling and re-enabling of interrupts from rcu_note_context_switch() and the functions it calls. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Shift rcu_note_context_switch() to avoid deadlock, as suggested by Peter Zijlstra. ]
|
#
f0f2e7d3 |
|
29-Sep-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Avoid tick_nohz_active checks on NOCBs CPUs Currently, rcu_prepare_for_idle() checks for tick_nohz_active, even on individual NOCBs CPUs, unless all CPUs are marked as NOCBs CPUs at build time. This check is pointless on NOCBs CPUs because they never have any callbacks posted, given that all of their callbacks are handed off to the corresponding rcuo kthread. There is a check for individually designated NOCBs CPUs, but it pointelessly follows the check for tick_nohz_active. This commit therefore moves the check for individually designated NOCBs CPUs up with the check for CONFIG_RCU_NOCB_CPU_ALL. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
699d4035 |
|
29-Sep-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Fix obsolete rcu_bootup_announce_oddness() comment This function no longer has #ifdefs, so this commit removes the header comment calling them out. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
8ba9153b |
|
29-Sep-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove lock-acquisition loop from rcu_read_unlock_special() Several releases have come and gone without the warning triggering, so remove the lock-acquisition loop. Retain the WARN_ON_ONCE() out of sheer paranoia. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
5a9be7c6 |
|
24-Nov-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add rcu_normal kernel parameter to suppress expediting Although expedited grace periods can be quite useful, and although their OS jitter has been greatly reduced, they can still pose problems for extreme real-time workloads. This commit therefore adds a rcu_normal kernel boot parameter (which can also be manipulated via sysfs) to suppress expedited grace periods, that is, to treat requests for expedited grace periods as if they were requests for normal grace periods. If both rcu_expedited and rcu_normal are specified, rcu_normal wins. This means that if you are relying on expedited grace periods to speed up boot, you will want to specify rcu_expedited on the kernel command line, and then specify rcu_normal via sysfs once boot completes. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
6cf10081 |
|
08-Oct-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add transitivity to remaining rcu_node ->lock acquisitions The rule is that all acquisitions of the rcu_node structure's ->lock must provide transitivity: The lock is not acquired that frequently, and sorting out exactly which required it and which did not would be a maintenance nightmare. This commit therefore supplies the needed transitivity to the remaining ->lock acquisitions. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
2a67e741 |
|
07-Oct-2015 |
Peter Zijlstra <peterz@infradead.org> |
rcu: Create transitive rnp->lock acquisition functions Providing RCU's memory-ordering guarantees requires that the rcu_node tree's locking provide transitive memory ordering, which the Linux kernel's spinlocks currently do not provide unless smp_mb__after_unlock_lock() is used. Having a separate smp_mb__after_unlock_lock() after each and every lock acquisition is error-prone, hard to read, and a bit annoying, so this commit provides wrapper functions that pull in the smp_mb__after_unlock_lock() invocations. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
b08517c7 |
|
18-Aug-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Enable stall warnings for synchronize_rcu_expedited() This commit redirects synchronize_rcu_expedited()'s wait to synchronize_sched_expedited_wait(), thus enabling RCU CPU stall warnings. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
74611ecb |
|
18-Aug-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add online/offline info to expedited stall warning message This commit makes the RCU CPU stall warning message print online/offline indications immediately after the CPU number. A "O" indicates global offline, a "." global online, and a "o" indicates RCU believes that the CPU is offline for the current grace period and "." otherwise, and an "N" indicates that RCU believes that the CPU will be offline for the next grace period, and "." otherwise, all right after the CPU number. So for CPU 10, you would normally see "10-...:" indicating that everything believes that the CPU is online. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
dcdb8807 |
|
15-Aug-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Consolidate expedited CPU selection Now that sync_sched_exp_select_cpus() and sync_rcu_exp_select_cpus() are identical aside from the the argument to smp_call_function_single(), this commit consolidates them with a functional argument. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
7f21aeef |
|
15-Aug-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add online/offline info to stall warning message This commit makes the RCU CPU stall warning message print online/offline indications immediately after a hyphen following the CPU number. A "O" indicates that the global CPU-hotplug system believes that the CPU is online, a "o" that RCU perceived the CPU to be online at the beginning of the current expedited grace period, and an "N" that RCU currently believes that it will perceive the CPU as being online at the beginning of the next expedited grace period, with "." otherwise for all three indications. So for CPU 10, you would normally see "10-OoN:" indicating that everything believes that the CPU is online. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
b6a4ae76 |
|
28-Jul-2015 |
Boqun Feng <boqun.feng@gmail.com> |
rcu: Use rcu_callback_t in call_rcu*() and friends As we now have rcu_callback_t typedefs as the type of rcu callbacks, we should use it in call_rcu*() and friends as the type of parameters. This could save us a few lines of code and make it clear which function requires an rcu callbacks rather than other callbacks as its argument. Besides, this can also help cscope to generate a better database for code reading. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
5b74c458 |
|
06-Aug-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make ->cpu_no_qs be a union for aggregate OR This commit converts the rcu_data structure's ->cpu_no_qs field to a union. The bytewise side of this union allows individual access to indications as to whether this CPU needs to find a quiescent state for a normal (.norm) and/or expedited (.exp) grace period. The setwise side of the union allows testing whether or not a quiescent state is needed at all, for either type of grace period. For now, only .norm is used. A later commit will introduce the expedited usage. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
0d43eb34 |
|
06-Aug-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Invert passed_quiesce and rename to cpu_no_qs This commit inverts the sense of the rcu_data structure's ->passed_quiesce field and renames it to ->cpu_no_qs. This will allow a later commit to use an "aggregate OR" operation to test expedited as well as normal grace periods without added overhead. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
97c668b8 |
|
06-Aug-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Rename qs_pending to core_needs_qs An upcoming commit needs to invert the sense of the ->passed_quiesce rcu_data structure field, so this commit is taking this opportunity to clarify things a bit by renaming ->qs_pending to ->core_needs_qs. So if !rdp->core_needs_qs, then this CPU need not concern itself with quiescent states, in particular, it need not acquire its leaf rcu_node structure's ->lock to check. Otherwise, it needs to report the next quiescent state. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
8203d6d0 |
|
02-Aug-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Use single-stage IPI algorithm for RCU expedited grace period The current preemptible-RCU expedited grace-period algorithm invokes synchronize_sched_expedited() to enqueue all tasks currently running in a preemptible-RCU read-side critical section, then waits for all the ->blkd_tasks lists to drain. This works, but results in both an IPI and a double context switch even on CPUs that do not happen to be running in a preemptible RCU read-side critical section. This commit implements a new algorithm that causes less OS jitter. This new algorithm IPIs all online CPUs that are not idle (from an RCU perspective), but refrains from self-IPIs. If a CPU receiving this IPI is not in a preemptible RCU read-side critical section (or is just now exiting one), it pushes quiescence up the rcu_node tree, otherwise, it sets a flag that will be handled by the upcoming outermost rcu_read_unlock(), which will then push quiescence up the tree. The expedited grace period must of course wait on any pre-existing blocked readers, and newly blocked readers must be queued carefully based on the state of both the normal and the expedited grace periods. This new queueing approach also avoids the need to update boost state, courtesy of the fact that blocked tasks are no longer ever migrated to the root rcu_node structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
b9585e94 |
|
31-Jul-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Consolidate tree setup for synchronize_rcu_expedited() This commit replaces sync_rcu_preempt_exp_init1(() and sync_rcu_preempt_exp_init2() with sync_exp_reset_tree_hotplug() and sync_exp_reset_tree(), which will also be used by synchronize_sched_expedited(), and sync_rcu_exp_select_nodes(), which contains code specific to synchronize_rcu_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
7922cd0e |
|
31-Jul-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move rcu_report_exp_rnp() to allow consolidation This is a nearly pure code-movement commit, moving rcu_report_exp_rnp(), sync_rcu_preempt_exp_done(), and rcu_preempted_readers_exp() so that later commits can make synchronize_sched_expedited() use them. The non-code-movement portion of this commit tags rcu_report_exp_rnp() as __maybe_unused to avoid build errors when CONFIG_PREEMPT=n. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
f4ecea30 |
|
29-Jul-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Now that there is an ->expedited_wq waitqueue in each rcu_state structure, there is no need for the sync_rcu_preempt_exp_wq global variable. This commit therefore substitutes ->expedited_wq for sync_rcu_preempt_exp_wq. It also initializes ->expedited_wq only once at boot instead of at the start of each expedited grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
9a54f98e |
|
14-Jul-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Don't disable CPU hotplug during OOM notifiers RCU's rcu_oom_notify() disables CPU hotplug in order to stabilize the list of online CPUs, which it traverses. However, this is completely pointless because smp_call_function_single() will quietly fail if invoked on an offline CPU. Because the count of requests is incremented in the rcu_oom_notify_cpu() function that is remotely invoked, everything works nicely even in the face of concurrent CPU-hotplug operations. Furthermore, in recent kernels, invoking get_online_cpus() from an OOM notifier can result in deadlock. This commit therefore removes the call to get_online_cpus() and put_online_cpus() from rcu_oom_notify(). Reported-by: Marcin Åšlusarz <marcin.slusarz@gmail.com> Reported-by: David Rientjes <rientjes@google.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Tested-by: Marcin Åšlusarz <marcin.slusarz@gmail.com>
|
#
f78f5b90 |
|
18-Jun-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN() This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for consistency with the WARN() series of macros. This also requires inverting the sense of the conditional, which this commit also does. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org>
|
#
bc17ea10 |
|
06-Jun-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Fix obsolete priority-boosting comment Tasks are no longer migrated to the root rcu_node, so there is no longer any need for a boost kthread for the root rcu_node, and there no longer is such a kthread. This commit therefore fixes the comment in rcu_boost_kthread()'s header to reflect this new reality. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
704dd435 |
|
27-Jun-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Consolidate last open-coded expedited memory barrier One of the requirements on RCU grace periods is that if there is a causal chain of operations that starts after one grace period and ends before another grace period, then the two grace periods must be serialized. There has been (and might still be) code that relies on this, for example, certain types of reference-counting code that does a call_rcu() within an RCU callback function. This requirement is why there is an smp_mb() at the end of both synchronize_sched_expedited() and synchronize_rcu_expedited(). However, this is the only smp_mb() in these functions, so it would be nicer to consolidate it into rcu_exp_gp_seq_end(). This commit does just that. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
29fd9309 |
|
25-Jun-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Use funnel locking for synchronize_rcu_expedited()'s polling loop This commit gets rid of synchronize_rcu_expedited()'s mutex_trylock() polling loop in favor of the funnel-locking scheme that was abstracted from synchronize_sched_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
543c6158 |
|
25-Jun-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make synchronize_rcu_expedited() use sequence-counter scheme Although synchronize_rcu_expedited() uses a sequence-counter scheme, it is based on a single increment per grace period, which means that tasks piggybacking off of concurrent grace periods may be forced to wait longer than necessary. This commit therefore applies the new sequence-count functions developed for synchronize_sched_expedited() to speed things up a bit and to consolidate the sequence-counter implementation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
75c27f11 |
|
11-Jun-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove CONFIG_RCU_CPU_STALL_INFO The CONFIG_RCU_CPU_STALL_INFO has been default-y for a couple of releases with no complaints, so it is time to eliminate this Kconfig option entirely, so that the long-form RCU CPU stall warnings cannot be disabled. This commit does just that. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
9b683874 |
|
11-Jun-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Stop disabling CPU hotplug in synchronize_rcu_expedited() The fact that tasks could be migrated from leaf to root rcu_node structures meant that synchronize_rcu_expedited() had to disable CPU hotplug. However, tasks now stay put, so this commit removes the CPU-hotplug disabling from synchronize_rcu_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
42621697 |
|
03-Jun-2015 |
Alexander Gordeev <agordeev@redhat.com> |
rcu: Simplify arithmetic to calculate number of RCU nodes This update makes arithmetic to calculate number of RCU nodes more straight and easy to read. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
bc7a34b8 |
|
26-May-2015 |
Thomas Gleixner <tglx@linutronix.de> |
timer: Reduce timer migration overhead if disabled Eric reported that the timer_migration sysctl is not really nice performance wise as it needs to check at every timer insertion whether the feature is enabled or not. Further the check does not live in the timer code, so we have an extra function call which checks an extra cache line to figure out that it is disabled. We can do better and store that information in the per cpu (hr)timer bases. I pondered to use a static key, but that's a nightmare to update from the nohz code and the timer base cache line is hot anyway when we select a timer base. The old logic enabled the timer migration unconditionally if CONFIG_NO_HZ was set even if nohz was disabled on the kernel command line. With this modification, we start off with migration disabled. The user visible sysctl is still set to enabled. If the kernel switches to NOHZ migration is enabled, if the user did not disable it via the sysctl prior to the switch. If nohz=off is on the kernel command line, migration stays disabled no matter what. Before: 47.76% hog [.] main 14.84% [kernel] [k] _raw_spin_lock_irqsave 9.55% [kernel] [k] _raw_spin_unlock_irqrestore 6.71% [kernel] [k] mod_timer 6.24% [kernel] [k] lock_timer_base.isra.38 3.76% [kernel] [k] detach_if_pending 3.71% [kernel] [k] del_timer 2.50% [kernel] [k] internal_add_timer 1.51% [kernel] [k] get_nohz_timer_target 1.28% [kernel] [k] __internal_add_timer 0.78% [kernel] [k] timerfn 0.48% [kernel] [k] wake_up_nohz_cpu After: 48.10% hog [.] main 15.25% [kernel] [k] _raw_spin_lock_irqsave 9.76% [kernel] [k] _raw_spin_unlock_irqrestore 6.50% [kernel] [k] mod_timer 6.44% [kernel] [k] lock_timer_base.isra.38 3.87% [kernel] [k] detach_if_pending 3.80% [kernel] [k] del_timer 2.67% [kernel] [k] internal_add_timer 1.33% [kernel] [k] __internal_add_timer 0.73% [kernel] [k] timerfn 0.54% [kernel] [k] wake_up_nohz_cpu Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
47d631af |
|
21-Apr-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUT_LEAF This commit introduces an RCU_FANOUT_LEAF C-preprocessor macro so that RCU will build even when CONFIG_RCU_FANOUT_LEAF is undefined. The RCU_FANOUT_LEAF macro is set to the value of CONFIG_RCU_FANOUT_LEAF when defined, otherwise it is set to 32 for 32-bit systems and 64 for 64-bit systems. This commit then makes CONFIG_RCU_FANOUT_LEAF depend on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about CONFIG_RCU_FANOUT_LEAF unless they want to be. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
05c5df31 |
|
20-Apr-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUT This commit introduces an RCU_FANOUT C-preprocessor macro so that RCU will build even when CONFIG_RCU_FANOUT is undefined. The RCU_FANOUT macro is set to the value of CONFIG_RCU_FANOUT when defined, otherwise it is set to 32 for 32-bit systems and 64 for 64-bit systems. This commit then makes CONFIG_RCU_FANOUT depend on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about CONFIG_RCU_FANOUT unless they want to be. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
7fa27001 |
|
20-Apr-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Convert CONFIG_RCU_FANOUT_EXACT to boot parameter The CONFIG_RCU_FANOUT_EXACT Kconfig parameter is used primarily (and perhaps only) by rcutorture to verify that RCU works correctly in specific rcu_node combining-tree configurations. It therefore does not make much sense have this as a question to people attempting to configure their kernels. So this commit creates an rcutree.rcu_fanout_exact= boot parameter that rcutorture can use, and eliminates the original CONFIG_RCU_FANOUT_EXACT Kconfig parameter. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
0a0ba1c9 |
|
08-Mar-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Adjust ->lock acquisition for tasks no longer migrating Tasks are no longer migrated away from a given rcu_node structure when all CPUs corresponding to that rcu_node structure have gone offline. This means that rcu_read_unlock_special() no longer needs to loop retrying rcu_node ->lock acquisition because the current task is guaranteed to stay put. This commit takes a small and paranoid step towards relying on this guarantee by placing a WARN_ON_ONCE() just after the early exit from the lock-acquisition loop. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
82efed06 |
|
07-Apr-2015 |
Patrick Daly <pdaly@codeaurora.org> |
rcu: Fix missing task information during rcu-preempt stall The first item list_for_each_entry_continue(alist) iterates over is alist->next, rather than alist itself. Consequently, rcu_print_detail_task_stall_rnp() skips the task referenced by gp_tasks. Use gp_tasks->prev as the argument to list_for_each_entry_continue() instead. Signed-off-by: Patrick Daly <pdaly@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
5ce035fb |
|
30-Mar-2015 |
Joe Perches <joe@perches.com> |
rcu: tree_plugin: Use bool function return values of true/false not 1/0 Use the normal return values for bool functions Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
3382adbc |
|
04-Mar-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Eliminate a few CONFIG_RCU_NOCB_CPU_ALL #ifdefs This commit converts several CONFIG_RCU_NOCB_CPU_ALL #ifdefs to instead use IS_ENABLED(). This change should help avoid hiding code from compiler diagnostics. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
2927a689 |
|
04-Mar-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Create an immutable rcu_data_p pointer to default rcu_data structure This commit creates an immutable rcu_data_p pointer that references rcu_preempt_data for TREE_PREEMPT_RCU builds and that references rcu_sched_data for TREE_RCU builds. This rcu_data_p pointer will enable more code to move from #ifdef to IS_ENABLED(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
b28a7c01 |
|
04-Mar-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Tell the compiler that rcu_state_p is immutable This commit adds a "const" tag to the declarations of rcu_state_p, which should allow the compiler to generate better code and also to catch erroneous assignments to this variable. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
727b705b |
|
03-Mar-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Eliminate a few RCU_BOOST #ifdefs in favor of IS_ENABLED() This commit removes a few RCU_BOOST #ifdefs, replacing them with IS_ENABLED()-protected return statements. This relies on the optimizer to remove any resulting dead code. There are several other RCU_BOOST #ifdefs, however these rely on some per-CPU variables that are available only under RCU_BOOST. These might be converted later, if the simplification proves to outweigh the increase in memory footprint. One hoped-for advantage is more easily locating compiler errors in obscure combinations of Kconfig parameters. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <linux-rt-users@vger.kernel.org>
|
#
e63c887c |
|
03-Mar-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Convert from rcu_preempt_state to *rcu_state_p It would be good to move more code from #ifdef to IS_ENABLED(), but that does not work if the body of the IS_ENABLED() "if" statement references a variable (such as rcu_preempt_state) that does not exist if the IS_ENABLED() Kconfig variable is not set. This commit therefore substitutes *rcu_state_p for all uses of rcu_preempt_state in kernel/rcu/tree_preempt.h, which should enable elimination of a few #ifdefs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
7d0ae808 |
|
03-Mar-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE() This commit moves from the old ACCESS_ONCE() API to the new READ_ONCE() and WRITE_ONCE() APIs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Updated to include kernel/torture.c as suggested by Jason Low. ]
|
#
c1ad348b |
|
14-Apr-2015 |
Thomas Gleixner <tglx@linutronix.de> |
tick: Nohz: Rework next timer evaluation The evaluation of the next timer in the nohz code is based on jiffies while all the tick internals are nano seconds based. We have also to convert hrtimer nanoseconds to jiffies in the !highres case. That's just wrong and introduces interesting corner cases. Turn it around and convert the next timer wheel timer expiry and the rcu event to clock monotonic and base all calculations on nanoseconds. That identifies the case where no timer is pending clearly with an absolute expiry value of KTIME_MAX. Makes the code more readable and gets rid of the jiffies magic in the nohz code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Link: http://lkml.kernel.org/r/20150414203502.184198593@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
0aa04b05 |
|
23-Jan-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Process offlining and onlining only at grace-period start Races between CPU hotplug and grace periods can be difficult to resolve, so the ->onoff_mutex is used to exclude the two events. Unfortunately, this means that it is impossible for an outgoing CPU to perform the last bits of its offlining from its last pass through the idle loop, because sleeplocks cannot be acquired in that context. This commit avoids these problems by buffering online and offline events in a new ->qsmaskinitnext field in the leaf rcu_node structures. When a grace period starts, the events accumulated in this mask are applied to the ->qsmaskinit field, and, if needed, up the rcu_node tree. The special case of all CPUs corresponding to a given leaf rcu_node structure being offline while there are still elements in that structure's ->blkd_tasks list is handled using a new ->wait_blkd_tasks field. In this case, propagating the offline bits up the tree is deferred until the beginning of the grace period after all of the tasks have exited their RCU read-side critical sections and removed themselves from the list, at which point the ->wait_blkd_tasks flag is cleared. If one of that leaf rcu_node structure's CPUs comes back online before the list empties, then the ->wait_blkd_tasks flag is simply cleared. This of course means that RCU's notion of which CPUs are offline can be out of date. This is OK because RCU need only wait on CPUs that were online at the time that the grace period started. In addition, RCU's force-quiescent-state actions will handle the case where a CPU goes offline after the grace period starts. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
cc99a310 |
|
23-Feb-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move rcu_report_unblock_qs_rnp() to common code The rcu_report_unblock_qs_rnp() function is invoked when the last task blocking the current grace period exits its outermost RCU read-side critical section. Previously, this was called only from rcu_read_unlock_special(), and was therefore defined only when CONFIG_RCU_PREEMPT=y. However, this function will be invoked even when CONFIG_RCU_PREEMPT=n once CPU-hotplug operations are processed only at the beginnings of RCU grace periods. The reason for this change is that the last task on a given leaf rcu_node structure's ->blkd_tasks list might well exit its RCU read-side critical section between the time that recent CPU-hotplug operations were applied and when the new grace period was initialized. This situation could result in RCU waiting forever on that leaf rcu_node structure, because if all that structure's CPUs were already offline, there would be no quiescent-state events to drive that structure's part of the grace period. This commit therefore moves rcu_report_unblock_qs_rnp() to common code that is built unconditionally so that the quiescent-state-forcing code can clean up after this situation, avoiding the grace-period stall. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
8eb74b2b |
|
13-Feb-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Rework preemptible expedited bitmask handling Currently, the rcu_node tree ->expmask bitmasks are initially set to reflect the online CPUs. This is pointless, because only the CPUs preempted within RCU read-side critical sections by the preceding synchronize_sched_expedited() need to be tracked. This commit therefore instead sets up these bitmasks based on the state of the ->blkd_tasks lists. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
18c629ea |
|
19-Jan-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Eliminate empty HOTPLUG_CPU ifdef Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
c8aead6a |
|
19-Jan-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Simplify sync_rcu_preempt_exp_init() This commit eliminates a boolean and associated "if" statement by rearranging the code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
a3bd2c09a |
|
21-Jan-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add boot-up check for non-default CONFIG_RCU_FANOUT_LEAF values Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
ab6f5bd6 |
|
21-Jan-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Use IS_ENABLED() to simplify rcu_bootup_announce_oddness() This commit gets rid of some inline #ifdefs by replacing them with IS_ENABLED. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
d24209bb |
|
21-Jan-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Improve diagnostics for blocked critical sections in irq If an RCU read-side critical section occurs within an interrupt handler or a softirq handler, it cannot have been preempted. Therefore, there is a check in rcu_read_unlock_special() checking for this error. However, when this check triggers, it lacks diagnostic information. This commit therefore moves rcu_read_unlock()'s lockdep annotation to follow the call to __rcu_read_unlock() and changes rcu_read_unlock_special()'s WARN_ON_ONCE() to an lockdep_rcu_suspicious() in order to locate where the offending RCU read-side critical section began. In addition, the value of the ->rcu_read_unlock_special field is printed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
34404ca8 |
|
19-Jan-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move early-boot callbacks to no-CBs lists for no-CBs CPUs When a CPU is first determined to be a no-CBs CPUs, this commit causes any early boot callbacks to be moved to the no-CBs callback list, allowing them to be invoked. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
5871968d |
|
24-Feb-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Tighten up affinity and check for sysidle If the RCU grace-period kthread invoking rcu_sysidle_check_cpu() happens to be running on the tick_do_timer_cpu initially, then rcu_bind_gp_kthread() won't bind it. This kthread might then migrate before invoking rcu_gp_fqs(), which will trigger the WARN_ON_ONCE() in rcu_sysidle_check_cpu(). This commit therefore makes rcu_bind_gp_kthread() do the binding even if the kthread is currently on the same CPU. Because this incurs added overhead, this commit also causes each RCU grace-period kthread to invoke rcu_bind_gp_kthread() once at boot rather than at the beginning of each grace period. And as long as rcu_bind_gp_kthread() is being modified, this commit eliminates its #ifdef. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
5afff48b |
|
18-Feb-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Update from rcu_expedited variable to rcu_gp_is_expedited() This commit updates open-coded tests of the rcu_expedited variable to instead use rcu_gp_is_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
59f792d1 |
|
19-Jan-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Refine diagnostics for lacking kthread for no-CBs callbacks Some diagnostics under CONFIG_PROVE_RCU in rcu_nocb_cpu_needs_barrier() assume that there can be no early-boot callbacks. This commit therefore qualifies the diagnostic with rcu_scheduler_fully_active to permit early boot callbacks to avoid this splat. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
ad853b48 |
|
13-Feb-2015 |
Tejun Heo <tj@kernel.org> |
rcu: use %*pb[l] to print bitmaps including cpumasks and nodemasks printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c0135d07 |
|
22-Jan-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Clear need_qs flag to prevent splat If the scheduling-clock interrupt sets the current tasks need_qs flag, but if the current CPU passes through a quiescent state in the meantime, then rcu_preempt_qs() will fail to clear the need_qs flag, which can fool RCU into thinking that additional rcu_read_unlock_special() processing is needed. This commit therefore clears the need_qs flag before checking for additional processing. For this problem to occur, we need rcu_preempt_data.passed_quiesce equal to true and current->rcu_read_unlock_special.b.need_qs also equal to true. This condition can occur as follows: 1. CPU 0 is aware of the current preemptible RCU grace period, but has not yet passed through a quiescent state. Among other things, this means that rcu_preempt_data.passed_quiesce is false. 2. Task A running on CPU 0 enters a preemptible RCU read-side critical section. 3. CPU 0 takes a scheduling-clock interrupt, which notices the RCU read-side critical section and the need for a quiescent state, and thus sets current->rcu_read_unlock_special.b.need_qs to true. 4. Task A is preempted, enters the scheduler, eventually invoking rcu_preempt_note_context_switch() which in turn invokes rcu_preempt_qs(). Because rcu_preempt_data.passed_quiesce is false, control enters the body of the "if" statement, which sets rcu_preempt_data.passed_quiesce to true. 5. At this point, CPU 0 takes an interrupt. The interrupt handler contains an RCU read-side critical section, and the rcu_read_unlock() notes that current->rcu_read_unlock_special is nonzero, and thus invokes rcu_read_unlock_special(). 6. Once in rcu_read_unlock_special(), the fact that current->rcu_read_unlock_special.b.need_qs is true becomes apparent, so rcu_read_unlock_special() invokes rcu_preempt_qs(). Recursively, given that we interrupted out of that same function in the preceding step. 7. Because rcu_preempt_data.passed_quiesce is now true, rcu_preempt_qs() does nothing, and simply returns. 8. Upon return to rcu_read_unlock_special(), it is noted that current->rcu_read_unlock_special is still nonzero (because the interrupted rcu_preempt_qs() had not yet gotten around to clearing current->rcu_read_unlock_special.b.need_qs). 9. Execution proceeds to the WARN_ON_ONCE(), which notes that we are in an interrupt handler and thus duly splats. The solution, as noted above, is to make rcu_read_unlock_special() clear out current->rcu_read_unlock_special.b.need_qs after calling rcu_preempt_qs(). The interrupted rcu_preempt_qs() will clear it again, but this is harmless. The worst that happens is that we clobber another attempt to set this field, but this is not a problem because we just got done reporting a quiescent state. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Fix embarrassing build bug noted by Sasha Levin. ] Tested-by: Sasha Levin <sasha.levin@oracle.com>
|
#
a94844b2 |
|
12-Dec-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Optionally run grace-period kthreads at real-time priority Recent testing has shown that under heavy load, running RCU's grace-period kthreads at real-time priority can improve performance (according to 0day test robot) and reduce the incidence of RCU CPU stall warnings. However, most systems do just fine with the default non-realtime priorities for these kthreads, and it does not make sense to expose the entire user base to any risk stemming from this change, given that this change is of use only to a few users running extremely heavy workloads. Therefore, this commit allows users to specify realtime priorities for the grace-period kthreads, but leaves them running SCHED_OTHER by default. The realtime priority may be specified at build time via the RCU_KTHREAD_PRIO Kconfig parameter, or at boot time via the rcutree.kthread_prio parameter. Either way, 0 says to continue the default SCHED_OTHER behavior and values from 1-99 specify that priority of SCHED_FIFO behavior. Note that a value of 0 is not permitted when the RCU_BOOST Kconfig parameter is specified. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
917963d0 |
|
21-Nov-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcutorture: Check from beginning to end of grace period Currently, rcutorture's Reader Batch checks measure from the end of the previous grace period to the end of the current one. This commit tightens up these checks by measuring from the start and end of the same grace period. This involves adding rcu_batches_started() and friends corresponding to the existing rcu_batches_completed() and friends. We leave SRCU alone for the moment, as it does not yet have a way of tracking both ends of its grace periods. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
9733e4f0 |
|
21-Nov-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make _batches_completed() functions return unsigned long Long ago, the various ->completed fields were of type long, but now are unsigned long due to signed-integer-overflow concerns. However, the various _batches_completed() functions remained of type long, even though their only purpose in life is to return the corresponding ->completed field. This patch cleans this up by changing these functions' return types to unsigned long. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
e3663b10 |
|
08-Dec-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Handle gpnum/completed wrap while dyntick idle Subtle race conditions can result if a CPU stays in dyntick-idle mode long enough for the ->gpnum and ->completed fields to wrap. For example, consider the following sequence of events: o CPU 1 encounters a quiescent state while waiting for grace period 5 to complete, but then enters dyntick-idle mode. o While CPU 1 is in dyntick-idle mode, the grace-period counters wrap around so that the grace period number is now 4. o Just as CPU 1 exits dyntick-idle mode, grace period 4 completes and grace period 5 begins. o The quiescent state that CPU 1 passed through during the old grace period 5 looks like it applies to the new grace period 5. Therefore, the new grace period 5 completes without CPU 1 having passed through a quiescent state. This could clearly be a fatal surprise to any long-running RCU read-side critical section that happened to be running on CPU 1 at the time. At one time, this was not a problem, given that it takes significant time for the grace-period counters to overflow even on 32-bit systems. However, with the advent of NO_HZ_FULL and SMP embedded systems, arbitrarily long idle periods are now becoming quite feasible. It is therefore time to close this race. This commit therefore avoids this race condition by having the quiescent-state forcing code detect when a CPU is falling too far behind, and setting a new rcu_data field ->gpwrap when this happens. Whenever this new ->gpwrap field is set, the CPU's ->gpnum and ->completed fields are known to be untrustworthy, and can be ignored, along with any associated quiescent states. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
fc908ed3 |
|
08-Dec-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make RCU_CPU_STALL_INFO include number of fqs attempts One way that an RCU CPU stall warning can happen is if the grace-period kthread is not allowed to execute. One proxy for this kthread's forward progress is the number of force-quiescent-state (fqs) scans. This commit therefore adds the number of fqs scans to the RCU CPU stall warning printouts when CONFIG_RCU_CPU_STALL_INFO=y. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
abaf3f9d |
|
18-Nov-2014 |
Lai Jiangshan <laijs@cn.fujitsu.com> |
rcu: Revert "Allow post-unlock reference for rt_mutex" to avoid priority-inversion The patch dfeb9765ce3c ("Allow post-unlock reference for rt_mutex") ensured rcu-boost safe even the rt_mutex has post-unlock reference. But rt_mutex allowing post-unlock reference is definitely a bug and it was fixed by the commit 27e35715df54 ("rtmutex: Plug slow unlock race"). This fix made the previous patch (dfeb9765ce3c) useless. And even worse, the priority-inversion introduced by the the previous patch still exists. rcu_read_unlock_special() { rt_mutex_unlock(&rnp->boost_mtx); /* Priority-Inversion: * the current task had been deboosted and preempted as a low * priority task immediately, it could wait long before reschedule in, * and the rcu-booster also waits on this low priority task and sleeps. * This priority-inversion makes rcu-booster can't work * as expected. */ complete(&rnp->boost_completion); } Just revert the patch to avoid it. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
5d0b0249 |
|
10-Nov-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Don't bother affinitying rcub kthreads away from offline CPUs When rcu_boost_kthread_setaffinity() sees that all CPUs for a given rcu_node structure are now offline, it affinities the corresponding RCU-boost ("rcub") kthread away from those CPUs. This is pointless because the kthread cannot run on those offline CPUs in any case. This commit therefore removes this unneeded code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
3e9f5c70 |
|
03-Nov-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Don't spawn rcub kthreads on root rcu_node structure Now that offlining CPUs no longer moves leaf rcu_node structures' ->blkd_tasks lists to the root, there is no way for the root rcu_node structure's ->blkd_task list to be nonempty, unless the root node is also the sole leaf node. This commit therefore refrains from creating an rcub kthread for the root rcu_node structure unless it is also the sole leaf. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
96e92021 |
|
31-Oct-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make use of rcu_preempt_has_tasks() Given that there is now arcu_preempt_has_tasks() function that checks to see if the ->blkd_tasks list is non-empty, this commit makes use of it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
d19fb8d1 |
|
31-Oct-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Don't migrate blocked tasks even if all corresponding CPUs offline When the last CPU associated with a given leaf rcu_node structure goes offline, something must be done about the tasks queued on that rcu_node structure. Each of these tasks has been preempted on one of the leaf rcu_node structure's CPUs while in an RCU read-side critical section that it have not yet exited. Handling these tasks is the job of rcu_preempt_offline_tasks(), which migrates them from the leaf rcu_node structure to the root rcu_node structure. Unfortunately, this migration has to be done one task at a time because each tasks allegiance must be shifted from the original leaf rcu_node to the root, so that future attempts to deal with these tasks will acquire the root rcu_node structure's ->lock rather than that of the leaf. Worse yet, this migration must be done with interrupts disabled, which is not so good for realtime response, especially given that there is no bound on the number of tasks on a given rcu_node structure's list. (OK, OK, there is a bound, it is just that it is unreasonably large, especially on 64-bit systems.) This was not considered a problem back when rcu_preempt_offline_tasks() was first written because realtime systems were assumed not to do CPU-hotplug operations while real-time applications were running. This assumption has proved of dubious validity given that people are starting to run multiple realtime applications on a single SMP system and that it is common practice to offline then online a CPU before starting its real-time application in order to clear extraneous processing off of that CPU. So we now need CPU hotplug operations to avoid undue latencies. This commit therefore avoids migrating these tasks, instead letting them be dequeued one by one from the original leaf rcu_node structure by rcu_read_unlock_special(). This means that the clearing of bits from the upper-level rcu_node structures must be deferred until the last such task has been dequeued, because otherwise subsequent grace periods won't wait on them. This commit has the beneficial side effect of simplifying the CPU-hotplug code for TREE_PREEMPT_RCU, especially in CONFIG_RCU_BOOST builds. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
b6a932d1 |
|
31-Oct-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make rcu_read_unlock_special() propagate ->qsmaskinit bit clearing This commit causes rcu_read_unlock_special() to propagate ->qsmaskinit bit clearing up the rcu_node tree once a given rcu_node structure's blkd_tasks list becomes empty. This is the final commit in preparation for the rework of RCU priority boosting: It enables preempted tasks to remain queued on their rcu_node structure even after all of that rcu_node structure's CPUs have gone offline. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
8af3a5e7 |
|
31-Oct-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Abstract rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu() This commit abstracts rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu() in preparation for the rework of RCU priority boosting. This new function will be invoked from rcu_read_unlock_special() in the reworked scheme, which is why rcu_cleanup_dead_rnp() assumes that the leaf rcu_node structure's ->qsmaskinit field has already been updated. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
74e871ac |
|
30-Oct-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Rename "empty" to "empty_norm" in preparation for boost rework This commit undertakes a simple variable renaming to make way for some rework of RCU priority boosting. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
b08ea27d |
|
29-Oct-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Protect rcu_boost() lockless accesses with ACCESS_ONCE() This commit prevents random compiler optimizations by applying ACCESS_ONCE() to lockless accesses. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
41050a00 |
|
18-Dec-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Fix rcu_barrier() race that could result in too-short wait The rcu_barrier() no-callbacks check for no-CBs CPUs has race conditions. It checks a given CPU's lists of callbacks, and if all three no-CBs lists are empty, ignores that CPU. However, these three lists could potentially be empty even when callbacks are present if the check executed just as the callbacks were being moved from one list to another. It turns out that recent versions of rcutorture can spot this race. This commit plugs this hole by consolidating the per-list counts of no-CBs callbacks into a single count, which is incremented before the corresponding callback is posted and after it is invoked. Then rcu_barrier() checks this single count to reliably determine whether the corresponding CPU has no-CBs callbacks. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
8fa7845d |
|
22-Oct-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove "cpu" argument to rcu_cleanup_after_idle() The "cpu" argument to rcu_cleanup_after_idle() is always the current CPU, so drop it. This moves the smp_processor_id() from the caller to rcu_cleanup_after_idle(), saving argument-passing overhead. Again, the anticipated cross-CPU uses of these functions has been replaced by NO_HZ_FULL. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
198bbf81 |
|
22-Oct-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove "cpu" argument to rcu_prepare_for_idle() The "cpu" argument to rcu_prepare_for_idle() is always the current CPU, so drop it. This in turn allows two of the uses of "cpu" in this function to be replaced with a this_cpu_ptr() and the third by smp_processor_id(), replacing that of the call to rcu_prepare_for_idle(). Again, the anticipated cross-CPU uses of these functions has been replaced by NO_HZ_FULL. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
aa6da514 |
|
21-Oct-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove "cpu" argument to rcu_needs_cpu() The "cpu" argument to rcu_needs_cpu() is always the current CPU, so drop it. This in turn allows the "cpu" argument to rcu_cpu_has_callbacks() to be removed, which allows the uses of "cpu" in both functions to be replaced with a this_cpu_ptr(). Again, the anticipated cross-CPU uses of these functions has been replaced by NO_HZ_FULL. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
38200cf2 |
|
21-Oct-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove "cpu" argument to rcu_note_context_switch() The "cpu" argument to rcu_note_context_switch() is always the current CPU, so drop it. This in turn allows the "cpu" argument to rcu_preempt_note_context_switch() to be removed, which allows the sole use of "cpu" in both functions to be replaced with a this_cpu_ptr(). Again, the anticipated cross-CPU uses of these functions has been replaced by NO_HZ_FULL. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
86aea0e6 |
|
21-Oct-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove "cpu" argument to rcu_preempt_check_callbacks() Because rcu_preempt_check_callbacks()'s argument is guaranteed to always be the current CPU, drop the argument and replace per_cpu() with __this_cpu_read(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
28ced795 |
|
02-Sep-2014 |
Christoph Lameter <cl@linux.com> |
rcu: Remove rcu_dynticks * parameters when they are always this_cpu_ptr(&rcu_dynticks) For some functions in kernel/rcu/tree* the rdtp parameter is always this_cpu_ptr(rdtp). Remove the parameter if constant and calculate the pointer in function. This will have the advantage that it is obvious that the address are all per cpu offsets and thus it will enable the use of this_cpu_ops in the future. Signed-off-by: Christoph Lameter <cl@linux.com> [ paulmck: Forward-ported to rcu/dev, whitespace adjustment. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
bbe5d7a9 |
|
24-Oct-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Fix for rcuo online-time-creation reorganization bug Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs) contains checks for the case where CPUs are brought online out of order, re-wiring the rcuo leader-follower relationships as needed. Unfortunately, this rewiring was broken. This apparently went undetected due to the tendency of systems to bring CPUs online in order. This commit nevertheless fixes the rewiring. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
28f6569a |
|
22-Sep-2014 |
Pranith Kumar <bobby.prani@gmail.com> |
rcu: Remove redundant TREE_PREEMPT_RCU config option PREEMPT_RCU and TREE_PREEMPT_RCU serve the same function after TINY_PREEMPT_RCU has been removed. This patch removes TREE_PREEMPT_RCU and uses PREEMPT_RCU config option in its place. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
21871d7e |
|
12-Sep-2014 |
Clark Williams <clark.williams@gmail.com> |
rcu: Unify boost and kthread priorities Rename CONFIG_RCU_BOOST_PRIO to CONFIG_RCU_KTHREAD_PRIO and use this value for both the per-CPU kthreads (rcuc/N) and the rcu boosting threads (rcub/n). Also, create the module_parameter rcutree.kthread_prio to be used on the kernel command line at boot to set a new value (rcutree.kthread_prio=N). Signed-off-by: Clark Williams <clark.williams@gmail.com> [ paulmck: Ported to rcu/dev, applied Paul Bolle and Peter Zijlstra feedback. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
61cfd097 |
|
02-Sep-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move RCU_BOOST variable declarations, eliminating #ifdef There are some RCU_BOOST-specific per-CPU variable declarations that are needlessly defined under #ifdef in kernel/rcu/tree.c. This commit therefore moves these declarations into a pre-existing #ifdef in kernel/rcu/tree_plugin.h. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
0eafa468 |
|
28-Aug-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove CONFIG_RCU_CPU_STALL_VERBOSE The CONFIG_RCU_CPU_STALL_VERBOSE Kconfig parameter causes preemptible RCU's CPU stall warnings to dump out any preempted tasks that are blocking the current RCU grace period. This information is useful, and the default has been CONFIG_RCU_CPU_STALL_VERBOSE=y for some years. It is therefore time for this commit to remove this Kconfig parameter, so that future kernel builds will always act as if CONFIG_RCU_CPU_STALL_VERBOSE=y. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
d7e29933 |
|
27-Oct-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make rcu_barrier() understand about missing rcuo kthreads Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs) avoids creating rcuo kthreads for CPUs that never come online. This fixes a bug in many instances of firmware: Instead of lying about their age, these systems instead lie about the number of CPUs that they have. Before commit 35ce7f29a44a, this could result in huge numbers of useless rcuo kthreads being created. It appears that experience indicates that I should have told the people suffering from this problem to fix their broken firmware, but I instead produced what turned out to be a partial fix. The missing piece supplied by this commit makes sure that rcu_barrier() knows not to post callbacks for no-CBs CPUs that have not yet come online, because otherwise rcu_barrier() will hang on systems having firmware that lies about the number of CPUs. It is tempting to simply have rcu_barrier() refuse to post a callback on any no-CBs CPU that does not have an rcuo kthread. This unfortunately does not work because rcu_barrier() is required to wait for all pending callbacks. It is therefore required to wait even for those callbacks that cannot possibly be invoked. Even if doing so hangs the system. Given that posting a callback to a no-CBs CPU that does not yet have an rcuo kthread can hang rcu_barrier(), It is tempting to report an error in this case. Unfortunately, this will result in false positives at boot time, when it is perfectly legal to post callbacks to the boot CPU before the scheduler has started, in other words, before it is legal to invoke rcu_barrier(). So this commit instead has rcu_barrier() avoid posting callbacks to CPUs having neither rcuo kthread nor pending callbacks, and has it complain bitterly if it finds CPUs having no rcuo kthread but some pending callbacks. And when rcu_barrier() does find CPUs having no rcuo kthread but pending callbacks, as noted earlier, it has no choice but to hang indefinitely. Reported-by: Yanko Kaneti <yaneti@declera.com> Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reported-by: Meelis Roos <mroos@linux.ee> Reported-by: Eric B Munson <emunson@akamai.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Eric B Munson <emunson@akamai.com> Tested-by: Jay Vosburgh <jay.vosburgh@canonical.com> Tested-by: Yanko Kaneti <yaneti@declera.com> Tested-by: Kevin Fenzi <kevin@scrye.com> Tested-by: Meelis Roos <mroos@linux.ee>
|
#
dd56af42 |
|
25-Aug-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Eliminate deadlock between CPU hotplug and expedited grace periods Currently, the expedited grace-period primitives do get_online_cpus(). This greatly simplifies their implementation, but means that calls to them holding locks that are acquired by CPU-hotplug notifiers (to say nothing of calls to these primitives from CPU-hotplug notifiers) can deadlock. But this is starting to become inconvenient, as can be seen here: https://lkml.org/lkml/2014/8/5/754. The problem in this case is that some developers need to acquire a mutex from a CPU-hotplug notifier, but also need to hold it across a synchronize_rcu_expedited(). As noted above, this currently results in deadlock. This commit avoids the deadlock and retains the simplicity by creating a try_get_online_cpus(), which returns false if the get_online_cpus() reference count could not immediately be incremented. If a call to try_get_online_cpus() returns true, the expedited primitives operate as before. If a call returns false, the expedited primitives fall back to normal grace-period operations. This falling back of course results in increased grace-period latency, but only during times when CPU hotplug operations are actually in flight. The effect should therefore be negligible during normal operation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Tested-by: Lan Tianyu <tianyu.lan@intel.com>
|
#
c847f142 |
|
12-Aug-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Avoid misordering in nocb_leader_wait() The NOCB follower wakeup ordering depends on the store to the tail pointer happening before the wakeup. However, because atomic_long_add() does not return a value, it does not provide ordering guarantees, and the locking in wake_up() only guarantees that the store will happen before the unlock, which might be too late. Even though this is only a theoretical issue, this commit adds a smp_mb__after_atomic() after the final atomic_long_add() to provide the needed ordering guarantee. Reported-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
1772947bd |
|
12-Aug-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Handle NOCB callbacks from irq-disabled idle code If an RCU callback is queued on a no-CBs CPU from idle code with irqs disabled, and if that CPU stays idle forever after, the callback will never be invoked. This commit therefore adds a check for this situation in ____call_rcu_nocb(), invoking the RCU core solely for the purpose of the ensuing return-to-idle transition. (If the CPU doesn't return to idle, the next scheduling-clock interrupt will fix things up.) Reported-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
39953dfd |
|
12-Aug-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Avoid misordering in __call_rcu_nocb_enqueue() The NOCB leader wakeup ordering depends on the store to the header happening before the check for the leader already being awake. However, because atomic_long_add() does not return a value, it does not provide ordering guarantees, the incorrect comment in wake_nocb_leader() notwithstanding. This commit therefore adds a smp_mb__after_atomic() after the final atomic_long_add() to provide the needed ordering guarantee. Reported-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
663e1310 |
|
21-Jul-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Don't track sysidle state if no nohz_full= CPUs If there are no nohz_full= CPUs, then there is currently no reason to track sysidle state. This commit therefore short-circuits this state tracking if !tick_nohz_full_enabled(). Note that these checks will need to be revisited if nohz_full= state can ever be changed at runtime. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
417e8d26 |
|
21-Jul-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Eliminate redundant rcu_sysidle_state variable Now that we have rcu_state_p, which references rcu_preempt_state for TREE_PREEMPT_RCU and rcu_sched_state for TREE_RCU, we don't need a separate rcu_sysidle_state variable. This commit therefore eliminates rcu_preempt_state in favor of rcu_state_p. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
22c2f669 |
|
17-Jul-2014 |
Pranith Kumar <bobby.prani@gmail.com> |
rcu: Check for have_rcu_nocb_mask instead of rcu_nocb_mask If we configure a kernel with CONFIG_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_NONE=y and CONFIG_CPUMASK_OFFSTACK=n and do not pass in a rcu_nocb= boot parameter, the cpumask rcu_nocb_mask can be garbage instead of NULL. Hence this commit replaces checks for rcu_nocb_mask == NULL with a check for have_rcu_nocb_mask. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
35ce7f29 |
|
11-Jul-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Create rcuo kthreads only for onlined CPUs RCU currently uses for_each_possible_cpu() to spawn rcuo kthreads, which can result in more rcuo kthreads than one would expect, for example, derRichard reported 64 CPUs worth of rcuo kthreads on an 8-CPU image. This commit therefore creates rcuo kthreads only for those CPUs that actually come online. This was reported by derRichard on the OFTC IRC network. Reported-by: Richard Weinberger <richard@nod.at> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
9386c0b7 |
|
13-Jul-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Rationalize kthread spawning Currently, RCU spawns kthreads from several different early_initcall() functions. Although this has served RCU well for quite some time, as more kthreads are added a more deterministic approach is required. This commit therefore causes all of RCU's early-boot kthreads to be spawned from a single early_initcall() function. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
f4aa84ba |
|
08-Jul-2014 |
Pranith Kumar <bobby.prani@gmail.com> |
rcu: Return false instead of 0 in rcu_nocb_adopt_orphan_cbs() Return false instead of 0 in rcu_nocb_adopt_orphan_cbs() as this has bool as return type. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
4afc7e26 |
|
08-Jul-2014 |
Pranith Kumar <bobby.prani@gmail.com> |
rcu: Use false for return in __call_rcu_nocb() Return false instead of 0 in __call_rcu_nocb() as this has bool as return type. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
0a9e1e11 |
|
08-Jul-2014 |
Pranith Kumar <bobby.prani@gmail.com> |
rcu: Use true/false for return in rcu_nocb_adopt_orphan_cbs() Return true/false in rcu_nocb_adopt_orphan_cbs() instead of 0/1 as this function has return type of bool. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
c271d3a9 |
|
08-Jul-2014 |
Pranith Kumar <bobby.prani@gmail.com> |
rcu: Use true/false for return in __call_rcu_nocb() Return true/false instead of 0/1 in __call_rcu_nocb() as this returns a bool type. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
949cccdb |
|
25-Jul-2014 |
Pranith Kumar <bobby.prani@gmail.com> |
rcu: Check the return value of zalloc_cpumask_var() This commit checks the return value of the zalloc_cpumask_var() used for allocating cpumask for rcu_nocb_mask. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
f4579fc5 |
|
25-Jul-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Fix attempt to avoid unsolicited offloading of callbacks Commit b58cc46c5f6b (rcu: Don't offload callbacks unless specifically requested) failed to adjust the callback lists of the CPUs that are known to be no-CBs CPUs only because they are also nohz_full= CPUs. This failure can result in callbacks that are posted during early boot getting stranded on nxtlist for CPUs whose no-CBs property becomes apparent late, and there can also be spurious warnings about offline CPUs posting callbacks. This commit fixes these problems by adding an early-boot rcu_init_nohz() that properly initializes the no-CBs CPUs. Note that kernels built with CONFIG_RCU_NOCB_CPU_ALL=y or with CONFIG_RCU_NOCB_CPU=n do not exhibit this bug. Neither do kernels booted without the nohz_full= boot parameter. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
284a8c93 |
|
14-Aug-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Per-CPU operation cleanups to rcu_*_qs() functions The rcu_bh_qs(), rcu_preempt_qs(), and rcu_sched_qs() functions use old-style per-CPU variable access and write to ->passed_quiesce even if it is already set. This commit therefore updates to use the new-style per-CPU variable access functions and avoids the spurious writes. This commit also eliminates the "cpu" argument to these functions because they are always invoked on the indicated CPU. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
1d082fd0 |
|
14-Aug-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove local_irq_disable() in rcu_preempt_note_context_switch() The rcu_preempt_note_context_switch() function is on a scheduling fast path, so it would be good to avoid disabling irqs. The reason that irqs are disabled is to synchronize process-level and irq-handler access to the task_struct ->rcu_read_unlock_special bitmask. This commit therefore makes ->rcu_read_unlock_special instead be a union of bools with a short allowing single-access checks in RCU's __rcu_read_unlock(). This results in the process-level and irq-handler accesses being simple loads and stores, so that irqs need no longer be disabled. This commit therefore removes the irq disabling from rcu_preempt_note_context_switch(). Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
176f8f7a |
|
04-Aug-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make TASKS_RCU handle nohz_full= CPUs Currently TASKS_RCU would ignore a CPU running a task in nohz_full= usermode execution. There would be neither a context switch nor a scheduling-clock interrupt to tell TASKS_RCU that the task in question had passed through a quiescent state. The grace period would therefore extend indefinitely. This commit therefore makes RCU's dyntick-idle subsystem record the task_struct structure of the task that is running in dyntick-idle mode on each CPU. The TASKS_RCU grace period can then access this information and record a quiescent state on behalf of any CPU running in dyntick-idle usermode. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
bde6c3aa |
|
01-Jul-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops RCU-tasks requires the occasional voluntary context switch from CPU-bound in-kernel tasks. In some cases, this requires instrumenting cond_resched(). However, there is some reluctance to countenance unconditionally instrumenting cond_resched() (see http://lwn.net/Articles/603252/), so this commit creates a separate cond_resched_rcu_qs() that may be used in place of cond_resched() in locations prone to long-duration in-kernel looping. This commit currently instruments only RCU-tasks. Future possibilities include also instrumenting RCU, RCU-bh, and RCU-sched in order to reduce IPI usage. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
73a860cd |
|
14-Aug-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Replace flush_signals() with WARN_ON(signal_pending()) Currently, when RCU awakens from a wait_event_interruptible() that might have awakened prematurely, it does a flush_signals(). This is done on the off-chance that someone figured out how to deliver a signal to a kthread, which is supposed to be impossible. Given that this is supposed to be impossible, this commit changes the flush_signals() calls into WARN_ON(signal_pending()). Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
9fdd3bc9 |
|
29-Jul-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Break more call_rcu() deadlock involving scheduler and perf Commit 96d3fd0d315a9 (rcu: Break call_rcu() deadlock involving scheduler and perf) covered the case where __call_rcu_nocb_enqueue() needs to wake the rcuo kthread due to the queue being initially empty, but did not do anything for the case where the queue was overflowing. This commit therefore also defers wakeup for the overflow case. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
d0bc90fd |
|
08-Jul-2014 |
Pranith Kumar <bobby.prani@gmail.com> |
rcu: Return bool type for rcu_try_advance_all_cbs() Return a bool type instead of 0 in rcu_try_advance_all_cbs(). Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
bf33eb1a |
|
08-Jul-2014 |
Pranith Kumar <bobby.prani@gmail.com> |
rcu: Fix sparse warning about rcu_batches_completed_preempt() being non-static fix sparse warning about rcu_batches_completed_preempt() being non-static by marking it as static Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
4de376a1 |
|
08-Jul-2014 |
Pranith Kumar <bobby.prani@gmail.com> |
rcu: Remove remaining read-modify-write ACCESS_ONCE() calls Change the remaining uses of ACCESS_ONCE() so that each ACCESS_ONCE() either does a load or a store, but not both. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
11ed7f93 |
|
27-Aug-2014 |
Pranith Kumar <bobby.prani@gmail.com> |
rcu: Make nocb leader kthreads process pending callbacks after spawning The nocb callbacks generated before the nocb kthreads are spawned are enqueued in the nocb queue for later processing. Commit fbce7497ee5af ("rcu: Parallelize and economize NOCB kthread wakeups") introduced nocb leader kthreads which checked the nocb_leader_wake flag to see if there were any such pending callbacks. A case was reported in which newly spawned leader kthreads were not processing the pending callbacks as this flag was not set, which led to a boot hang. The following commit ensures that the newly spawned nocb kthreads process the pending callbacks by allowing the kthreads to run immediately after spawning instead of waiting. This is done by inverting the logic of nocb_leader_wake tests to nocb_leader_sleep which allows us to use the default initialization of this flag to 0 to let the kthreads run. Reported-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Link: http://www.spinics.net/lists/kernel/msg1802899.html [ paulmck: Backported to v3.17-rc2. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Amit Shah <amit.shah@redhat.com>
|
#
187497fa |
|
16-Jul-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Allow for NULL tick_nohz_full_mask when nohz_full= missing If there isn't a nohz_full= kernel parameter specified, then tick_nohz_full_mask can legitimately be NULL. This can cause problems when RCU's boot code tries to cpumask_or() this value into rcu_nocb_mask. In addition, if NO_HZ_FULL_ALL=y, there is no point in doing the cpumask_or() in the first place because this will cause RCU_NOCB_CPU_ALL=y, which in turn will have all bits already set in rcu_nocb_mask. This commit therefore avoids the cpumask_or() if NO_HZ_FULL_ALL=y and checks for !tick_nohz_full_running otherwise, this latter check catching cases when there was no nohz_full= kernel parameter specified. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
b41d1b92 |
|
11-Jun-2014 |
Pranith Kumar <bobby.prani@gmail.com> |
rcu: Fix a sparse warning in rcu_report_unblock_qs_rnp() This commit annotates rcu_report_unblock_qs_rnp() in order to fix the following sparse warning: kernel/rcu/tree_plugin.h:990:13: warning: context imbalance in 'rcu_report_unblock_qs_rnp' - unexpected unlock Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
|
#
615e41c6 |
|
11-Jun-2014 |
Pranith Kumar <bobby.prani@gmail.com> |
rcu: Fix a sparse warning in rcu_initiate_boost() This commit annotates rcu_initiate_boost() fixes the following sparse warning: kernel/rcu/tree_plugin.h:1494:13: warning: context imbalance in 'rcu_initiate_boost' - unexpected unlock Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
|
#
c0f489d2 |
|
04-Jun-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs Binding the grace-period kthreads to the timekeeping CPU resulted in significant performance decreases for some workloads. For more detail, see: https://lkml.org/lkml/2014/6/3/395 for benchmark numbers https://lkml.org/lkml/2014/6/4/218 for CPU statistics It turns out that it is necessary to bind the grace-period kthreads to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other. In other cases, it suffices to bind the grace-period kthreads to the set of non-nohz_full CPUs. This commit therefore creates a tick_nohz_not_full_mask that is the complement of tick_nohz_full_mask, and then binds the grace-period kthread to the set of CPUs indicated by this new mask, which covers the CONFIG_NO_HZ_FULL_SYSIDLE=n case. The CONFIG_NO_HZ_FULL_SYSIDLE=y case still binds the grace-period kthreads to the timekeeping CPU. This commit also includes the tick_nohz_full_enabled() check suggested by Frederic Weisbecker. Reported-by: Jet Chen <jet.chen@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Created housekeeping_affine() and housekeeping_mask per fweisbec feedback. ]
|
#
abaa93d9 |
|
12-Jun-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Simplify priority boosting by putting rt_mutex in rcu_node RCU priority boosting currently checks for boosting via a pointer in task_struct. However, this is not needed: As Oleg noted, if the rt_mutex is placed in the rcu_node instead of on the booster's stack, the boostee can simply check it see if it owns the lock. This commit makes this change, shrinking task_struct by one pointer and the kernel by thirteen lines. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
dfeb9765 |
|
10-Jun-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Allow post-unlock reference for rt_mutex The current approach to RCU priority boosting uses an rt_mutex strictly for its priority-boosting side effects. The rt_mutex_init_proxy_locked() function is used by the booster to initialize the lock as held by the boostee. The booster then uses rt_mutex_lock() to acquire this rt_mutex, which priority-boosts the boostee. When the boostee reaches the end of its outermost RCU read-side critical section, it checks a field in its task structure to see whether it has been boosted, and, if so, uses rt_mutex_unlock() to release the rt_mutex. The booster can then go on to boost the next task that is blocking the current RCU grace period. But reasonable implementations of rt_mutex_unlock() might result in the boostee referencing the rt_mutex's data after releasing it. But the booster might have re-initialized the rt_mutex between the time that the boostee released it and the time that it later referenced it. This is clearly asking for trouble, so this commit introduces a completion that forces the booster to wait until the boostee has completely finished with the rt_mutex, thus avoiding the case where the booster is re-initializing the rt_mutex before the last boostee's last reference to that rt_mutex. This of course does introduce some overhead, but the priority-boosting code paths are miles from any possible fastpath, and the overhead of executing the completion will normally be quite small compared to the overhead of priority boosting and deboosting, so this should be OK. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
4da117cf |
|
09-May-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove redundant ACCESS_ONCE() from tick_do_timer_cpu In kernels built with CONFIG_NO_HZ_FULL, tick_do_timer_cpu is constant once boot completes. Thus, there is no need to wrap it in ACCESS_ONCE() in code that is built only when CONFIG_NO_HZ_FULL. This commit therefore removes the redundant ACCESS_ONCE(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
|
#
b58cc46c |
|
02-Jul-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Don't offload callbacks unless specifically requested Enabling NO_HZ_FULL currently has the side effect of enabling callback offloading on all CPUs. This results in lots of additional rcuo kthreads, and can also increase context switching and wakeups, even in cases where callback offloading is neither needed nor particularly desirable. This commit therefore enables callback offloading on a given CPU only if specifically requested at build time or boot time, or if that CPU has been specifically designated (again, either at build time or boot time) as a nohz_full CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
fbce7497 |
|
24-Jun-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Parallelize and economize NOCB kthread wakeups An 80-CPU system with a context-switch-heavy workload can require so many NOCB kthread wakeups that the RCU grace-period kthreads spend several tens of percent of a CPU just awakening things. This clearly will not scale well: If you add enough CPUs, the RCU grace-period kthreads would get behind, increasing grace-period latency. To avoid this problem, this commit divides the NOCB kthreads into leaders and followers, where the grace-period kthreads awaken the leaders each of whom in turn awakens its followers. By default, the number of groups of kthreads is the square root of the number of CPUs, but this default may be overridden using the rcutree.rcu_nocb_leader_stride boot parameter. This reduces the number of wakeups done per grace period by the RCU grace-period kthread by the square root of the number of CPUs, but of course by shifting those wakeups to the leaders. In addition, because the leaders do grace periods on behalf of their respective followers, the number of wakeups of the followers decreases by up to a factor of two. Instead of being awakened once when new callbacks arrive and again at the end of the grace period, the followers are awakened only at the end of the grace period. For a numerical example, in a 4096-CPU system, the grace-period kthread would awaken 64 leaders, each of which would awaken its 63 followers at the end of the grace period. This compares favorably with the 79 wakeups for the grace-period kthread on an 80-CPU system. Reported-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
4a81e832 |
|
20-Jun-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Reduce overhead of cond_resched() checks for RCU Commit ac1bea85781e (Make cond_resched() report RCU quiescent states) fixed a problem where a CPU looping in the kernel with but one runnable task would give RCU CPU stall warnings, even if the in-kernel loop contained cond_resched() calls. Unfortunately, in so doing, it introduced performance regressions in Anton Blanchard's will-it-scale "open1" test. The problem appears to be not so much the increased cond_resched() path length as an increase in the rate at which grace periods complete, which increased per-update grace-period overhead. This commit takes a different approach to fixing this bug, mainly by moving the RCU-visible quiescent state from cond_resched() to rcu_note_context_switch(), and by further reducing the check to a simple non-zero test of a single per-CPU variable. However, this approach requires that the force-quiescent-state processing send resched IPIs to the offending CPUs. These will be sent only once the grace period has reached an age specified by the boot/sysfs parameter rcutree.jiffies_till_sched_qs, or once the grace period reaches an age halfway to the point at which RCU CPU stall warnings will be emitted, whichever comes first. Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Lameter <cl@gentwo.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> [ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the ktest build robot. Also fixed smp_mb() comment as noted by Oleg Nesterov. ] Merge with e552592e (Reduce overhead of cond_resched() checks for RCU) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
e534165b |
|
23-Mar-2014 |
Uma Sharma <uma.sharma523@gmail.com> |
rcu: Variable name changed in tree_plugin.h and used in tree.c The variable and struct both having the name "rcu_state" confuses sparse in some situations, so this commit changes the variable to "rcu_state_p" in order to avoid this confusion. This also makes things easier for human readers. Signed-off-by: Uma Sharma <uma.sharma523@gmail.com> [ paulmck: Changed the declaration and several additional uses. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
fa07a58f |
|
15-Apr-2014 |
Christoph Lameter <cl@linux.com> |
rcu: Replace __this_cpu_ptr() uses with raw_cpu_ptr() __this_cpu_ptr is being phased out. One special case is increment_cpu_stall_ticks(). A per cpu variable is incremented so use raw_cpu_inc(). Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
becb41bf |
|
07-Apr-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make large and small sysidle systems use same state machine Currently, small systems move back into RCU_SYSIDLE_NOT from RCU_SYSIDLE_SHORT and large systems do not. This works because moving aggressively to RCU_SYSIDLE_NOT affects only performance, not correctness, and on small systems, the performance impact should be negligible. That said, this difference does make RCU a bit more complex, and RCU does not seem to be suffering from any lack of complexity. This commit therefore adjusts small-system operation to match that of large systems, so that the state never moves back to RCU_SYSIDLE_NOT from RCU_SYSIDLE_SHORT. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
5057f55e5 |
|
01-Apr-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Bind RCU grace-period kthreads if NO_HZ_FULL Currently, RCU binds the grace-period kthreads to the timekeeping CPU only if CONFIG_NO_HZ_FULL_SYSIDLE=y. This means that these kthreads must be bound manually when CONFIG_NO_HZ_FULL_SYSIDLE=n and CONFIG_NO_HZ_FULL=y: Otherwise, these kthreads will induce OS jitter on random CPUs. Given that we are trying to reduce the amount of manual tweaking required to make CONFIG_NO_HZ_FULL=y work nicely, this commit makes this binding happen when CONFIG_NO_HZ_FULL=y, even in cases where CONFIG_NO_HZ_FULL_SYSIDLE=n. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
a381d757 |
|
19-Mar-2014 |
Andreea-Cristina Bernat <bernat.ada@gmail.com> |
rcu: Merge rcu_sched_force_quiescent_state() with rcu_force_quiescent_state() This patch merges the function rcu_force_quiescent_state() with rcu_sched_force_quiescent_state(), using the rcu_state pointer. Firstly, the rcu_sched_force_quiescent_state() function is deleted from the file kernel/rcu/tree.c. Also, the rcu_force_quiescent_state() function that was calling force_quiescent_state with the argument rcu_preempt_state pointer was deleted as well. The new function that combines the old ones uses the rcu_state pointer and is located after rcu_batches_completed_bh() in kernel/rcu/tree.c. Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
495aa969 |
|
18-Mar-2014 |
Andreea-Cristina Bernat <bernat.ada@gmail.com> |
rcu: Consolidate kfree_call_rcu() to use rcu_state pointer kfree_call_rcu is defined two times. When defined under CONFIG_TREE_PREEMPT_RCU, it uses rcu_preempt_state. Otherwise, it uses rcu_sched_state. This patch uses the rcu_state_pointer to combine the two definitions into one. The resulting function is placed after the closing of the preprocessor conditional CONFIG_TREE_PREEMPT_RCU. Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
48a7639c |
|
11-Mar-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make callers awaken grace-period kthread The rcu_start_gp_advanced() function currently uses irq_work_queue() to defer wakeups of the RCU grace-period kthread. This deferring is necessary to avoid RCU-scheduler deadlocks involving the rcu_node structure's lock, meaning that RCU cannot call any of the scheduler's wake-up functions while holding one of these locks. Unfortunately, the second and subsequent calls to irq_work_queue() are ignored, and the first call will be ignored (aside from queuing the work item) if the scheduler-clock tick is turned off. This is OK for many uses, especially those where irq_work_queue() is called from an interrupt or softirq handler, because in those cases the scheduler-clock-tick state will be re-evaluated, which will turn the scheduler-clock tick back on. On the next tick, any deferred work will then be processed. However, this strategy does not always work for RCU, which can be invoked at process level from idle CPUs. In this case, the tick might never be turned back on, indefinitely defering a grace-period start request. Note that the RCU CPU stall detector cannot see this condition, because there is no RCU grace period in progress. Therefore, we can (and do!) see long tens-of-seconds stalls in grace-period handling. In theory, we could see a full grace-period hang, but rcutorture testing to date has seen only the tens-of-seconds stalls. Event tracing demonstrates that irq_work_queue() is being called repeatedly to no effect during these stalls: The "newreq" event appears repeatedly from a task that is not one of the grace-period kthreads. In theory, irq_work_queue() might be fixed to avoid this sort of issue, but RCU's requirements are unusual and it is quite straightforward to pass wake-up responsibility up through RCU's call chain, so that the wakeup happens when the offending locks are released. This commit therefore makes this change. The rcu_start_gp_advanced(), rcu_start_future_gp(), rcu_accelerate_cbs(), rcu_advance_cbs(), __note_gp_changes(), and rcu_start_gp() functions now return a boolean which indicates when a wake-up is needed. A new rcu_gp_kthread_wake() does the wakeup when it is necessary and safe to do so: No self-wakes, no wake-ups if the ->gp_flags field indicates there is no need (as in someone else did the wake-up before we got around to it), and no wake-ups before the grace-period kthread has been created. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
365187fb |
|
10-Mar-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Update cpu_needs_another_gp() for futures from non-NOCB CPUs In the old days, the only source of requests for future grace periods was NOCB CPUs. This has changed: CPUs routinely post requests for future grace periods in order to promote power efficiency and reduce OS jitter with minimal impact on grace-period latency. This commit therefore updates cpu_needs_another_gp() to invoke rcu_future_needs_gp() instead of rcu_nocb_needs_gp(). The latter is no longer used, so is now removed. This commit also adds tracing for the irq_work_queue() wakeup case. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
24342c96 |
|
24-Feb-2014 |
Liu Ping Fan <kernelfans@gmail.com> |
rcu: Fix incorrect notes for code Signed-off-by: Liu Ping Fan <kernelfans@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
4e857c58 |
|
17-Mar-2014 |
Peter Zijlstra <peterz@infradead.org> |
arch: Mass conversion of smp_mb__*() Mostly scripted conversion of the smp_mb__* barriers. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
f1f399d1 |
|
17-Nov-2013 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Optimize RCU_FAST_NO_HZ for RCU_NOCB_CPU_ALL If CONFIG_RCU_NOCB_CPU_ALL=y, then no CPU will ever have RCU callbacks because these callbacks will instead be handled by the rcuo kthreads. However, the current version of RCU_FAST_NO_HZ nevertheless checks for RCU callbacks. This commit therefore creates static inline implementations of rcu_prepare_for_idle() and rcu_cleanup_after_idle() that are no-ops when CONFIG_RCU_NOCB_CPU_ALL=y. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
ffa83fb5 |
|
17-Nov-2013 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Optimize rcu_needs_cpu() for RCU_NOCB_CPU_ALL If CONFIG_RCU_NOCB_CPU_ALL=y, then rcu_needs_cpu() will always return false, however, the current version nevertheless checks for RCU callbacks. This commit therefore creates a static inline implementation of rcu_needs_cpu() that unconditionally returns false when CONFIG_RCU_NOCB_CPU_ALL=y. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
2f33b512 |
|
17-Nov-2013 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Optimize rcu_is_nocb_cpu() for RCU_NOCB_CPU_ALL If CONFIG_RCU_NOCB_CPU_ALL=y, then rcu_is_nocb_cpu() will always return true, however, the current version nevertheless checks rcu_nocb_mask. This commit therefore creates a static inline implementation of rcu_is_nocb_cpu() that unconditionally returns true when CONFIG_RCU_NOCB_CPU_ALL=y. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
52e2bb95 |
|
09-Feb-2014 |
Paul Bolle <pebolle@tiscali.nl> |
rcu: Disambiguate CONFIG_RCU_NOCB_CPUs This commit fixes a grammar issue in the rcu_nohz_full_cpu() comment header, so that it is clear that the plural is CPUs not Kconfig options. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
87de1cfd |
|
03-Dec-2013 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Stop tracking FSF's postal address All of the RCU source files have the usual GPL header, which contains a long-obsolete postal address for FSF. To avoid the need to track the FSF office's movements, this commit substitutes the URL where GPL may be found. Reported-by: Greg KH <gregkh@linuxfoundation.org> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
6303b9c8 |
|
11-Dec-2013 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Apply smp_mb__after_unlock_lock() to preserve grace periods RCU must ensure that there is the equivalent of a full memory barrier between any memory access preceding grace period and any memory access following that same grace period, regardless of which CPU(s) happen to execute the two memory accesses. Therefore, downgrading UNLOCK+LOCK to no longer imply a full memory barrier requires some adjustments to RCU. This commit therefore adds smp_mb__after_unlock_lock() invocations as needed after the RCU lock acquisitions that need to be part of a full-memory-barrier UNLOCK+LOCK. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1386799151-2219-7-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
a096932f |
|
08-Nov-2013 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Don't activate RCU core on NO_HZ_FULL CPUs Whenever a CPU receives a scheduling-clock interrupt, RCU checks to see if the RCU core needs anything from this CPU. If so, RCU raises RCU_SOFTIRQ to carry out any needed processing. This approach has worked well historically, but it is undesirable on NO_HZ_FULL CPUs. Such CPUs are expected to spend almost all of their time in userspace, so that scheduling-clock interrupts can be disabled while there is only one runnable task on the CPU in question. Unfortunately, raising any softirq has the potential to wake up ksoftirqd, which would provide the second runnable task on that CPU, preventing disabling of scheduling-clock interrupts. What is needed instead is for RCU to leave NO_HZ_FULL CPUs alone, relying on the grace-period kthreads' quiescent-state forcing to do any needed RCU work on behalf of those CPUs. This commit therefore refrains from raising RCU_SOFTIRQ on any NO_HZ_FULL CPUs during any grace periods that have been in effect for less than one second. The one-second limit handles the case where an inappropriate workload is running on a NO_HZ_FULL CPU that features lots of scheduling-clock interrupts, but no idle or userspace time. Reported-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Mike Galbraith <bitbucket@online.de> Toasted-by: Frederic Weisbecker <fweisbec@gmail.com>
|
#
79a62f95 |
|
30-Oct-2013 |
Lai Jiangshan <laijs@cn.fujitsu.com> |
rcu: Warn on allegedly impossible rcu_read_unlock_special() from irq After commit #10f39bb1b2c1 (rcu: protect __rcu_read_unlock() against scheduler-using irq handlers), it is no longer possible to enter the main body of rcu_read_lock_special() from an NMI, interrupt, or softirq handler. In theory, this implies that the check for "in_irq() || in_serving_softirq()" must always fail, so that in theory this check could be removed entirely. In practice, this commit wraps this condition with a WARN_ON_ONCE(). If this warning never triggers, then the condition will be removed entirely. [ paulmck: And one way of triggering the WARN_ON() is if a scheduling clock interrupt occurs in an RCU read-side critical section, setting RCU_READ_UNLOCK_NEED_QS, which is handled by rcu_read_unlock_special(). Updated this commit to return if only that bit was set. ] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
96d3fd0d |
|
04-Oct-2013 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Break call_rcu() deadlock involving scheduler and perf Dave Jones got the following lockdep splat: > ====================================================== > [ INFO: possible circular locking dependency detected ] > 3.12.0-rc3+ #92 Not tainted > ------------------------------------------------------- > trinity-child2/15191 is trying to acquire lock: > (&rdp->nocb_wq){......}, at: [<ffffffff8108ff43>] __wake_up+0x23/0x50 > > but task is already holding lock: > (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230 > > which lock already depends on the new lock. > > > the existing dependency chain (in reverse order) is: > > -> #3 (&ctx->lock){-.-...}: > [<ffffffff810cc243>] lock_acquire+0x93/0x200 > [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80 > [<ffffffff811500ff>] __perf_event_task_sched_out+0x2df/0x5e0 > [<ffffffff81091b83>] perf_event_task_sched_out+0x93/0xa0 > [<ffffffff81732052>] __schedule+0x1d2/0xa20 > [<ffffffff81732f30>] preempt_schedule_irq+0x50/0xb0 > [<ffffffff817352b6>] retint_kernel+0x26/0x30 > [<ffffffff813eed04>] tty_flip_buffer_push+0x34/0x50 > [<ffffffff813f0504>] pty_write+0x54/0x60 > [<ffffffff813e900d>] n_tty_write+0x32d/0x4e0 > [<ffffffff813e5838>] tty_write+0x158/0x2d0 > [<ffffffff811c4850>] vfs_write+0xc0/0x1f0 > [<ffffffff811c52cc>] SyS_write+0x4c/0xa0 > [<ffffffff8173d4e4>] tracesys+0xdd/0xe2 > > -> #2 (&rq->lock){-.-.-.}: > [<ffffffff810cc243>] lock_acquire+0x93/0x200 > [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80 > [<ffffffff810980b2>] wake_up_new_task+0xc2/0x2e0 > [<ffffffff81054336>] do_fork+0x126/0x460 > [<ffffffff81054696>] kernel_thread+0x26/0x30 > [<ffffffff8171ff93>] rest_init+0x23/0x140 > [<ffffffff81ee1e4b>] start_kernel+0x3f6/0x403 > [<ffffffff81ee1571>] x86_64_start_reservations+0x2a/0x2c > [<ffffffff81ee1664>] x86_64_start_kernel+0xf1/0xf4 > > -> #1 (&p->pi_lock){-.-.-.}: > [<ffffffff810cc243>] lock_acquire+0x93/0x200 > [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90 > [<ffffffff810979d1>] try_to_wake_up+0x31/0x350 > [<ffffffff81097d62>] default_wake_function+0x12/0x20 > [<ffffffff81084af8>] autoremove_wake_function+0x18/0x40 > [<ffffffff8108ea38>] __wake_up_common+0x58/0x90 > [<ffffffff8108ff59>] __wake_up+0x39/0x50 > [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0 > [<ffffffff81111450>] __call_rcu+0x140/0x820 > [<ffffffff81111b8d>] call_rcu+0x1d/0x20 > [<ffffffff81093697>] cpu_attach_domain+0x287/0x360 > [<ffffffff81099d7e>] build_sched_domains+0xe5e/0x10a0 > [<ffffffff81efa7fc>] sched_init_smp+0x3b7/0x47a > [<ffffffff81ee1f4e>] kernel_init_freeable+0xf6/0x202 > [<ffffffff817200be>] kernel_init+0xe/0x190 > [<ffffffff8173d22c>] ret_from_fork+0x7c/0xb0 > > -> #0 (&rdp->nocb_wq){......}: > [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0 > [<ffffffff810cc243>] lock_acquire+0x93/0x200 > [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90 > [<ffffffff8108ff43>] __wake_up+0x23/0x50 > [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0 > [<ffffffff81111450>] __call_rcu+0x140/0x820 > [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30 > [<ffffffff81149abf>] put_ctx+0x4f/0x70 > [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230 > [<ffffffff81056b8d>] do_exit+0x30d/0xcc0 > [<ffffffff8105893c>] do_group_exit+0x4c/0xc0 > [<ffffffff810589c4>] SyS_exit_group+0x14/0x20 > [<ffffffff8173d4e4>] tracesys+0xdd/0xe2 > > other info that might help us debug this: > > Chain exists of: > &rdp->nocb_wq --> &rq->lock --> &ctx->lock > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(&ctx->lock); > lock(&rq->lock); > lock(&ctx->lock); > lock(&rdp->nocb_wq); > > *** DEADLOCK *** > > 1 lock held by trinity-child2/15191: > #0: (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230 > > stack backtrace: > CPU: 2 PID: 15191 Comm: trinity-child2 Not tainted 3.12.0-rc3+ #92 > ffffffff82565b70 ffff880070c2dbf8 ffffffff8172a363 ffffffff824edf40 > ffff880070c2dc38 ffffffff81726741 ffff880070c2dc90 ffff88022383b1c0 > ffff88022383aac0 0000000000000000 ffff88022383b188 ffff88022383b1c0 > Call Trace: > [<ffffffff8172a363>] dump_stack+0x4e/0x82 > [<ffffffff81726741>] print_circular_bug+0x200/0x20f > [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0 > [<ffffffff810c6439>] ? get_lock_stats+0x19/0x60 > [<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80 > [<ffffffff810cc243>] lock_acquire+0x93/0x200 > [<ffffffff8108ff43>] ? __wake_up+0x23/0x50 > [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90 > [<ffffffff8108ff43>] ? __wake_up+0x23/0x50 > [<ffffffff8108ff43>] __wake_up+0x23/0x50 > [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0 > [<ffffffff81111450>] __call_rcu+0x140/0x820 > [<ffffffff8109bc8f>] ? local_clock+0x3f/0x50 > [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30 > [<ffffffff81149abf>] put_ctx+0x4f/0x70 > [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230 > [<ffffffff81056b8d>] do_exit+0x30d/0xcc0 > [<ffffffff810c9af5>] ? trace_hardirqs_on_caller+0x115/0x1e0 > [<ffffffff810c9bcd>] ? trace_hardirqs_on+0xd/0x10 > [<ffffffff8105893c>] do_group_exit+0x4c/0xc0 > [<ffffffff810589c4>] SyS_exit_group+0x14/0x20 > [<ffffffff8173d4e4>] tracesys+0xdd/0xe2 The underlying problem is that perf is invoking call_rcu() with the scheduler locks held, but in NOCB mode, call_rcu() will with high probability invoke the scheduler -- which just might want to use its locks. The reason that call_rcu() needs to invoke the scheduler is to wake up the corresponding rcuo callback-offload kthread, which does the job of starting up a grace period and invoking the callbacks afterwards. One solution (championed on a related problem by Lai Jiangshan) is to simply defer the wakeup to some point where scheduler locks are no longer held. Since we don't want to unnecessarily incur the cost of such deferral, the task before us is threefold: 1. Determine when it is likely that a relevant scheduler lock is held. 2. Defer the wakeup in such cases. 3. Ensure that all deferred wakeups eventually happen, preferably sooner rather than later. We use irqs_disabled_flags() as a proxy for relevant scheduler locks being held. This works because the relevant locks are always acquired with interrupts disabled. We may defer more often than needed, but that is at least safe. The wakeup deferral is tracked via a new field in the per-CPU and per-RCU-flavor rcu_data structure, namely ->nocb_defer_wakeup. This flag is checked by the RCU core processing. The __rcu_pending() function now checks this flag, which causes rcu_check_callbacks() to initiate RCU core processing at each scheduling-clock interrupt where this flag is set. Of course this is not sufficient because scheduling-clock interrupts are often turned off (the things we used to be able to count on!). So the flags are also checked on entry to any state that RCU considers to be idle, which includes both NO_HZ_IDLE idle state and NO_HZ_FULL user-mode-execution state. This approach should allow call_rcu() to be invoked regardless of what locks you might be holding, the key word being "should". Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org>
|
#
78e4bc34 |
|
24-Sep-2013 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Fix and comment ordering around wait_event() It is all too easy to forget that wait_event() does not necessarily imply a full memory barrier. The case where it does not is where the condition transitions to true just as wait_event() starts execution. This is actually a feature: The standard use of wait_event() involves locking, in which case the locks provide the needed ordering (you hold a lock across the wake_up() and acquire that same lock after wait_event() returns). Given that I did forget that wait_event() does not necessarily imply a full memory barrier in one case, this commit fixes that case. This commit also adds comments calling out the placement of existing memory barriers relied on by wait_event() calls. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
d689fe22 |
|
13-Nov-2013 |
Thomas Gleixner <tglx@linutronix.de> |
NOHZ: Check for nohz active instead of nohz enabled RCU and the fine grained idle time accounting functions check tick_nohz_enabled. But that variable is merily telling that NOHZ has been enabled in the config and not been disabled on the command line. But it does not tell anything about nohz being active. That's what all this should check for. Matthew reported, that the idle accounting on his old P1 machine showed bogus values, when he enabled NOHZ in the config and did not disable it on the kernel command line. The reason is that his machine uses (refined) jiffies as a clocksource which explains why the "fine" grained accounting went into lala land, because it depends on when the system goes and leaves idle relative to the jiffies increment. Provide a tick_nohz_active indicator and let RCU and the accounting code use this instead of tick_nohz_enable. Reported-and-tested-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: john.stultz@linaro.org Cc: mwhitehe@redhat.com Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311132052240.30673@ionos.tec.linutronix.de
|
#
1696a8be |
|
31-Oct-2013 |
Peter Zijlstra <peterz@infradead.org> |
locking: Move the rtmutex code to kernel/locking/ Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-p9ijt8div0hwldexwfm4nlhj@git.kernel.org [ Fixed build failure in kernel/rcu/tree_plugin.h. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
4102adab |
|
08-Oct-2013 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move RCU-related source code to kernel/rcu directory Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org>
|