Cross Reference: /linux-master/kernel/time/posix-cpu-timers.c

History log of /linux-master/kernel/time/posix-cpu-timers.c
Revision	Date	Author	Comments
# f7abf14f	17-Apr-2023	Thomas Gleixner <tglx@linutronix.de>	posix-cpu-timers: Implement the missing timer_wait_running callback For some unknown reason the introduction of the timer_wait_running callback missed to fixup posix CPU timers, which went unnoticed for almost four years. Marco reported recently that the WARN_ON() in timer_wait_running() triggers with a posix CPU timer test case. Posix CPU timers have two execution models for expiring timers depending on CONFIG_POSIX_CPU_TIMERS_TASK_WORK: 1) If not enabled, the expiry happens in hard interrupt context so spin waiting on the remote CPU is reasonably time bound. Implement an empty stub function for that case. 2) If enabled, the expiry happens in task work before returning to user space or guest mode. The expired timers are marked as firing and moved from the timer queue to a local list head with sighand lock held. Once the timers are moved, sighand lock is dropped and the expiry happens in fully preemptible context. That means the expiring task can be scheduled out, migrated, interrupted etc. So spin waiting on it is more than suboptimal. The timer wheel has a timer_wait_running() mechanism for RT, which uses a per CPU timer-base expiry lock which is held by the expiry code and the task waiting for the timer function to complete blocks on that lock. This does not work in the same way for posix CPU timers as there is no timer base and expiry for process wide timers can run on any task belonging to that process, but the concept of waiting on an expiry lock can be used too in a slightly different way: - Add a mutex to struct posix_cputimers_work. This struct is per task and used to schedule the expiry task work from the timer interrupt. - Add a task_struct pointer to struct cpu_timer which is used to store a the task which runs the expiry. That's filled in when the task moves the expired timers to the local expiry list. That's not affecting the size of the k_itimer union as there are bigger union members already - Let the task take the expiry mutex around the expiry function - Let the waiter acquire a task reference with rcu_read_lock() held and block on the expiry mutex This avoids spin-waiting on a task which might not even be on a CPU and works nicely for RT too. Fixes: ec8f954a40da ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT") Reported-by: Marco Elver <elver@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marco Elver <elver@google.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87zg764ojw.ffs@tglx
# 915d4ad3	16-Jan-2023	Uros Bizjak <ubizjak@gmail.com>	posix-timers: Use atomic64_try_cmpxchg() in __update_gt_cputime() Use atomic64_try_cmpxchg() instead of atomic64_cmpxchg() in __update_gt_cputime(). The x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg() (and related move instruction in front of cmpxchg()). Also, atomic64_try_cmpxchg() implicitly assigns old *ptr value to "old" when cmpxchg() fails. There is no need to re-read the value in the loop. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230116165337.5810-1-ubizjak@gmail.com
# e71ba124	22-Apr-2022	Eric W. Biederman <ebiederm@xmission.com>	signal: Replace __group_send_sig_info with send_signal_locked The function __group_send_sig_info is just a light wrapper around send_signal_locked with one parameter fixed to a constant value. As the wrapper adds no real value update the code to directly call the wrapped function. Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-2-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
# 8ca07e17	28-Jan-2022	Eric W. Biederman <ebiederm@xmission.com>	task_work: Remove unnecessary include from posix_timers.h Break a header file circular dependency by removing the unnecessary include of task_work.h from posix_timers.h. sched.h -> posix-timers.h posix-timers.h -> task_work.h task_work.h -> sched.h Add missing includes of task_work.h to: arch/x86/mm/tlb.c kernel/time/posix-cpu-timers.c Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20220309162454.123006-6-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
# 18c91bb2	05-Jan-2022	Barret Rhoden <brho@google.com>	prlimit: do not grab the tasklist_lock Unnecessarily grabbing the tasklist_lock can be a scalability bottleneck for workloads that also must grab the tasklist_lock for waiting, killing, and cloning. The tasklist_lock was grabbed to protect tsk->sighand from disappearing (becoming NULL). tsk->signal was already protected by holding a reference to tsk. update_rlimit_cpu() assumed tsk->sighand != NULL. With this commit, it attempts to lock_task_sighand(). However, this means that update_rlimit_cpu() can fail. This only happens when a task is exiting. Note that during exec, sighand may change, but it will not be NULL. Prior to this commit, the do_prlimit() ensured that update_rlimit_cpu() would not fail by read locking the tasklist_lock and checking tsk->sighand != NULL. If update_rlimit_cpu() fails, there may be other tasks that are not exiting that share tsk->signal. However, the group_leader is the last task to be released, so if we cannot update_rlimit_cpu(group_leader), then the entire process is exiting. The only other caller of update_rlimit_cpu() is selinux_bprm_committing_creds(). It has tsk == current, so update_rlimit_cpu() cannot fail (current->sighand cannot disappear until current exits). This change resulted in a 14% speedup on a microbenchmark where parents kill and wait on their children, and children getpriority, setpriority, and getrlimit. Signed-off-by: Barret Rhoden <brho@google.com> Link: https://lkml.kernel.org/r/20220106172041.522167-4-brho@google.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
# ca7752ca	01-Nov-2021	Michael Pratt <mpratt@google.com>	posix-cpu-timers: Clear task::posix_cputimers_work in copy_process() copy_process currently copies task_struct.posix_cputimers_work as-is. If a timer interrupt arrives while handling clone and before dup_task_struct completes then the child task will have: 1. posix_cputimers_work.scheduled = true 2. posix_cputimers_work.work queued. copy_process clears task_struct.task_works, so (2) will have no effect and posix_cpu_timers_work will never run (not to mention it doesn't make sense for two tasks to share a common linked list). Since posix_cpu_timers_work never runs, posix_cputimers_work.scheduled is never cleared. Since scheduled is set, future timer interrupts will skip scheduling work, with the ultimate result that the task will never receive timer expirations. Together, the complete flow is: 1. Task 1 calls clone(), enters kernel. 2. Timer interrupt fires, schedules task work on Task 1. 2a. task_struct.posix_cputimers_work.scheduled = true 2b. task_struct.posix_cputimers_work.work added to task_struct.task_works. 3. dup_task_struct() copies Task 1 to Task 2. 4. copy_process() clears task_struct.task_works for Task 2. 5. Future timer interrupts on Task 2 see task_struct.posix_cputimers_work.scheduled = true and skip scheduling work. Fix this by explicitly clearing contents of task_struct.posix_cputimers_work in copy_process(). This was never meant to be shared or inherited across tasks in the first place. Fixes: 1fb497dd0030 ("posix-cpu-timers: Provide mechanisms to defer timer handling to task_work") Reported-by: Rhys Hiltner <rhys@justin.tv> Signed-off-by: Michael Pratt <mpratt@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20211101210615.716522-1-mpratt@google.com
# 8cd9da85	13-Sep-2021	Frederic Weisbecker <frederic@kernel.org>	posix-cpu-timers: Prevent spuriously armed 0-value itimer Resetting/stopping an itimer eventually leads to it being reprogrammed with an actual "0" value. As a result the itimer expires on the next tick, triggering an unexpected signal. To fix this, make sure that struct signal_struct::it[CPUCLOCK_PROF/VIRT]::expires is set to 0 when setitimer() passes a 0 it_value, indicating that the timer must stop. Fixes: 406dd42bd1ba ("posix-cpu-timers: Force next expiration recalc after itimer reset") Reported-by: Victor Stinner <vstinner@redhat.com> Reported-by: Chris Hixon <linux-kernel-bugs@hixontech.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210913145332.232023-1-frederic@kernel.org
# ee375328	26-Jul-2021	Frederic Weisbecker <frederic@kernel.org>	posix-cpu-timers: Recalc next expiration when timer_settime() ends up not queueing There are several scenarios that can result in posix_cpu_timer_set() not queueing the timer but still leaving the threadgroup cputime counter running or keeping the tick dependency around for a random amount of time. 1) If timer_settime() is called with a 0 expiration on a timer that is already disabled, the process wide cputime counter will be started and won't ever get a chance to be stopped by stop_process_timer() since no timer is actually armed to be processed. The following snippet is enough to trigger the issue. void trigger_process_counter(void) { timer_t id; struct itimerspec val = { }; timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); timer_settime(id, TIMER_ABSTIME, &val, NULL); timer_delete(id); } 2) If timer_settime() is called with a 0 expiration on a timer that is already armed, the timer is dequeued but not really disarmed. So the process wide cputime counter and the tick dependency may still remain a while around. The following code snippet keeps this overhead around for one week after the timer deletion: void trigger_process_counter(void) { timer_t id; struct itimerspec val = { }; val.it_value.tv_sec = 604800; timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); timer_settime(id, 0, &val, NULL); timer_delete(id); } 3) If the timer was initially deactivated, this call to timer_settime() with an early expiration may have started the process wide cputime counter even though the timer hasn't been queued and armed because it has fired early and inline within posix_cpu_timer_set() itself. As a result the process wide cputime counter may never stop until a new timer is ever armed in the future. The following code snippet can reproduce this: void trigger_process_counter(void) { timer_t id; struct itimerspec val = { }; signal(SIGALRM, SIG_IGN); timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); val.it_value.tv_nsec = 1; timer_settime(id, TIMER_ABSTIME, &val, NULL); } 4) If the timer was initially armed with a former expiration value before this call to timer_settime() and the current call sets an early deadline that has already expired, the timer fires inline within posix_cpu_timer_set(). In this case it must have been dequeued before firing inline with its new expiration value, yet it hasn't been disarmed in this case. So the process wide cputime counter and the tick dependency may still be around for a while even after the timer fired. The following code snippet can reproduce this: void trigger_process_counter(void) { timer_t id; struct itimerspec val = { }; signal(SIGALRM, SIG_IGN); timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); val.it_value.tv_sec = 100; timer_settime(id, TIMER_ABSTIME, &val, NULL); val.it_value.tv_sec = 0; val.it_value.tv_nsec = 1; timer_settime(id, TIMER_ABSTIME, &val, NULL); } Fix all these issues with triggering the related base next expiration recalculation on the next tick. This also implies to re-evaluate the need to keep around the process wide cputime counter and the tick dependency, in a similar fashion to disarm_timer(). Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-7-frederic@kernel.org
# 5c8f23e6	26-Jul-2021	Frederic Weisbecker <frederic@kernel.org>	posix-cpu-timers: Consolidate timer base accessor Remove the ad-hoc timer base accessors and provide a consolidated one. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-6-frederic@kernel.org
# d9c1b2a1	26-Jul-2021	Frederic Weisbecker <frederic@kernel.org>	posix-cpu-timers: Remove confusing return value override The end of the function cannot be reached with an error in variable ret. Unconfuse reviewers about that. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-5-frederic@kernel.org
# 406dd42b	26-Jul-2021	Frederic Weisbecker <frederic@kernel.org>	posix-cpu-timers: Force next expiration recalc after itimer reset When an itimer deactivates a previously armed expiration, it simply doesn't do anything. As a result the process wide cputime counter keeps running and the tick dependency stays set until it reaches the old ghost expiration value. This can be reproduced with the following snippet: void trigger_process_counter(void) { struct itimerval n = {}; n.it_value.tv_sec = 100; setitimer(ITIMER_VIRTUAL, &n, NULL); n.it_value.tv_sec = 0; setitimer(ITIMER_VIRTUAL, &n, NULL); } Fix this with resetting the relevant base expiration. This is similar to disarming a timer. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-4-frederic@kernel.org
# 175cc3ab	26-Jul-2021	Frederic Weisbecker <frederic@kernel.org>	posix-cpu-timers: Force next_expiration recalc after timer deletion A timer deletion only dequeues the timer but it doesn't shutdown the related costly process wide cputimer counter and the tick dependency. The following code snippet keeps this overhead around for one week after the timer deletion: void trigger_process_counter(void) { timer_t id; struct itimerspec val = { }; val.it_value.tv_sec = 604800; timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); timer_settime(id, 0, &val, NULL); timer_delete(id); } Make sure the next target's tick recalculates the nearest expiration and clears the process wide counter and tick dependency if necessary. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-3-frederic@kernel.org
# a5dec9f8	26-Jul-2021	Frederic Weisbecker <frederic@kernel.org>	posix-cpu-timers: Assert task sighand is locked while starting cputime counter Starting the process wide cputime counter needs to be done in the same sighand locking sequence than actually arming the related timer otherwise this races against concurrent timers setting/expiring in the same threadgroup. Detecting that the cputime counter is started without holding the sighand lock is a first step toward debugging such situations. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-2-frederic@kernel.org
# 1a3402d9	02-Jun-2021	Frederic Weisbecker <frederic@kernel.org>	posix-cpu-timers: Fix rearm racing against process tick Since the process wide cputime counter is started locklessly from posix_cpu_timer_rearm(), it can be concurrently stopped by operations on other timers from the same thread group, such as in the following unlucky scenario: CPU 0 CPU 1 ----- ----- timer_settime(TIMER B) posix_cpu_timer_rearm(TIMER A) cpu_clock_sample_group() (pct->timers_active already true) handle_posix_cpu_timers() check_process_timers() stop_process_timers() pct->timers_active = false arm_timer(TIMER A) tick -> run_posix_cpu_timers() // sees !pct->timers_active, ignore // our TIMER A Fix this with simply locking process wide cputime counting start and timer arm in the same block. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Fixes: 60f2ceaa8111 ("posix-cpu-timers: Remove unnecessary locking around cpu_clock_sample_group") Cc: stable@vger.kernel.org Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com>
# 1e4ca26d	12-May-2021	Marcelo Tosatti <mtosatti@redhat.com>	tick/nohz: Change signal tick dependency to wake up CPUs of member tasks Rather than waking up all nohz_full CPUs on the system, only wake up the target CPUs of member threads of the signal. Reduces interruptions to nohz_full CPUs. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210512232924.150322-8-frederic@kernel.org
# 4bf07f65	22-Mar-2021	Ingo Molnar <mingo@kernel.org>	timekeeping, clocksource: Fix various typos in comments Fix ~56 single-word typos in timekeeping & clocksource code comments. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: linux-kernel@vger.kernel.org
# 5abbe51a	01-Feb-2021	Oleg Nesterov <oleg@redhat.com>	kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data() Preparation for fixing get_nr_restart_syscall() on X86 for COMPAT. Add a new helper which sets restart_block->fn and calls a dummy arch_set_restart_data() helper. Fixes: 609c19a385c8 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210201174641.GA17871@redhat.com
# 1fb497dd	29-Jul-2020	Thomas Gleixner <tglx@linutronix.de>	posix-cpu-timers: Provide mechanisms to defer timer handling to task_work Running posix CPU timers in hard interrupt context has a few downsides: - For PREEMPT_RT it cannot work as the expiry code needs to take sighand lock, which is a 'sleeping spinlock' in RT. The original RT approach of offloading the posix CPU timer handling into a high priority thread was clumsy and provided no real benefit in general. - For fine grained accounting it's just wrong to run this in context of the timer interrupt because that way a process specific CPU time is accounted to the timer interrupt. - Long running timer interrupts caused by a large amount of expiring timers which can be created and armed by unpriviledged user space. There is no hard requirement to expire them in interrupt context. If the signal is targeted at the task itself then it won't be delivered before the task returns to user space anyway. If the signal is targeted at a supervisor process then it might be slightly delayed, but posix CPU timers are inaccurate anyway due to the fact that they are tied to the tick. Provide infrastructure to schedule task work which allows splitting the posix CPU timer code into a quick check in interrupt context and a thread context expiry and signal delivery function. This has to be enabled by architectures as it requires that the architecture specific KVM implementation handles pending task work before exiting to guest mode. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20200730102337.783470146@linutronix.de
# 820903c7	29-Jul-2020	Thomas Gleixner <tglx@linutronix.de>	posix-cpu-timers: Split run_posix_cpu_timers() Split it up as a preparatory step to move the heavy lifting out of interrupt context. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20200730102337.677439437@linutronix.de
# 96498773	27-Apr-2020	Eric W. Biederman <ebiederm@xmission.com>	posix-cpu-timers: Replace __get_task_for_clock with pid_for_clock Now that the codes store references to pids instead of referendes to tasks. Looking up a task for a clock instead of looking up a struct pid makes the code more difficult to verify it is correct than necessary. In posix_cpu_timers_create get_task_pid can race with release_task for threads and return a NULL pid. As put_pid and cpu_timer_task_rcu handle NULL pids just fine the code works without problems but it is an extra case to consider and keep in mind while verifying and modifying the code. There are races with de_thread to consider that only don't apply because thread clocks are only allowed for threads in the same thread_group. So instead of leaving a burden for people making modification to the code in the future return a rcu protected struct pid for the clock instead. The logic for __get_task_for_pid and lookup_task has been folded into the new function pid_for_clock with the only change being the logic has been modified from working on a task to working on a pid that will be returned. In posix_cpu_clock_get instead of calling pid_for_clock checking the result and then calling pid_task to get the task. The result of pid_for_clock is fed directly into pid_task. This is safe because pid_task handles NULL pids. As such an extra error check was unnecessary. Instead of hiding the flag that enables the special clock_gettime handling, I have made the 3 callers just pass the flag in themselves. That is less code and seems just as simple to work with as the wrapper functions. Historically the clock_gettime special case of allowing a process clock to be found by the thread id did not even exist [33ab0fec3352] but Thomas Gleixner reports that he has found code that uses that functionality [55e8c8eb2c7b]. Link: https://lkml.kernel.org/r/87zhaxqkwa.fsf@nanos.tec.linutronix.de/ Ref: 33ab0fec3352 ("posix-timers: Consolidate posix_cpu_clock_get()") Ref: 55e8c8eb2c7b ("posix-cpu-timers: Store a reference to a pid not a task") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
# fece9826	27-Apr-2020	Eric W. Biederman <ebiederm@xmission.com>	posix-cpu-timers: Replace cpu_timer_pid_type with clock_pid_type Taking a clock and returning a pid_type is a more general and a superset of taking a timer and returning a pid_type. Perform this generalization so that future changes may use this code on clocks as well as timers. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
# 9bf7c324	25-Apr-2020	Eric W. Biederman <ebiederm@xmission.com>	posix-cpu-timers: Extend rcu_read_lock removing task_struct references Now that the code stores of pid references it is no longer necessary or desirable to take a reference on task_struct in __get_task_for_clock. Instead extend the scope of rcu_read_lock and remove the reference counting on struct task_struct entirely. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
# c7f51940	28-Apr-2020	Eric W. Biederman <ebiederm@xmission.com>	posix-cpu-timer: Unify the now redundant code in lookup_task Now that both !thread paths through lookup_task call thread_group_leader, unify them into the single test at the end of lookup_task. This unification just makes it clear what is happening in the gettime special case of lookup_task. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
# 8feebc67	27-Apr-2020	Eric W. Biederman <ebiederm@xmission.com>	posix-cpu-timer: Tidy up group_leader logic in lookup_task Replace has_group_leader_pid with thread_group_leader. Years ago Oleg suggested changing thread_group_leader to has_group_leader_pid to handle races. Looking at the code then and now I don't see how it ever helped. Especially as then the code really did need to be the thread_group_leader. Today it doesn't make a difference if thread_group_leader races with de_thread as the task returned from lookup_task in the non-thread case is just used to find values in task->signal. Since the races with de_thread have never been handled revert has_group_header_pid to thread_group_leader for clarity. Update the comment in lookup_task to remove implementation details that are no longer true and to mention task->signal instead of task->sighand, as the relevant cpu timer details are all in task->signal. Ref: 55e8c8eb2c7b ("posix-cpu-timers: Store a reference to a pid not a task") Ref: c0deae8c9587 ("posix-cpu-timers: Rcu_read_lock/unlock protect find_task_by_vpid call") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
# d53f2b62	20-Mar-2020	Sebastian Andrzej Siewior <bigeasy@linutronix.de>	lockdep: Add posixtimer context tracing bits Splitting run_posix_cpu_timers() into two parts is work in progress which is stuck on other entry code related problems. The heavy lifting which involves locking of sighand lock will be moved into task context so the necessary execution time is burdened on the task and not on interrupt context. Until this work completes lockdep with the spinlock nesting rules enabled would emit warnings for this known context. Prevent it by setting "->irq_config = 1" for the invocation of run_posix_cpu_timers() so lockdep does not complain when sighand lock is acquried. This will be removed once the split is completed. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200321113242.751182723@linutronix.de
# 55e8c8eb	28-Feb-2020	Eric W. Biederman <ebiederm@xmission.com>	posix-cpu-timers: Store a reference to a pid not a task posix cpu timers do not handle the death of a process well. This is most clearly seen when a multi-threaded process calls exec from a thread that is not the leader of the thread group. The posix cpu timer code continues to pin the old thread group leader and is unable to find the siglock from there. This results in posix_cpu_timer_del being unable to delete a timer, posix_cpu_timer_set being unable to set a timer. Further to compensate for the problems in posix_cpu_timer_del on a multi-threaded exec all timers that point at the multi-threaded task are stopped. The code for the timers fundamentally needs to check if the target process/thread is alive. This needs an extra level of indirection. This level of indirection is already available in struct pid. So replace cpu.task with cpu.pid to get the needed extra layer of indirection. In addition to handling things more cleanly this reduces the amount of memory a timer can pin when a process exits and then is reaped from a task_struct to the vastly smaller struct pid. Fixes: e0a70217107e ("posix-cpu-timers: workaround to suppress the problems with mt exec") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/87wo86tz6d.fsf@x220.int.ebiederm.org
# beb41d9c	28-Feb-2020	Eric W. Biederman <ebiederm@xmission.com>	posix-cpu-timers: Pass the task into arm_timer() The task has been already computed to take siglock before calling arm_timer. So pass the benefit of that labor into arm_timer(). Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/8736auvdt1.fsf@x220.int.ebiederm.org
# 60f2ceaa	28-Feb-2020	Eric W. Biederman <ebiederm@xmission.com>	posix-cpu-timers: Remove unnecessary locking around cpu_clock_sample_group As of e78c3496790e ("time, signal: Protect resource use statistics with seqlock") cpu_clock_sample_group no longers needs siglock protection. Unfortunately no one realized it at the time. Remove the extra locking that is for cpu_clock_sample_group and not for cpu_clock_sample. This significantly simplifies the code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/878skmvdts.fsf@x220.int.ebiederm.org
# a2efdbf4	28-Feb-2020	Eric W. Biederman <ebiederm@xmission.com>	posix-cpu-timers: cpu_clock_sample_group() no longer needs siglock As of e78c3496790e ("time, signal: Protect resource use statistics with seqlock") cpu_clock_sample_group() no longer needs siglock protection so remove the stale comment. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/87eeuevduq.fsf@x220.int.ebiederm.org
# 819a95fe	11-Nov-2019	Andrei Vagin <avagin@gmail.com>	posix-clocks: Rename the clock_get() callback to clock_get_timespec() The upcoming support for time namespaces requires to have access to: - The time in a task's time namespace for sys_clock_gettime() - The time in the root name space for common_timer_get() That adds a valid reason to finally implement a separate callback which returns the time in ktime_t format, rather than in (struct timespec). Rename the clock_get() callback to clock_get_timespec() as a preparation for introducing clock_get_ktime(). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-6-dima@arista.com
# 7f2cbcbc	21-Oct-2019	Yi Wang <wang.yi59@zte.com.cn>	posix-cpu-timers: Fix two trivial comments Recent changes modified the function arguments of thread_group_sample_cputime() and task_cputimers_expired(), but forgot to update the comments. Fix it up. [ tglx: Changed the argument name of task_cputimers_expired() as the pointer points to an array of samples. ] Fixes: b7be4ef1365d ("posix-cpu-timers: Switch thread group sampling to array") Fixes: 001f7971433a ("posix-cpu-timers: Make expiry checks array based") Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1571643852-21848-1-git-send-email-wang.yi59@zte.com.cn
# 77b4b542	05-Sep-2019	Thomas Gleixner <tglx@linutronix.de>	posix-cpu-timers: Fix permission check regression The recent consolidation of the three permission checks introduced a subtle regression. For timer_create() with a process wide timer it returns the current task if the lookup through the PID which is encoded into the clockid results in returning current. That's broken because it does not validate whether the current task is the group leader. That was caused by the two different variants of permission checks: - posix_cpu_timer_get() allowed access to the process wide clock when the looked up task is current. That's not an issue because the process wide clock is in the shared sighand. - posix_cpu_timer_create() made sure that the looked up task is the group leader. Restore the previous state. Note, that these permission checks are more than questionable, but that's subject to follow up changes. Fixes: 6ae40e3fdcd3 ("posix-cpu-timers: Provide task validation functions") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1909052314110.1902@nanos.tec.linutronix.de
# a2ed4fd6	28-Aug-2019	Thomas Gleixner <tglx@linutronix.de>	posix-cpu-timers: Make expiry_active check actually work correctly The state tracking changes broke the expiry active check by not writing to it and instead sitting timers_active, which is already set. That's not a big issue as the actual expiry is protected by sighand lock, so concurrent handling is not possible. That means that the second task which invokes that function executes the expiry code for nothing. Write to the proper flag. Also add a check whether the flag is set into check_process_timers(). That check had been missing in the code before the rework already. The check for another task handling the expiry of process wide timers was only done in the fastpath check. If the fastpath check returns true because a per task timer expired, then the checking of process wide timers was done in parallel which is as explained above just a waste of cycles. Fixes: 244d49e30653 ("posix-cpu-timers: Move state tracking to struct posix_cputimers") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <frederic@kernel.org>
# 60bda037	27-Aug-2019	Thomas Gleixner <tglx@linutronix.de>	posix-cpu-timers: Utilize timerqueue for storage Using a linear O(N) search for timer insertion affects execution time and D-cache footprint badly with a larger number of timers. Switch the storage to a timerqueue which is already used for hrtimers and alarmtimers. It does not affect the size of struct k_itimer as it.alarm is still larger. The extra list head for the expiry list will go away later once the expiry is moved into task work context. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908272129220.1939@nanos.tec.linutronix.de
# 244d49e3	21-Aug-2019	Thomas Gleixner <tglx@linutronix.de>	posix-cpu-timers: Move state tracking to struct posix_cputimers Put it where it belongs and clean up the ifdeffery in fork completely. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190821192922.743229404@linutronix.de
# 8991afe2	21-Aug-2019	Thomas Gleixner <tglx@linutronix.de>	posix-cpu-timers: Deduplicate rlimit handling Both thread and process expiry functions have the same functionality for sending signals for soft and hard RLIMITs duplicated in 4 different ways. Split it out into a common function and cleanup the callsites. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192922.653276779@linutronix.de
# dd670224	21-Aug-2019	Thomas Gleixner <tglx@linutronix.de>	posix-cpu-timers: Remove pointless comparisons The soft RLIMIT expiry code checks whether the soft limit is greater than the hard limit. That's pointless because if the soft RLIMIT is greater than the hard RLIMIT then that code cannot be reached as the hard RLIMIT check is before that and already killed the process. Remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192922.548747613@linutronix.de
# 8ea1de90	21-Aug-2019	Thomas Gleixner <tglx@linutronix.de>	posix-cpu-timers: Get rid of 64bit divisions Instead of dividing A to match the units of B it's more efficient to multiply B to match the units of A. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192922.458286860@linutronix.de
# 1cd07c0b	21-Aug-2019	Thomas Gleixner <tglx@linutronix.de>	posix-cpu-timers: Consolidate timer expiry further With the array based samples and expiry cache, the expiry function can use a loop to collect timers from the clock specific lists. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192922.365469982@linutronix.de
# 2bbdbdae	21-Aug-2019	Thomas Gleixner <tglx@linutronix.de>	posix-cpu-timers: Get rid of zero checks Deactivation of the expiry cache is done by setting all clock caches to 0. That requires to have a check for zero in all places which update the expiry cache: if (cache == 0 \|\| new < cache) cache = new; Use U64_MAX as the deactivated value, which allows to remove the zero checks when updating the cache and reduces it to the obvious check: if (new < cache) cache = new; This also removes the weird workaround in do_prlimit() which was required to convert a RLIMIT_CPU value of 0 (immediate expiry) to 1 because handing in 0 to the posix CPU timer code would have effectively disarmed it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192922.275086128@linutronix.de