#
14274d0b |
|
18-Dec-2023 |
Peter Hilber <peter.hilber@opensynergy.com> |
timekeeping: Fix cross-timestamp interpolation for non-x86 So far, get_device_system_crosststamp() unconditionally passes system_counterval.cycles to timekeeping_cycles_to_ns(). But when interpolating system time (do_interp == true), system_counterval.cycles is before tkr_mono.cycle_last, contrary to the timekeeping_cycles_to_ns() expectations. On x86, CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE will mitigate on interpolating, setting delta to 0. With delta == 0, xtstamp->sys_monoraw and xtstamp->sys_realtime are then set to the last update time, as implicitly expected by adjust_historical_crosststamp(). On other architectures, the resulting nonsense xtstamp->sys_monoraw and xtstamp->sys_realtime corrupt the xtstamp (ts) adjustment in adjust_historical_crosststamp(). Fix this by deriving xtstamp->sys_monoraw and xtstamp->sys_realtime from the last update time when interpolating, by using the local variable "cycles". The local variable already has the right value when interpolating, unlike system_counterval.cycles. Fixes: 2c756feb18d9 ("time: Add history to cross timestamp interface supporting slower devices") Signed-off-by: Peter Hilber <peter.hilber@opensynergy.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/r/20231218073849.35294-4-peter.hilber@opensynergy.com
|
#
87a41130 |
|
18-Dec-2023 |
Peter Hilber <peter.hilber@opensynergy.com> |
timekeeping: Fix cross-timestamp interpolation corner case decision The cycle_between() helper checks if parameter test is in the open interval (before, after). Colloquially speaking, this also applies to the counter wrap-around special case before > after. get_device_system_crosststamp() currently uses cycle_between() at the first call site to decide whether to interpolate for older counter readings. get_device_system_crosststamp() has the following problem with cycle_between() testing against an open interval: Assume that, by chance, cycles == tk->tkr_mono.cycle_last (in the following, "cycle_last" for brevity). Then, cycle_between() at the first call site, with effective argument values cycle_between(cycle_last, cycles, now), returns false, enabling interpolation. During interpolation, get_device_system_crosststamp() will then call cycle_between() at the second call site (if a history_begin was supplied). The effective argument values are cycle_between(history_begin->cycles, cycles, cycles), since system_counterval.cycles == interval_start == cycles, per the assumption. Due to the test against the open interval, cycle_between() returns false again. This causes get_device_system_crosststamp() to return -EINVAL. This failure should be avoided, since get_device_system_crosststamp() works both when cycles follows cycle_last (no interpolation), and when cycles precedes cycle_last (interpolation). For the case cycles == cycle_last, interpolation is actually unneeded. Fix this by changing cycle_between() into timestamp_in_interval(), which now checks against the closed interval, rather than the open interval. This changes the get_device_system_crosststamp() behavior for three corner cases: 1. Bypass interpolation in the case cycles == tk->tkr_mono.cycle_last, fixing the problem described above. 2. At the first timestamp_in_interval() call site, cycles == now no longer causes failure. 3. At the second timestamp_in_interval() call site, history_begin->cycles == system_counterval.cycles no longer causes failure. adjust_historical_crosststamp() also works for this corner case, where partial_history_cycles == total_history_cycles. These behavioral changes should not cause any problems. Fixes: 2c756feb18d9 ("time: Add history to cross timestamp interface supporting slower devices") Signed-off-by: Peter Hilber <peter.hilber@opensynergy.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20231218073849.35294-3-peter.hilber@opensynergy.com
|
#
84dccadd |
|
18-Dec-2023 |
Peter Hilber <peter.hilber@opensynergy.com> |
timekeeping: Fix cross-timestamp interpolation on counter wrap cycle_between() decides whether get_device_system_crosststamp() will interpolate for older counter readings. cycle_between() yields wrong results for a counter wrap-around where after < before < test, and for the case after < test < before. Fix the comparison logic. Fixes: 2c756feb18d9 ("time: Add history to cross timestamp interface supporting slower devices") Signed-off-by: Peter Hilber <peter.hilber@opensynergy.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/r/20231218073849.35294-2-peter.hilber@opensynergy.com
|
#
4b7f5212 |
|
31-Jan-2024 |
Peter Hilber <peter.hilber@opensynergy.com> |
timekeeping: Evaluate system_counterval_t.cs_id instead of .cs Clocksource pointers can be problematic to obtain for drivers which are not clocksource drivers themselves. In particular, the RFC virtio_rtc driver [1] would require a new helper function to obtain a pointer to the ARM Generic Timer clocksource. The ptp_kvm driver also required a similar workaround. Address this by evaluating the clocksource ID, rather than the clocksource pointer, of struct system_counterval_t. By this, setting the clocksource pointer becomes unneeded, and get_device_system_crosststamp() callers will no longer need to supply clocksource pointers. All relevant clocksource drivers provide the ID, so this change is not changing the behaviour. [1] https://lore.kernel.org/lkml/20231218073849.35294-1-peter.hilber@opensynergy.com/ Signed-off-by: Peter Hilber <peter.hilber@opensynergy.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240201010453.2212371-7-peter.hilber@opensynergy.com
|
#
d16317de |
|
18-May-2023 |
Peter Zijlstra <peterz@infradead.org> |
seqlock/latch: Provide raw_read_seqcount_latch_retry() The read side of seqcount_latch consists of: do { seq = raw_read_seqcount_latch(&latch->seq); ... } while (read_seqcount_latch_retry(&latch->seq, seq)); which is asymmetric in the raw_ department, and sure enough, read_seqcount_latch_retry() includes (explicit) instrumentation where raw_read_seqcount_latch() does not. This inconsistency becomes a problem when trying to use it from noinstr code. As such, fix it by renaming and re-implementing raw_read_seqcount_latch_retry() without the instrumentation. Specifically the instrumentation in question is kcsan_atomic_next(0) in do___read_seqcount_retry(). Loosing this annotation is not a problem because raw_read_seqcount_latch() does not pass through kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Michael Kelley <mikelley@microsoft.com> # Hyper-V Link: https://lore.kernel.org/r/20230519102715.233598176@infradead.org
|
#
158009f1 |
|
26-Apr-2023 |
Geert Uytterhoeven <geert+renesas@glider.be> |
timekeeping: Fix references to nonexistent ktime_get_fast_ns() There was never a function named ktime_get_fast_ns(). Presumably these should refer to ktime_get_mono_fast_ns() instead. Fixes: c1ce406e80fb15fa ("timekeeping: Fix up function documentation for the NMI safe accessors") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/r/06df7b3cbd94f016403bbf6cd2b38e4368e7468f.1682516546.git.geert+renesas@glider.be
|
#
f3cb8080 |
|
02-Jan-2023 |
Randy Dunlap <rdunlap@infradead.org> |
time: Fix various kernel-doc problems Clean up kernel-doc complaints about function names and non-kernel-doc comments in kernel/time/. Fixes these warnings: kernel/time/time.c:479: warning: expecting prototype for set_normalized_timespec(). Prototype was for set_normalized_timespec64() instead kernel/time/time.c:553: warning: expecting prototype for msecs_to_jiffies(). Prototype was for __msecs_to_jiffies() instead kernel/time/timekeeping.c:1595: warning: contents before sections kernel/time/timekeeping.c:1705: warning: This comment starts with '/**', but isn't a kernel-doc comment. * We have three kinds of time sources to use for sleep time kernel/time/timekeeping.c:1726: warning: This comment starts with '/**', but isn't a kernel-doc comment. * 1) can be determined whether to use or not only when doing kernel/time/tick-oneshot.c:21: warning: missing initial short description on line: * tick_program_event kernel/time/tick-oneshot.c:107: warning: expecting prototype for tick_check_oneshot_mode(). Prototype was for tick_oneshot_mode_active() instead Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230103032849.12723-1-rdunlap@infradead.org
|
#
b8ac29b4 |
|
17-Jul-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
timekeeping: contribute wall clock to rng on time change The rng's random_init() function contributes the real time to the rng at boot time, so that events can at least start in relation to something particular in the real world. But this clock might not yet be set that point in boot, so nothing is contributed. In addition, the relation between minor clock changes from, say, NTP, and the cycle counter is potentially useful entropic data. This commit addresses this by mixing in a time stamp on calls to settimeofday and adjtimex. No entropy is credited in doing so, so it doesn't make initialization faster, but it is still useful input to have. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
1366992e |
|
10-Apr-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
timekeeping: Add raw clock fallback for random_get_entropy() The addition of random_get_entropy_fallback() provides access to whichever time source has the highest frequency, which is useful for gathering entropy on platforms without available cycle counters. It's not necessarily as good as being able to quickly access a cycle counter that the CPU has, but it's still something, even when it falls back to being jiffies-based. In the event that a given arch does not define get_cycles(), falling back to the get_cycles() default implementation that returns 0 is really not the best we can do. Instead, at least calling random_get_entropy_fallback() would be preferable, because that always needs to return _something_, even falling back to jiffies eventually. It's not as though random_get_entropy_fallback() is super high precision or guaranteed to be entropic, but basically anything that's not zero all the time is better than returning zero all the time. Finally, since random_get_entropy_fallback() is used during extremely early boot when randomizing freelists in mm_init(), it can be called before timekeeping has been initialized. In that case there really is nothing we can do; jiffies hasn't even started ticking yet. So just give up and return 0. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Theodore Ts'o <tytso@mit.edu>
|
#
90be8d6c |
|
15-Apr-2022 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Consolidate fast timekeeper Provide a inline function which replaces the copy & pasta. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220415091921.072296632@linutronix.de
|
#
eff4849f |
|
15-Apr-2022 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Annotate ktime_get_boot_fast_ns() with data_race() Accessing timekeeper::offset_boot in ktime_get_boot_fast_ns() is an intended data race as the reader side cannot synchronize with a writer and there is no space in struct tk_read_base of the NMI safe timekeeper. Mark it so. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220415091920.956045162@linutronix.de
|
#
3dc6ffae |
|
14-Apr-2022 |
Kurt Kanzenbach <kurt@linutronix.de> |
timekeeping: Introduce fast accessor to clock tai Introduce fast/NMI safe accessor to clock tai for tracing. The Linux kernel tracing infrastructure has support for using different clocks to generate timestamps for trace events. Especially in TSN networks it's useful to have TAI as trace clock, because the application scheduling is done in accordance to the network time, which is based on TAI. With a tai trace_clock in place, it becomes very convenient to correlate network activity with Linux kernel application traces. Use the same implementation as ktime_get_boot_fast_ns() does by reading the monotonic time and adding the TAI offset. The same limitations as for the fast boot implementation apply. The TAI offset may change at run time e.g., by setting the time or using adjtimex() with an offset. However, these kind of offset changes are rare events. Nevertheless, the user has to be aware and deal with it in post processing. An alternative approach would be to use the same implementation as ktime_get_real_fast_ns() does. However, this requires to add an additional u64 member to the tk_read_base struct. This struct together with a seqcount is designed to fit into a single cache line on 64 bit architectures. Adding a new member would violate this constraint. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20220414091805.89667-2-kurt@linutronix.de
|
#
2c33d775 |
|
28-Apr-2022 |
Kurt Kanzenbach <kurt@linutronix.de> |
timekeeping: Mark NMI safe time accessors as notrace Mark the CLOCK_MONOTONIC fast time accessors as notrace. These functions are used in tracing to retrieve timestamps, so they should not recurse. Fixes: 4498e7467e9e ("time: Parametrize all tk_fast_mono users") Fixes: f09cb9a1808e ("time: Introduce tk_fast_raw") Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220426175338.3807ca4f@gandalf.local.home/ Link: https://lore.kernel.org/r/20220428062432.61063-1-kurt@linutronix.de
|
#
4e8c11b6 |
|
13-Dec-2021 |
Yu Liao <liaoyu15@huawei.com> |
timekeeping: Really make sure wall_to_monotonic isn't positive Even after commit e1d7ba873555 ("time: Always make sure wall_to_monotonic isn't positive") it is still possible to make wall_to_monotonic positive by running the following code: int main(void) { struct timespec time; clock_gettime(CLOCK_MONOTONIC, &time); time.tv_nsec = 0; clock_settime(CLOCK_REALTIME, &time); return 0; } The reason is that the second parameter of timespec64_compare(), ts_delta, may be unnormalized because the delta is calculated with an open coded substraction which causes the comparison of tv_sec to yield the wrong result: wall_to_monotonic = { .tv_sec = -10, .tv_nsec = 900000000 } ts_delta = { .tv_sec = -9, .tv_nsec = -900000000 } That makes timespec64_compare() claim that wall_to_monotonic < ts_delta, but actually the result should be wall_to_monotonic > ts_delta. After normalization, the result of timespec64_compare() is correct because the tv_sec comparison is not longer misleading: wall_to_monotonic = { .tv_sec = -10, .tv_nsec = 900000000 } ts_delta = { .tv_sec = -10, .tv_nsec = 100000000 } Use timespec64_sub() to ensure that ts_delta is normalized, which fixes the issue. Fixes: e1d7ba873555 ("time: Always make sure wall_to_monotonic isn't positive") Signed-off-by: Yu Liao <liaoyu15@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211213135727.1656662-1-liaoyu15@huawei.com
|
#
17a1b882 |
|
13-Jul-2021 |
Thomas Gleixner <tglx@linutronix.de> |
hrtimer: Add bases argument to clock_was_set() clock_was_set() unconditionaly invokes retrigger_next_event() on all online CPUs. This was necessary because that mechanism was also used for resume from suspend to idle which is not longer the case. The bases arguments allows the callers of clock_was_set() to hand in a mask which tells clock_was_set() which of the hrtimer clock bases are affected by the clock setting. This mask will be used in the next step to check whether a CPU base has timers queued on a clock base affected by the event and avoid the SMP function call if there are none. Add a @bases argument, provide defines for the active bases masking and fixup all callsites. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135158.691083465@linutronix.de
|
#
1b267793 |
|
13-Jul-2021 |
Thomas Gleixner <tglx@linutronix.de> |
time/timekeeping: Avoid invoking clock_was_set() twice do_adjtimex() might end up scheduling a delayed clock_was_set() via timekeeping_advance() and then invoke clock_was_set() directly which is pointless. Make timekeeping_advance() return whether an invocation of clock_was_set() is required and handle it at the call sites which allows do_adjtimex() to issue a single direct call if required. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135158.580966888@linutronix.de
|
#
a761a67f |
|
13-Jul-2021 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Distangle resume and clock-was-set events Resuming timekeeping is a clock-was-set event and uses the clock-was-set notification mechanism. This is in the way of making the clock-was-set update for hrtimers selective so unnecessary IPIs are avoided when a CPU base does not have timers queued which are affected by the clock setting. Distangle it by invoking hrtimer_resume() on each unfreezing CPU and invoke the new timerfd_resume() function from timekeeping_resume() which is the only place where this is needed. Rename hrtimer_resume() to hrtimer_resume_local() to reflect the change. With this the clock_was_set*() functions are not longer required to IPI all CPUs unconditionally and can get some smarts to avoid them. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135158.488853478@linutronix.de
|
#
b2c67cbe |
|
08-Dec-2020 |
Thomas Gleixner <tglx@linutronix.de> |
time: Add mechanism to recognize clocksource in time_get_snapshot System time snapshots are not conveying information about the current clocksource which was used, but callers like the PTP KVM guest implementation have the requirement to evaluate the clocksource type to select the appropriate mechanism. Introduce a clocksource id field in struct clocksource which is by default set to CSID_GENERIC (0). Clocksource implementations can set that field to a value which allows to identify the clocksource. Store the clocksource id of the current clocksource in the system_time_snapshot so callers can evaluate which clocksource was used to take the snapshot and act accordingly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jianyong Wu <jianyong.wu@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201209060932.212364-5-jianyong.wu@arm.com
|
#
d4c7c288 |
|
11-Feb-2021 |
Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> |
timekeeping: Allow runtime PM from change_clocksource() The struct clocksource callbacks enable() and disable() are described as a way to allow clock sources to enter a power save mode. See commit 4614e6adafa2 ("clocksource: add enable() and disable() callbacks") But using runtime PM from these callbacks triggers a cyclic lockdep warning when switching clock source using change_clocksource(). # echo e60f0000.timer > /sys/devices/system/clocksource/clocksource0/current_clocksource ====================================================== WARNING: possible circular locking dependency detected ------------------------------------------------------ migration/0/11 is trying to acquire lock: ffff0000403ed220 (&dev->power.lock){-...}-{2:2}, at: __pm_runtime_resume+0x40/0x74 but task is already holding lock: ffff8000113c8f88 (tk_core.seq.seqcount){----}-{0:0}, at: multi_cpu_stop+0xa4/0x190 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (tk_core.seq.seqcount){----}-{0:0}: ktime_get+0x28/0xa0 hrtimer_start_range_ns+0x210/0x2dc generic_sched_clock_init+0x70/0x88 sched_clock_init+0x40/0x64 start_kernel+0x494/0x524 -> #1 (hrtimer_bases.lock){-.-.}-{2:2}: hrtimer_start_range_ns+0x68/0x2dc rpm_suspend+0x308/0x5dc rpm_idle+0xc4/0x2a4 pm_runtime_work+0x98/0xc0 process_one_work+0x294/0x6f0 worker_thread+0x70/0x45c kthread+0x154/0x160 ret_from_fork+0x10/0x20 -> #0 (&dev->power.lock){-...}-{2:2}: _raw_spin_lock_irqsave+0x7c/0xc4 __pm_runtime_resume+0x40/0x74 sh_cmt_start+0x1c4/0x260 sh_cmt_clocksource_enable+0x28/0x50 change_clocksource+0x9c/0x160 multi_cpu_stop+0xa4/0x190 cpu_stopper_thread+0x90/0x154 smpboot_thread_fn+0x244/0x270 kthread+0x154/0x160 ret_from_fork+0x10/0x20 other info that might help us debug this: Chain exists of: &dev->power.lock --> hrtimer_bases.lock --> tk_core.seq.seqcount Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(tk_core.seq.seqcount); lock(hrtimer_bases.lock); lock(tk_core.seq.seqcount); lock(&dev->power.lock); *** DEADLOCK *** 2 locks held by migration/0/11: #0: ffff8000113c9278 (timekeeper_lock){-.-.}-{2:2}, at: change_clocksource+0x2c/0x160 #1: ffff8000113c8f88 (tk_core.seq.seqcount){----}-{0:0}, at: multi_cpu_stop+0xa4/0x190 Rework change_clocksource() so it enables the new clocksource and disables the old clocksource outside of the timekeeper_lock and seqcount write held region. There is no requirement that these callbacks are invoked from the lock held region. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20210211134318.323910-1-niklas.soderlund+renesas@ragnatech.se
|
#
4bf07f65 |
|
22-Mar-2021 |
Ingo Molnar <mingo@kernel.org> |
timekeeping, clocksource: Fix various typos in comments Fix ~56 single-word typos in timekeeping & clocksource code comments. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: linux-kernel@vger.kernel.org
|
#
aba428a0 |
|
01-Dec-2020 |
Chunguang Xu <brookxu@tencent.com> |
timekeeping: Remove unused get_seconds() The get_seconds() cleanup seems to have been completed, now it is time to delete the legacy interface to avoid misuse later. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/1606816351-26900-1-git-send-email-brookxu@tencent.com
|
#
6e5a9190 |
|
13-Nov-2020 |
Alex Shi <alex.shi@linux.alibaba.com> |
timekeeping: Address parameter documentation issues for various functions The kernel-doc parser complains: kernel/time/timekeeping.c:1543: warning: Function parameter or member 'ts' not described in 'read_persistent_clock64' kernel/time/timekeeping.c:764: warning: Function parameter or member 'tk' not described in 'timekeeping_forward_now' kernel/time/timekeeping.c:1331: warning: Function parameter or member 'ts' not described in 'timekeeping_inject_offset' kernel/time/timekeeping.c:1331: warning: Excess function parameter 'tv' description in 'timekeeping_inject_offset' Add the missing parameter documentations and rename the 'tv' parameter of timekeeping_inject_offset() to 'ts' so it matches the implemention. [ tglx: Reworded a few docs and massaged changelog ] Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/1605252275-63652-5-git-send-email-alex.shi@linux.alibaba.com
|
#
29efc461 |
|
13-Nov-2020 |
Alex Shi <alex.shi@linux.alibaba.com> |
timekeeping: Fix parameter docs of read_persistent_wall_and_boot_offset() Address the following kernel-doc markup warnings: kernel/time/timekeeping.c:1563: warning: Function parameter or member 'wall_time' not described in 'read_persistent_wall_and_boot_offset' kernel/time/timekeeping.c:1563: warning: Function parameter or member 'boot_offset' not described in 'read_persistent_wall_and_boot_offset' The parameters are described but miss the leading '@' and the colon after the parameter names. [ tglx: Massaged changelog ] Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/1605252275-63652-6-git-send-email-alex.shi@linux.alibaba.com
|
#
f27f7c3f |
|
13-Nov-2020 |
Alex Shi <alex.shi@linux.alibaba.com> |
timekeeping: Add missing parameter docs for pvclock_gtod_[un]register_notifier() The kernel-doc parser complains about: kernel/time/timekeeping.c:651: warning: Function parameter or member 'nb' not described in 'pvclock_gtod_register_notifier' kernel/time/timekeeping.c:670: warning: Function parameter or member 'nb' not described in 'pvclock_gtod_unregister_notifier' Add the missing parameter explanations. [ tglx: Massaged changelog ] Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/1605252275-63652-3-git-send-email-alex.shi@linux.alibaba.com
|
#
c1ce406e |
|
15-Nov-2020 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Fix up function documentation for the NMI safe accessors Alex reported the following warning: kernel/time/timekeeping.c:464: warning: Function parameter or member 'tkf' not described in '__ktime_get_fast_ns' which is not entirely correct because the documented function is ktime_get_mono_fast_ns() which does not have a parameter, but the kernel-doc parser looks at the function declaration which follows the comment and complains about the missing parameter documentation. Aside of that the documentation for the rest of the NMI safe accessors is either incomplete or missing. - Move the function documentation to the right place - Fixup the references and inconsistencies - Add the missing documentation for ktime_get_raw_fast_ns() Reported-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
e025b031 |
|
13-Nov-2020 |
Alex Shi <alex.shi@linux.alibaba.com> |
timekeeping: Add missing parameter documentation for update_fast_timekeeper() Address the following warning: kernel/time/timekeeping.c:415: warning: Function parameter or member 'tkf' not described in 'update_fast_timekeeper' [ tglx: Remove the bogus ktime_get_mono_fast_ns() part ] Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/1605252275-63652-2-git-send-email-alex.shi@linux.alibaba.com
|
#
199d280c |
|
13-Nov-2020 |
Alex Shi <alex.shi@linux.alibaba.com> |
timekeeping: Remove static functions from kernel-doc markup Various static functions in the timekeeping code have function comments which pretend to be kernel-doc, but are incomplete and trigger parser warnings. As these functions are local to the timekeeping core code there is no need to expose them via kernel-doc. Remove the double star kernel-doc marker and remove excess newlines. [ tglx: Massaged changelog and removed excess newlines ] Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/1605252275-63652-4-git-send-email-alex.shi@linux.alibaba.com
|
#
56cc7b8a |
|
24-Sep-2020 |
Arnd Bergmann <arnd@arndb.de> |
timekeeping: remove xtime_update There are no more users of xtime_update aside from legacy_timer_tick(), so fold it into that function and remove the declaration. update_process_times() is now only called inside of the kernel/time/ code, so the declaration can be moved there. Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
#
77f6c0b8 |
|
23-Sep-2020 |
Arnd Bergmann <arnd@arndb.de> |
timekeeping: remove arch_gettimeoffset With Arm EBSA110 gone, nothing uses it any more, so the corresponding code and the Kconfig option can be removed. Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
#
249d0538 |
|
27-Aug-2020 |
Ahmed S. Darwish <a.darwish@linutronix.de> |
timekeeping: Use seqcount_latch_t Latch sequence counters are a multiversion concurrency control mechanism where the seqcount_t counter even/odd value is used to switch between two data storage copies. This allows the seqcount_t read path to safely interrupt its write side critical section (e.g. from NMIs). Initially, latch sequence counters were implemented as a single write function, raw_write_seqcount_latch(), above plain seqcount_t. The read path was expected to use plain seqcount_t raw_read_seqcount(). A specialized read function was later added, raw_read_seqcount_latch(), and became the standardized way for latch read paths. Having unique read and write APIs meant that latch sequence counters are basically a data type of their own -- just inappropriately overloading plain seqcount_t. The seqcount_latch_t data type was thus introduced at seqlock.h. Use that new data type instead of seqcount_raw_spinlock_t. This ensures that only latch-safe APIs are to be used with the sequence counter. Note that the use of seqcount_raw_spinlock_t was not very useful in the first place. Only the "raw_" subset of seqcount_t APIs were used at timekeeping.c. This subset was created for contexts where lockdep cannot be used. seqcount_LOCKTYPE_t's raison d'être -- verifying that the seqcount_t writer serialization lock is held -- cannot thus be done. References: 0c3351d451ae ("seqlock: Use raw_ prefix instead of _no_lockdep") References: 55f3560df975 ("seqlock: Extend seqcount API with associated locks") Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200827114044.11173-6-a.darwish@linutronix.de
|
#
e2d977c9 |
|
13-Aug-2020 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Provide multi-timestamp accessor to NMI safe timekeeper printk wants to store various timestamps (MONOTONIC, REALTIME, BOOTTIME) to make correlation of dmesg from several systems easier. Provide an interface to retrieve all three timestamps in one go. There are some caveats: 1) Boot time and late sleep time injection Boot time is a racy access on 32bit systems if the sleep time injection happens late during resume and not in timekeeping_resume(). That could be avoided by expanding struct tk_read_base with boot offset for 32bit and adding more overhead to the update. As this is a hard to observe once per resume event which can be filtered with reasonable effort using the accurate mono/real timestamps, it's probably not worth the trouble. Aside of that it might be possible on 32 and 64 bit to observe the following when the sleep time injection happens late: CPU 0 CPU 1 timekeeping_resume() ktime_get_fast_timestamps() mono, real = __ktime_get_real_fast() inject_sleep_time() update boot offset boot = mono + bootoffset; That means that boot time already has the sleep time adjustment, but real time does not. On the next readout both are in sync again. Preventing this for 64bit is not really feasible without destroying the careful cache layout of the timekeeper because the sequence count and struct tk_read_base would then need two cache lines instead of one. 2) Suspend/resume timestamps Access to the time keeper clock source is disabled accross the innermost steps of suspend/resume. The accessors still work, but the timestamps are frozen until time keeping is resumed which happens very early. For regular suspend/resume there is no observable difference vs. sched clock, but it might affect some of the nasty low level debug printks. OTOH, access to sched clock is not guaranteed accross suspend/resume on all systems either so it depends on the hardware in use. If that turns out to be a real problem then this could be mitigated by using sched clock in a similar way as during early boot. But it's not as trivial as on early boot because it needs some careful protection against the clock monotonic timestamp jumping backwards on resume. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200814115512.159981360@linutronix.de
|
#
71419b30 |
|
13-Aug-2020 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Utilize local_clock() for NMI safe timekeeper during early boot During early boot the NMI safe timekeeper returns 0 until the first clocksource becomes available. This prevents it from being used for printk or other facilities which today use sched clock. sched clock can be available way before timekeeping is initialized. The obvious workaround for this is to utilize the early sched clock in the default dummy clock read function until a clocksource becomes available. After switching to the clocksource clock MONOTONIC and BOOTTIME will not jump because the timekeeping_init() bases clock MONOTONIC on sched clock and the offset between clock MONOTONIC and BOOTTIME is zero during boot. Clock REALTIME cannot provide useful timestamps during early boot up to the point where a persistent clock becomes available, which is either in timekeeping_init() or later when the RTC driver which might depend on I2C or other subsystems is initialized. There is a minor difference to sched_clock() vs. suspend/resume. As the timekeeper clock source might not be accessible during suspend, after timekeeping_suspend() timestamps freeze up to the point where timekeeping_resume() is invoked. OTOH this is true for some sched clock implementations as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200814115512.041422402@linutronix.de
|
#
b0294f30 |
|
06-Aug-2020 |
Randy Dunlap <rdunlap@infradead.org> |
time: Delete repeated words in comments Drop repeated words in kernel/time/. {when, one, into} Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <john.stultz@linaro.org> Link: https://lore.kernel.org/r/20200807033248.8452-1-rdunlap@infradead.org
|
#
19d0070a |
|
04-Aug-2020 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping/vsyscall: Provide vdso_update_begin/end() Architectures can have the requirement to add additional architecture specific data to the VDSO data page which needs to be updated independent of the timekeeper updates. To protect these updates vs. concurrent readers and a conflicting update through timekeeping, provide helper functions to make such updates safe. vdso_update_begin() takes the timekeeper_lock to protect against a potential update from timekeeper code and increments the VDSO sequence count to signal data inconsistency to concurrent readers. vdso_update_end() makes the sequence count even again to signal data consistency and drops the timekeeper lock. [ Sven: Add interrupt disable handling to the functions ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200804150124.41692-3-svens@linux.ibm.com
|
#
025e82bc |
|
20-Jul-2020 |
Ahmed S. Darwish <a.darwish@linutronix.de> |
timekeeping: Use sequence counter with associated raw spinlock A sequence counter write side critical section must be protected by some form of locking to serialize writers. A plain seqcount_t does not contain the information of which lock must be held when entering a write side critical section. Use the new seqcount_raw_spinlock_t data type, which allows to associate a raw spinlock with the sequence counter. This enables lockdep to verify that the raw spinlock used for writer serialization is held when the write side critical section is entered. If lockdep is disabled this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200720155530.1173732-18-a.darwish@linutronix.de
|
#
46132e3a |
|
01-Jul-2020 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
sched: nohz: stop passing around unused "ticks" parameter. The "ticks" parameter was added in commit 0f004f5a696a ("sched: Cure more NO_HZ load average woes") since calc_global_nohz() was called and needed the "ticks" argument. But in commit c308b56b5398 ("sched: Fix nohz load accounting -- again!") it became unused as the function calc_global_nohz() dropped using "ticks". Fixes: c308b56b5398 ("sched: Fix nohz load accounting -- again!") Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593628458-32290-1-git-send-email-paul.gortmaker@windriver.com
|
#
865d3a9a |
|
21-Apr-2020 |
Thomas Gleixner <tglx@linutronix.de> |
x86/mce: Address objtools noinstr complaints Mark the relevant functions noinstr, use the plain non-instrumented MSR accessors. The only odd part is the instrumentation_begin()/end() pair around the indirect machine_check_vector() call as objtool can't figure that out. The possible invoked functions are annotated correctly. Also use notrace variant of nmi_enter/exit(). If MCEs happen then hardware latency tracing is the least of the worries. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200505135315.476734898@linutronix.de
|
#
e5d4d175 |
|
20-Mar-2020 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Split jiffies seqlock seqlock consists of a sequence counter and a spinlock_t which is used to serialize the writers. spinlock_t is substituted by a "sleeping" spinlock on PREEMPT_RT enabled kernels which breaks the usage in the timekeeping code as the writers are executed in hard interrupt and therefore non-preemptible context even on PREEMPT_RT. The spinlock in seqlock cannot be unconditionally replaced by a raw_spinlock_t as many seqlock users have nesting spinlock sections or other code which is not suitable to run in truly atomic context on RT. Instead of providing a raw_seqlock API for a single use case, open code the seqlock for the jiffies use case and implement it with a raw_spinlock_t and a sequence counter. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200321113242.120587764@linutronix.de
|
#
4cbbc3a0 |
|
20-Jan-2020 |
Wen Yang <wenyang@linux.alibaba.com> |
timekeeping: Prevent 32bit truncation in scale64_check_overflow() While unlikely the divisor in scale64_check_overflow() could be >= 32bit in scale64_check_overflow(). do_div() truncates the divisor to 32bit at least on 32bit platforms. Use div64_u64() instead to avoid the truncation to 32-bit. [ tglx: Massaged changelog ] Signed-off-by: Wen Yang <wenyang@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200120100523.45656-1-wenyang@linux.alibaba.com
|
#
b99328a6 |
|
22-Aug-2019 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping/vsyscall: Prevent math overflow in BOOTTIME update The VDSO update for CLOCK_BOOTTIME has a overflow issue as it shifts the nanoseconds based boot time offset left by the clocksource shift. That overflows once the boot time offset becomes large enough. As a consequence CLOCK_BOOTTIME in the VDSO becomes a random number causing applications to misbehave. Fix it by storing a timespec64 representation of the offset when boot time is adjusted and add that to the MONOTONIC base time value in the vdso data page. Using the timespec64 representation avoids a 64bit division in the update code. Fixes: 44f57d788e7d ("timekeeping: Provide a generic update_vsyscall() implementation") Reported-by: Chris Clayton <chris2553@googlemail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Chris Clayton <chris2553@googlemail.com> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908221257580.1983@nanos.tec.linutronix.de
|
#
0354c1a3 |
|
21-Jun-2019 |
Jason A. Donenfeld <Jason@zx2c4.com> |
timekeeping: Use proper ktime_add when adding nsecs in coarse offset While this doesn't actually amount to a real difference, since the macro evaluates to the same thing, every place else operates on ktime_t using these functions, so let's not break the pattern. Fixes: e3ff9c3678b4 ("timekeeping: Repair ktime_get_coarse*() granularity") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/20190621203249.3909-1-Jason@zx2c4.com
|
#
e3ff9c36 |
|
13-Jun-2019 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Repair ktime_get_coarse*() granularity Jason reported that the coarse ktime based time getters advance only once per second and not once per tick as advertised. The code reads only the monotonic base time, which advances once per second. The nanoseconds are accumulated on every tick in xtime_nsec up to a second and the regular time getters take this nanoseconds offset into account, but the ktime_get_coarse*() implementation fails to do so. Add the accumulated xtime_nsec value to the monotonic base time to get the proper per tick advancing coarse tinme. Fixes: b9ff604cff11 ("timekeeping: Add ktime_get_coarse_with_offset") Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: Sultan Alsawaf <sultan@kerneltoast.com> Cc: Waiman Long <longman@redhat.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1906132136280.1791@nanos.tec.linutronix.de
|
#
7e8eda73 |
|
10-Apr-2019 |
Ondrej Mosnacek <omosnace@redhat.com> |
ntp: Audit NTP parameters adjustment Emit an audit record every time selected NTP parameters are modified from userspace (via adjtimex(2) or clock_adjtime(2)). These parameters may be used to indirectly change system clock, and thus their modifications should be audited. Such events will now generate records of type AUDIT_TIME_ADJNTPVAL containing the following fields: - op -- which value was adjusted: - offset -- corresponding to the time_offset variable - freq -- corresponding to the time_freq variable - status -- corresponding to the time_status variable - adjust -- corresponding to the time_adjust variable - tick -- corresponding to the tick_usec variable - tai -- corresponding to the timekeeping's TAI offset - old -- the old value - new -- the new value Example records: type=TIME_ADJNTPVAL msg=audit(1530616044.507:7): op=status old=64 new=8256 type=TIME_ADJNTPVAL msg=audit(1530616044.511:11): op=freq old=0 new=49180377088000 The records of this type will be associated with the corresponding syscall records. An overview of parameter changes that can be done via do_adjtimex() (based on information from Miroslav Lichvar) and whether they are audited: __timekeeping_set_tai_offset() -- sets the offset from the International Atomic Time (AUDITED) NTP variables: time_offset -- can adjust the clock by up to 0.5 seconds per call and also speed it up or slow down by up to about 0.05% (43 seconds per day) (AUDITED) time_freq -- can speed up or slow down by up to about 0.05% (AUDITED) time_status -- can insert/delete leap seconds and it also enables/ disables synchronization of the hardware real-time clock (AUDITED) time_maxerror, time_esterror -- change error estimates used to inform userspace applications (NOT AUDITED) time_constant -- controls the speed of the clock adjustments that are made when time_offset is set (NOT AUDITED) time_adjust -- can temporarily speed up or slow down the clock by up to 0.05% (AUDITED) tick_usec -- a more extreme version of time_freq; can speed up or slow down the clock by up to 10% (AUDITED) Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul Moore <paul@paul-moore.com>
|
#
2d87a067 |
|
10-Apr-2019 |
Ondrej Mosnacek <omosnace@redhat.com> |
timekeeping: Audit clock adjustments Emit an audit record whenever the system clock is changed (i.e. shifted by a non-zero offset) by a syscall from userspace. The syscalls than can (at the time of writing) trigger such record are: - settimeofday(2), stime(2), clock_settime(2) -- via do_settimeofday64() - adjtimex(2), clock_adjtime(2) -- via do_adjtimex() The new records have type AUDIT_TIME_INJOFFSET and contain the following fields: - sec -- the 'seconds' part of the offset - nsec -- the 'nanoseconds' part of the offset Example record (time was shifted backwards by ~15.875 seconds): type=TIME_INJOFFSET msg=audit(1530616049.652:13): sec=-16 nsec=124887145 The records of this type will be associated with the corresponding syscall records. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> [PM: fixed a line width problem in __audit_tk_injoffset()] Signed-off-by: Paul Moore <paul@paul-moore.com>
|
#
7a8e61f8 |
|
23-Mar-2019 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Force upper bound for setting CLOCK_REALTIME Several people reported testing failures after setting CLOCK_REALTIME close to the limits of the kernel internal representation in nanoseconds, i.e. year 2262. The failures are exposed in subsequent operations, i.e. when arming timers or when the advancing CLOCK_MONOTONIC makes the calculation of CLOCK_REALTIME overflow into negative space. Now people start to paper over the underlying problem by clamping calculations to the valid range, but that's just wrong because such workarounds will prevent detection of real issues as well. It is reasonable to force an upper bound for the various methods of setting CLOCK_REALTIME. Year 2262 is the absolute upper bound. Assume a maximum uptime of 30 years which is plenty enough even for esoteric embedded systems. That results in an upper bound of year 2232 for setting the time. Once that limit is reached in reality this limit is only a small part of the problem space. But until then this stops people from trying to paper over the problem at the wrong places. Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Reported-by: Hongbo Yao <yaohongbo@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1903231125480.2157@nanos.tec.linutronix.de
|
#
e1e41b6c |
|
18-Mar-2019 |
Rasmus Villemoes <linux@rasmusvillemoes.dk> |
timekeeping: Consistently use unsigned int for seqcount snapshot The timekeeping code uses a random mix of "unsigned long" and "unsigned int" for the seqcount snapshots (ratio 14:12). Since the seqlock.h API is entirely based on unsigned int, use that throughout. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@kernel.org> Link: https://lkml.kernel.org/r/20190318195557.20773-1-linux@rasmusvillemoes.dk
|
#
ead25417 |
|
02-Jul-2018 |
Deepa Dinamani <deepa.kernel@gmail.com> |
timex: use __kernel_timex internally struct timex is not y2038 safe. Replace all uses of timex with y2038 safe __kernel_timex. Note that struct __kernel_timex is an ABI interface definition. We could define a new structure based on __kernel_timex that is only available internally instead. Right now, there isn't a strong motivation for this as the structure is isolated to a few defined struct timex interfaces and such a structure would be exactly the same as struct timex. The patch was generated by the following coccinelle script: virtual patch @depends on patch forall@ identifier ts; expression e; @@ ( - struct timex ts; + struct __kernel_timex ts; | - struct timex ts = {}; + struct __kernel_timex ts = {}; | - struct timex ts = e; + struct __kernel_timex ts = e; | - struct timex *ts; + struct __kernel_timex *ts; | (memset \| copy_from_user \| copy_to_user \)(..., - sizeof(struct timex)) + sizeof(struct __kernel_timex)) ) @depends on patch forall@ identifier ts; identifier fn; @@ fn(..., - struct timex *ts, + struct __kernel_timex *ts, ...) { ... } @depends on patch forall@ identifier ts; identifier fn; @@ fn(..., - struct timex *ts) { + struct __kernel_timex *ts) { ... } Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: linux-alpha@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
#
92661788 |
|
14-Aug-2018 |
Arnd Bergmann <arnd@arndb.de> |
timekeeping: remove unused {read,update}_persistent_clock After arch/sh has removed the last reference to these functions, we can remove them completely and just rely on the 64-bit time_t based versions. This cleans up a rather ugly use of __weak functions. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: John Stultz <john.stultz@linaro.org>
|
#
ce10a5b3 |
|
28-Nov-2018 |
Bart Van Assche <bvanassche@acm.org> |
timekeeping: Use proper seqcount initializer tk_core.seq is initialized open coded, but that misses to initialize the lockdep map when lockdep is enabled. Lockdep splats involving tk_core seq consequently lack a name and are hard to read. Use the proper initializer which takes care of the lockdep map initialization. [ tglx: Massaged changelog ] Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Cc: tj@kernel.org Cc: johannes.berg@intel.com Link: https://lkml.kernel.org/r/20181128234325.110011-12-bvanassche@acm.org
|
#
35728b82 |
|
31-Oct-2018 |
Thomas Gleixner <tglx@linutronix.de> |
time: Add SPDX license identifiers Update the time(r) core files files with the correct SPDX license identifier based on the license text in the file itself. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This work is based on a script and data from Philippe Ombredanne, Kate Stewart and myself. The data has been created with two independent license scanners and manual inspection. The following files do not contain any direct license information and have been omitted from the big initial SPDX changes: timeconst.bc: The .bc files were not touched time.c, timer.c, timekeeping.c: Licence was deduced from EXPORT_SYMBOL_GPL As those files do not contain direct license references they fall under the project license, i.e. GPL V2 only. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: Corey Minyard <cminyard@mvista.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: David Riley <davidriley@chromium.org> Cc: Colin Cross <ccross@android.com> Cc: Mark Brown <broonie@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: https://lkml.kernel.org/r/20181031182252.879109557@linutronix.de
|
#
58c5fc2b |
|
31-Oct-2018 |
Thomas Gleixner <tglx@linutronix.de> |
time: Remove useless filenames in top level comments Remove the pointless filenames in the top level comments. They have no value at all and just occupy space. While at it tidy up some of the comments and remove a stale one. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: Corey Minyard <cminyard@mvista.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Richard Cochran <richardcochran@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Riley <davidriley@chromium.org> Cc: Colin Cross <ccross@android.com> Cc: Mark Brown <broonie@kernel.org> Link: https://lkml.kernel.org/r/20181031182252.794898238@linutronix.de
|
#
33e26418 |
|
14-Aug-2018 |
Arnd Bergmann <arnd@arndb.de> |
y2038: make do_gettimeofday() and get_seconds() inline get_seconds() and do_gettimeofday() are only used by a few modules now any more (waiting for the respective patches to get accepted), and they are among the last holdouts of code that is not y2038 safe in the core kernel. Move the implementation into the timekeeping32.h header to clean up the core kernel and isolate the old interfaces further. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
#
684ad537 |
|
25-Jul-2018 |
Pavel Tatashin <pasha.tatashin@oracle.com> |
timekeeping: Prevent false warning when persistent clock is not available On arches with no persistent clock a message like this is printed during boot: [ 0.000000] Persistent clock returned invalid value The value is not invalid: Zero means that no persistent clock is available and the absence of persistent clock should be quietly accepted. Fixes: 3eca993740b8 ("timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset()") Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: sboyd@kernel.org Cc: john.stultz@linaro.org Link: https://lkml.kernel.org/r/20180725200018.23722-1-pasha.tatashin@oracle.com
|
#
39232ed5 |
|
17-Jul-2018 |
Baolin Wang <baolin.wang@linaro.org> |
time: Introduce one suspend clocksource to compensate the suspend time On some hardware with multiple clocksources, we have coarse grained clocksources that support the CLOCK_SOURCE_SUSPEND_NONSTOP flag, but which are less than ideal for timekeeping whereas other clocksources can be better candidates but halt on suspend. Currently, the timekeeping core only supports timing suspend using CLOCK_SOURCE_SUSPEND_NONSTOP clocksources if that clocksource is the current clocksource for timekeeping. As a result, some architectures try to implement read_persistent_clock64() using those non-stop clocksources, but isn't really ideal, which will introduce more duplicate code. To fix this, provide logic to allow a registered SUSPEND_NONSTOP clocksource, which isn't the current clocksource, to be used to calculate the suspend time. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Baolin Wang <baolin.wang@linaro.org> [jstultz: minor tweaks to merge with previous resume changes] Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
f473e5f4 |
|
16-Jul-2018 |
Mukesh Ojha <mojha@codeaurora.org> |
time: Fix extra sleeptime injection when suspend fails Currently, there exists a corner case assuming when there is only one clocksource e.g RTC, and system failed to go to suspend mode. While resume rtc_resume() injects the sleeptime as timekeeping_rtc_skipresume() returned 'false' (default value of sleeptime_injected) due to which we can see mismatch in timestamps. This issue can also come in a system where more than one clocksource are present and very first suspend fails. Success case: ------------ {sleeptime_injected=false} rtc_suspend() => timekeeping_suspend() => timekeeping_resume() => (sleeptime injected) rtc_resume() Failure case: ------------ {failure in sleep path} {sleeptime_injected=false} rtc_suspend() => rtc_resume() {sleeptime injected again which was not required as the suspend failed} Fix this by handling the boolean logic properly. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@kernel.org> Originally-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
985e6950 |
|
13-Jul-2018 |
Ondrej Mosnacek <omosnace@redhat.com> |
timekeeping/ntp: Constify some function arguments Add 'const' to some function arguments and variables to make it easier to read the code. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> [jstultz: Also fixup pre-existing checkpatch warnings for prototype arguments with no variable name] Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
4b1b7f80 |
|
19-Jul-2018 |
Pavel Tatashin <pasha.tatashin@oracle.com> |
timekeeping: Default boot time offset to local_clock() read_persistent_wall_and_boot_offset() is called during boot to read both the persistent clock and also return the offset between the boot time and the value of persistent clock. Change the default boot_offset from zero to local_clock() so architectures, that do not have a dedicated boot_clock but have early sched_clock(), such as SPARCv9, x86, and possibly more will benefit from this change by getting a better and more consistent estimate of the boot time without need for an arch specific implementation. Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-17-pasha.tatashin@oracle.com
|
#
3eca9937 |
|
19-Jul-2018 |
Pavel Tatashin <pasha.tatashin@oracle.com> |
timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset() If architecture does not support exact boot time, it is challenging to estimate boot time without having a reference to the current persistent clock value. Yet, it cannot read the persistent clock time again, because this may lead to math discrepancies with the caller of read_boot_clock64() who have read the persistent clock at a different time. This is why it is better to provide two values simultaneously: the persistent clock value, and the boot time. Replace read_boot_clock64() with: read_persistent_wall_and_boot_offset(wall_time, boot_offset) Where wall_time is returned by read_persistent_clock() And boot_offset is wall_time - boot time, which defaults to 0. Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-16-pasha.tatashin@oracle.com
|
#
b061c7a5 |
|
04-Jun-2018 |
Miroslav Lichvar <mlichvar@redhat.com> |
timekeeping: Update multiplier when NTP frequency is set directly When the NTP frequency is set directly from userspace using the ADJ_FREQUENCY or ADJ_TICK timex mode, immediately update the timekeeper's multiplier instead of waiting for the next tick. This removes a hidden non-deterministic delay in setting of the frequency and allows an extremely tight control of the system clock with update rates close to or even exceeding the kernel HZ. The update is limited to archs using modern timekeeping (!ARCH_USES_GETTIMEOFFSET). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
d30faff9 |
|
18-Jun-2018 |
Arnd Bergmann <arnd@arndb.de> |
timekeeping: Use ktime_get_real_ts64() instead of getnstimeofday64() The two do the same, this moves all users to the newer name for consistency. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: y2038@lists.linaro.org Cc: Stephen Boyd <sboyd@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Link: https://lkml.kernel.org/r/20180618140811.2998503-3-arnd@arndb.de
|
#
b9ff604c |
|
27-Apr-2018 |
Arnd Bergmann <arnd@arndb.de> |
timekeeping: Add ktime_get_coarse_with_offset I have run into a couple of drivers using current_kernel_time() suffering from the y2038 problem, and they could be converted to using ktime_t, but don't have interfaces that skip the nanosecond calculation at the moment. This introduces ktime_get_coarse_with_offset() as a simpler variant of ktime_get_with_offset(), and adds wrappers for the three time domains we support with the existing function. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Boyd <sboyd@kernel.org> Cc: y2038@lists.linaro.org Cc: John Stultz <john.stultz@linaro.org> Link: https://lkml.kernel.org/r/20180427134016.2525989-5-arnd@arndb.de
|
#
fb7fcc96 |
|
27-Apr-2018 |
Arnd Bergmann <arnd@arndb.de> |
timekeeping: Standardize on ktime_get_*() naming The current_kernel_time64, get_monotonic_coarse64, getrawmonotonic64, get_monotonic_boottime64 and timekeeping_clocktai64 interfaces have rather inconsistent naming, and they differ in the calling conventions by passing the output either by reference or as a return value. Rename them to ktime_get_coarse_real_ts64, ktime_get_coarse_ts64, ktime_get_raw_ts64, ktime_get_boottime_ts64 and ktime_get_clocktai_ts64 respectively, and provide the interfaces with macros or inline functions as needed. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Boyd <sboyd@kernel.org> Cc: y2038@lists.linaro.org Cc: John Stultz <john.stultz@linaro.org> Link: https://lkml.kernel.org/r/20180427134016.2525989-4-arnd@arndb.de
|
#
edca71fe |
|
27-Apr-2018 |
Arnd Bergmann <arnd@arndb.de> |
timekeeping: Clean up ktime_get_real_ts64 In a move to make ktime_get_*() the preferred driver interface into the timekeeping code, sanitizes ktime_get_real_ts64() to be a proper exported symbol rather than an alias for getnstimeofday64(). The internal __getnstimeofday64() is no longer used, so remove that and merge it into ktime_get_real_ts64(). Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Boyd <sboyd@kernel.org> Cc: y2038@lists.linaro.org Cc: John Stultz <john.stultz@linaro.org> Link: https://lkml.kernel.org/r/20180427134016.2525989-3-arnd@arndb.de
|
#
a3ed0e43 |
|
25-Apr-2018 |
Thomas Gleixner <tglx@linutronix.de> |
Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME Revert commits 92af4dcb4e1c ("tracing: Unify the "boot" and "mono" tracing clocks") 127bfa5f4342 ("hrtimer: Unify MONOTONIC and BOOTTIME clock behavior") 7250a4047aa6 ("posix-timers: Unify MONOTONIC and BOOTTIME clock behavior") d6c7270e913d ("timekeeping: Remove boot time specific code") f2d6fdbfd238 ("Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior") d6ed449afdb3 ("timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock") 72199320d49d ("timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock") As stated in the pull request for the unification of CLOCK_MONOTONIC and CLOCK_BOOTTIME, it was clear that we might have to revert the change. As reported by several folks systemd and other applications rely on the documented behaviour of CLOCK_MONOTONIC on Linux and break with the above changes. After resume daemons time out and other timeout related issues are observed. Rafael compiled this list: * systemd kills daemons on resume, after >WatchdogSec seconds of suspending (Genki Sky). [Verified that that's because systemd uses CLOCK_MONOTONIC and expects it to not include the suspend time.] * systemd-journald misbehaves after resume: systemd-journald[7266]: File /var/log/journal/016627c3c4784cd4812d4b7e96a34226/system.journal corrupted or uncleanly shut down, renaming and replacing. (Mike Galbraith). * NetworkManager reports "networking disabled" and networking is broken after resume 50% of the time (Pavel). [May be because of systemd.] * MATE desktop dims the display and starts the screensaver right after system resume (Pavel). * Full system hang during resume (me). [May be due to systemd or NM or both.] That happens on debian and open suse systems. It's sad, that these problems were neither catched in -next nor by those folks who expressed interest in this change. Reported-by: Rafael J. Wysocki <rjw@rjwysocki.net> Reported-by: Genki Sky <sky@genki.is>, Reported-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kevin Easton <kevin@guarana.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org>
|
#
e142aa09 |
|
12-Apr-2018 |
Baolin Wang <baolin.wang@linaro.org> |
timekeeping: Remove __current_kernel_time() The __current_kernel_time() function based on 'struct timespec' is no longer recommended for new code, and the only user of this function has been replaced by commit 6909e29fdefb ("kdb: use __ktime_get_real_seconds instead of __current_kernel_time"). Remove the obsolete interface. Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: sboyd@kernel.org Cc: broonie@kernel.org Cc: john.stultz@linaro.org Link: https://lkml.kernel.org/r/1a9dbea7ee2cda7efe9ed330874075cf17fdbff6.1523596316.git.baolin.wang@linaro.org
|
#
127bfa5f |
|
01-Mar-2018 |
Thomas Gleixner <tglx@linutronix.de> |
hrtimer: Unify MONOTONIC and BOOTTIME clock behavior Now that th MONOTONIC and BOOTTIME clocks are indentical remove all the special casing. The user space visible interfaces still support both clocks, but their behavior is identical. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kevin Easton <kevin@guarana.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20180301165150.410218515@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
d6c7270e |
|
01-Mar-2018 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Remove boot time specific code Now that the MONOTONIC and BOOTTIME clocks are the same, remove all the special handling from timekeeping. Keep wrappers for the existing users of the *boot* timekeeper interfaces. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kevin Easton <kevin@guarana.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20180301165150.236279497@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
d6ed449a |
|
01-Mar-2018 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock The MONOTONIC clock is not fast forwarded by the time spent in suspend on resume. This is only done for the BOOTTIME clock. The reason why the MONOTONIC clock is not forwarded is historical: the original Linux implementation was using jiffies as a base for the MONOTONIC clock and jiffies have never been advanced after resume. At some point when timekeeping was unified in the core code, the MONONOTIC clock was advanced after resume which also advanced jiffies causing interesting side effects. As a consequence the the MONOTONIC clock forwarding was disabled again and the BOOTTIME clock was introduced, which allows to read time since boot. Back then it was not possible to completely distangle the MONOTONIC clock and jiffies because there were still interfaces which exposed the MONOTONIC clock behaviour based on the timer wheel and therefore jiffies. As of today none of the MONOTONIC clock facilities depends on jiffies anymore so the forwarding can be done seperately. This is achieved by forwarding the variables which are used for the jiffies update after resume before the tick is restarted, In timekeeping resume, the change is rather simple. Instead of updating the offset between the MONOTONIC clock and the REALTIME/BOOTTIME clocks, advance the time keeper base for the MONOTONIC and the MONOTONIC_RAW clocks by the time spent in suspend. The MONOTONIC clock is now the same as the BOOTTIME clock and the offset between the REALTIME and the MONOTONIC clocks is the same as before suspend. There might be side effects in applications, which rely on the (unfortunately) well documented behaviour of the MONOTONIC clock, but the downsides of the existing behaviour are probably worse. There is one obvious issue. Up to now it was possible to retrieve the time spent in suspend by observing the delta between the MONOTONIC clock and the BOOTTIME clock. This is not longer available, but the previously introduced mechanism to read the active non-suspended monotonic time can mitigate that in a detectable fashion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kevin Easton <kevin@guarana.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20180301165150.062975504@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
72199320 |
|
01-Mar-2018 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock The planned change to unify the behaviour of the MONOTONIC and BOOTTIME clocks vs. suspend removes the ability to retrieve the active non-suspended time of a system. Provide a new CLOCK_MONOTONIC_ACTIVE clock which returns the active non-suspended time of the system via clock_gettime(). This preserves the old behaviour of CLOCK_MONOTONIC before the BOOTTIME/MONOTONIC unification. This new clock also allows applications to detect programmatically that the MONOTONIC and BOOTTIME clocks are identical. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kevin Easton <kevin@guarana.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20180301165149.965235774@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
78b98e3c |
|
09-Mar-2018 |
Miroslav Lichvar <mlichvar@redhat.com> |
timekeeping/ntp: Determine the multiplier directly from NTP tick length When the length of the NTP tick changes significantly, e.g. when an NTP/PTP application is correcting the initial offset of the clock, a large value may accumulate in the NTP error before the multiplier converges to the correct value. It may then take a very long time (hours or even days) before the error is corrected. This causes the clock to have an unstable frequency offset, which has a negative impact on the stability of synchronization with precise time sources (e.g. NTP/PTP using hardware timestamping or the PTP KVM clock). Use division to determine the correct multiplier directly from the NTP tick length and replace the iterative approach. This removes the last major source of the NTP error. The only remaining source is now limited resolution of the multiplier, which is corrected by adding 1 to the multiplier when the system clock is behind the NTP time. Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <stephen.boyd@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1520620971-9567-3-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
c2cda2a5 |
|
09-Mar-2018 |
Miroslav Lichvar <mlichvar@redhat.com> |
timekeeping/ntp: Don't align NTP frequency adjustments to ticks When the timekeeping multiplier is changed, the NTP error is updated to correct the clock for the delay between the tick and the update of the clock. This error is corrected in later updates and the clock appears as if the frequency was changed exactly on the tick. Remove this correction to keep the point where the frequency is effectively changed at the time of the update. This removes a major source of the NTP error. Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <stephen.boyd@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1520620971-9567-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
aea3706c |
|
13-Nov-2017 |
Miroslav Lichvar <mlichvar@redhat.com> |
timekeeping: Remove CONFIG_GENERIC_TIME_VSYSCALL_OLD As of d4d1fc61eb38f (ia64: Update fsyscall gettime to use modern vsyscall_update)the last user of CONFIG_GENERIC_TIME_VSYSCALL_OLD have been updated, the legacy support for old-style vsyscall implementations can be removed from the timekeeping code. (Thanks again to Tony Luck for helping remove the last user!) [jstultz: Commit message rework] Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <stephen.boyd@linaro.org> Link: https://lkml.kernel.org/r/1510613491-16695-1-git-send-email-john.stultz@linaro.org
|
#
df27067e |
|
10-Nov-2017 |
Arnd Bergmann <arnd@arndb.de> |
pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday() __getnstimeofday() is a rather odd interface, with a number of quirks: - The caller may come from NMI context, but the implementation is not NMI safe, one way to get there from NMI is NMI handler: something bad panic() kmsg_dump() pstore_dump() pstore_record_init() __getnstimeofday() - The calling conventions are different from any other timekeeping functions, to deal with returning an error code during suspended timekeeping. Address the above issues by using a completely different method to get the time: ktime_get_real_fast_ns() is NMI safe and has a reasonable behavior when timekeeping is suspended: it returns the time at which it got suspended. As Thomas Gleixner explained, this is safe, as ktime_get_real_fast_ns() does not call into the clocksource driver that might be suspended. The result can easily be transformed into a timespec structure. Since ktime_get_real_fast_ns() was not exported to modules, add the export. The pstore behavior for the suspended case changes slightly, as it now stores the timestamp at which timekeeping was suspended instead of storing a zero timestamp. This change is not addressing y2038-safety, that's subject to a more complex follow up patch. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kees Cook <keescook@chromium.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Colin Cross <ccross@android.com> Link: https://lkml.kernel.org/r/20171110152530.1926955-1-arnd@arndb.de
|
#
1572fa03 |
|
19-Oct-2017 |
Arnd Bergmann <arnd@arndb.de> |
timekeeping: Use timespec64 in timekeeping_inject_offset As part of changing all the timekeeping code to use 64-bit time_t consistently, this removes the uses of timeval and timespec as much as possible from do_adjtimex() and timekeeping_inject_offset(). The timeval_inject_offset_valid() and timespec_inject_offset_valid() just complicate this, so I'm folding them into the respective callers. This leaves the actual 'struct timex' definition, which is part of the user-space ABI and should be dealt with separately when we have agreed on the ABI change. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <stephen.boyd@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
e0956dcc |
|
19-Oct-2017 |
Arnd Bergmann <arnd@arndb.de> |
timekeeping: Consolidate timekeeping_inject_offset code The code to check the adjtimex() or clock_adjtime() arguments is spread out across multiple files for presumably only historic reasons. As a preparatation for a rework to get rid of the use of 'struct timeval' and 'struct timespec' in there, this moves all the portions into kernel/time/timekeeping.c and marks them as 'static'. The warp_clock() function here is not as closely related as the others, but I feel it still makes sense to move it here in order to consolidate all callers of timekeeping_inject_offset(). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <stephen.boyd@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> [jstultz: Whitespace fixup] Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
4c3711d7 |
|
31-Aug-2017 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Provide NMI safe access to clock realtime The configurable printk timestamping wants access to clock realtime. Right now there is no ktime_get_real_fast_ns() accessor because reading the monotonic base and the realtime offset cannot be done atomically. Contrary to boot time this offset can change during runtime and cause half updated readouts. struct tk_read_base was fully packed when the fast timekeeper access was implemented. commit ceea5e3771ed ("time: Fix clock->read(clock) race around clocksource changes") removed the 'read' function pointer from the structure, but of course left the comment stale. So now the structure can fit a new 64bit member w/o violating the cache line constraints. Add real_base to tk_read_base and update it in the fast timekeeper update sequence. Implement an accessor which follows the same scheme as the accessor to clock monotonic, but uses the new real_base to access clock real time. The runtime overhead for updating real_base is minimal as it just adds two cache hot values and stores them into an already dirtied cache line along with the other fast timekeeper updates. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead,org> Link: https://lkml.kernel.org/r/1505757060-2004-3-git-send-email-prarit@redhat.com
|
#
5df32107 |
|
28-Aug-2017 |
Prarit Bhargava <prarit@redhat.com> |
timekeeping: Make fast accessors return 0 before timekeeping is initialized printk timestamps will be extended to include mono and boot time by using the fast timekeeping accessors ktime_get_mono|boot_fast_ns(). The functions can return garbage before timekeeping is initialized resulting in garbage timestamps. Initialize the fast timekeepers with dummy clocks which guarantee a 0 readout up to timekeeping_init(). Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1503922914-10660-2-git-send-email-prarit@redhat.com
|
#
a2d81803 |
|
08-Sep-2017 |
Robert P. J. Day <rpjday@crashcourse.ca> |
drivers/pps: aesthetic tweaks to PPS-related content Collection of aesthetic adjustments to various PPS-related files, directories and Documentation, some quite minor just for the sake of consistency, including: * Updated example of pps device tree node (courtesy Rodolfo G.) * "PPS-API" -> "PPS API" * "pps_source_info_s" -> "pps_source_info" * "ktimer driver" -> "pps-ktimer driver" * "ppstest /dev/pps0" -> "ppstest /dev/pps1" to match example * Add missing PPS-related entries to MAINTAINERS file * Other trivialities Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1708261048220.8106@localhost.localdomain Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Acked-by: Rodolfo Giometti <giometti@enneenne.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0bcdc098 |
|
25-Aug-2017 |
John Stultz <john.stultz@linaro.org> |
time: Fix ktime_get_raw() incorrect base accumulation In comqit fc6eead7c1e2 ("time: Clean up CLOCK_MONOTONIC_RAW time handling"), the following code got mistakenly added to the update of the raw timekeeper: /* Update the monotonic raw base */ seconds = tk->raw_sec; nsec = (u32)(tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift); tk->tkr_raw.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec); Which adds the raw_sec value and the shifted down raw xtime_nsec to the base value. But the read function adds the shifted down tk->tkr_raw.xtime_nsec value another time, The result of this is that ktime_get_raw() users (which are all internal users) see the raw time move faster then it should (the rate at which can vary with the current size of tkr_raw.xtime_nsec), which has resulted in at least problems with graphics rendering performance. The change tried to match the monotonic base update logic: seconds = (u64)(tk->xtime_sec + tk->wall_to_monotonic.tv_sec); nsec = (u32) tk->wall_to_monotonic.tv_nsec; tk->tkr_mono.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec); Which adds the wall_to_monotonic.tv_nsec value, but not the tk->tkr_mono.xtime_nsec value to the base. To fix this, simplify the tkr_raw.base accumulation to only accumulate the raw_sec portion, and do not include the tkr_raw.xtime_nsec portion, which will be added at read time. Fixes: fc6eead7c1e2 ("time: Clean up CLOCK_MONOTONIC_RAW time handling") Reported-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <stephen.boyd@linaro.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Daniel Mentz <danielmentz@google.com> Link: http://lkml.kernel.org/r/1503701824-1645-1-git-send-email-john.stultz@linaro.org
|
#
a529bea8 |
|
28-Jun-2017 |
Stafford Horne <shorne@gmail.com> |
timekeeping: Use proper timekeeper for debug code When CONFIG_DEBUG_TIMEKEEPING is enabled the timekeeping_check_update() function will update status like last_warning and underflow_seen on the timekeeper. If there are issues found this state is used to rate limit the warnings that get printed. This rate limiting doesn't really really work if stored in real_tk as the shadow timekeeper is overwritten onto real_tk at the end of every update_wall_time() call, resetting last_warning and other statuses. Fix rate limiting by using the shadow_timekeeper for timekeeping_check_update(). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <stephen.boyd@linaro.org> Fixes: commit 57d05a93ada7 ("time: Rework debugging variables so they aren't global") Signed-off-by: Stafford Horne <shorne@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
369adf04 |
|
12-May-2017 |
John Stultz <john.stultz@linaro.org> |
time: Add warning about imminent deprecation of CONFIG_GENERIC_TIME_VSYSCALL_OLD CONFIG_GENERIC_TIME_VSYSCALL_OLD was introduced five years ago to allow a transition from the old vsyscall implementations to the new method (which simplified internal accounting and made timekeeping more precise). However, PPC and IA64 have yet to make the transition, despite in some cases me sending test patches to try to help it along. http://patches.linaro.org/patch/30501/ http://patches.linaro.org/patch/35412/ If its helpful, my last pass at the patches can be found here: https://git.linaro.org/people/john.stultz/linux.git dev/oldvsyscall-cleanup So I think its time to set a deadline and make it clear this is going away. So this patch adds warnings about this functionality being dropped. Likely to be in v4.15. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
fc6eead7 |
|
22-May-2017 |
John Stultz <john.stultz@linaro.org> |
time: Clean up CLOCK_MONOTONIC_RAW time handling Now that we fixed the sub-ns handling for CLOCK_MONOTONIC_RAW, remove the duplicitive tk->raw_time.tv_nsec, which can be stored in tk->tkr_raw.xtime_nsec (similarly to how its handled for monotonic time). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <stephen.boyd@linaro.org> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Daniel Mentz <danielmentz@google.com> Tested-by: Daniel Mentz <danielmentz@google.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
3d88d56c |
|
08-Jun-2017 |
John Stultz <john.stultz@linaro.org> |
time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting Due to how the MONOTONIC_RAW accumulation logic was handled, there is the potential for a 1ns discontinuity when we do accumulations. This small discontinuity has for the most part gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW in their vDSO clock_gettime implementation, we've seen failures with the inconsistency-check test in kselftest. This patch addresses the issue by using the same sub-ns accumulation handling that CLOCK_MONOTONIC uses, which avoids the issue for in-kernel users. Since the ARM64 vDSO implementation has its own clock_gettime calculation logic, this patch reduces the frequency of errors, but failures are still seen. The ARM64 vDSO will need to be updated to include the sub-nanosecond xtime_nsec values in its calculation for this issue to be completely fixed. Signed-off-by: John Stultz <john.stultz@linaro.org> Tested-by: Daniel Mentz <danielmentz@google.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <stephen.boyd@linaro.org> Cc: Will Deacon <will.deacon@arm.com> Cc: "stable #4 . 8+" <stable@vger.kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
ceea5e37 |
|
08-Jun-2017 |
John Stultz <john.stultz@linaro.org> |
time: Fix clock->read(clock) race around clocksource changes In tests, which excercise switching of clocksources, a NULL pointer dereference can be observed on AMR64 platforms in the clocksource read() function: u64 clocksource_mmio_readl_down(struct clocksource *c) { return ~(u64)readl_relaxed(to_mmio_clksrc(c)->reg) & c->mask; } This is called from the core timekeeping code via: cycle_now = tkr->read(tkr->clock); tkr->read is the cached tkr->clock->read() function pointer. When the clocksource is changed then tkr->clock and tkr->read are updated sequentially. The code above results in a sequential load operation of tkr->read and tkr->clock as well. If the store to tkr->clock hits between the loads of tkr->read and tkr->clock, then the old read() function is called with the new clock pointer. As a consequence the read() function dereferences a different data structure and the resulting 'reg' pointer can point anywhere including NULL. This problem was introduced when the timekeeping code was switched over to use struct tk_read_base. Before that, it was theoretically possible as well when the compiler decided to reload clock in the code sequence: now = tk->clock->read(tk->clock); Add a helper function which avoids the issue by reading tk_read_base->clock once into a local variable clk and then issue the read function via clk->read(clk). This guarantees that the read() function always gets the proper clocksource pointer handed in. Since there is now no use for the tkr.read pointer, this patch also removes it, and to address stopping the fast timekeeper during suspend/resume, it introduces a dummy clocksource to use rather then just a dummy read function. Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <stephen.boyd@linaro.org> Cc: stable <stable@vger.kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Daniel Mentz <danielmentz@google.com> Link: http://lkml.kernel.org/r/1496965462-20003-2-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
5fc63f95 |
|
24-Mar-2017 |
Nicholas Mc Guire <der.herr@hofr.at> |
timekeeping: Remove pointless conversion to bool interp_forward is type bool so assignment from a logical operation directly is sufficient. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Cc: "Christopher S. Hall" <christopher.s.hall@intel.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1490382215-30505-1-git-send-email-der.herr@hofr.at Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
38b8d208 |
|
08-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/nmi.h> We are going to move softlockup APIs out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. <linux/nmi.h> already includes <linux/sched.h>. Include the <linux/nmi.h> header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
4f17722c |
|
08-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/loadavg.h> We are going to split <linux/sched/loadavg.h> out of <linux/sched.h>, which will have to be picked up from a couple of .c files. Create a trivial placeholder <linux/sched/topology.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
40d9f827 |
|
07-Dec-2016 |
Stephen Boyd <sboyd@codeaurora.org> |
timekeeping: Remove unused timekeeping_{get,set}_tai_offset() The last caller to timekeeping_set_tai_offset() was in commit 0b5154fb9040 (timekeeping: Simplify tai updating from do_adjtimex, 2013-03-22) and the last caller to timekeeping_get_tai_offset() was in commit 76f4108892d9 (hrtimer: Cleanup hrtimer accessors to the timekepeing state, 2014-07-16). Remove these unused functions now that we handle TAI offsets differently. Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
2456e855 |
|
25-Dec-2016 |
Thomas Gleixner <tglx@linutronix.de> |
ktime: Get rid of the union ktime is a union because the initial implementation stored the time in scalar nanoseconds on 64 bit machine and in a endianess optimized timespec variant for 32bit machines. The Y2038 cleanup removed the timespec variant and switched everything to scalar nanoseconds. The union remained, but become completely pointless. Get rid of the union and just keep ktime_t as simple typedef of type s64. The conversion was done with coccinelle and some manual mopping up. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
|
#
a5a1d1c2 |
|
21-Dec-2016 |
Thomas Gleixner <tglx@linutronix.de> |
clocksource: Use a plain u64 instead of cycle_t There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>
|
#
c029a2be |
|
08-Dec-2016 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Use mul_u64_u32_shr() instead of open coding it The resume code must deal with a clocksource delta which is potentially big enough to overflow the 64bit mult. Replace the open coded handling with the proper function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Parit Bhargava <prarit@redhat.com> Cc: Laurent Vivier <lvivier@redhat.com> Cc: "Christopher S. Hall" <christopher.s.hall@intel.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Liav Rehana <liavr@mellanox.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20161208204228.921674404@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
cbd99e3b |
|
08-Dec-2016 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Get rid of pointless typecasts cycle_t is defined as u64, so casting it to u64 is a pointless and confusing exercise. cycle_t should simply go away and be replaced with a plain u64 to avoid further confusion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Parit Bhargava <prarit@redhat.com> Cc: Laurent Vivier <lvivier@redhat.com> Cc: "Christopher S. Hall" <christopher.s.hall@intel.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Liav Rehana <liavr@mellanox.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20161208204228.844699737@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
acc89612 |
|
08-Dec-2016 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Make the conversion call chain consistently unsigned Propagating a unsigned value through signed variables and functions makes absolutely no sense and is just prone to (re)introduce subtle signed vs. unsigned issues as happened recently. Clean it up. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Parit Bhargava <prarit@redhat.com> Cc: Laurent Vivier <lvivier@redhat.com> Cc: "Christopher S. Hall" <christopher.s.hall@intel.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Liav Rehana <liavr@mellanox.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20161208204228.765843099@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
9c164572 |
|
08-Dec-2016 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion The clocksource delta to nanoseconds conversion is using signed math, but the delta is unsigned. This makes the conversion space smaller than necessary and in case of a multiplication overflow the conversion can become negative. The conversion is done with scaled math: s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift; Shifting a signed integer right obvioulsy preserves the sign, which has interesting consequences: - Time jumps backwards - __iter_div_u64_rem() which is used in one of the calling code pathes will take forever to piecewise calculate the seconds/nanoseconds part. This has been reported by several people with different scenarios: David observed that when stopping a VM with a debugger: "It was essentially the stopped by debugger case. I forget exactly why, but the guest was being explicitly stopped from outside, it wasn't just scheduling lag. I think it was something in the vicinity of 10 minutes stopped." When lifting the stop the machine went dead. The stopped by debugger case is not really interesting, but nevertheless it would be a good thing not to die completely. But this was also observed on a live system by Liav: "When the OS is too overloaded, delta will get a high enough value for the msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so after the shift the nsec variable will gain a value similar to 0xffffffffff000000." Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation"). It had been fixed a year ago already in commit 35a4933a8959 ("time: Avoid signed overflow in timekeeping_get_ns()"). Though it's not surprising that the issue has been reintroduced because the function itself and the whole call chain uses s64 for the result and the propagation of it. The change in this recent commit is subtle: s64 nsec; - nsec = (d * m + n) >> s: + nsec = d * m + n; + nsec >>= s; d being type of cycle_t adds another level of obfuscation. This wouldn't have happened if the previous change to unsigned computation would have made the 'nsec' variable u64 right away and a follow up patch had cleaned up the whole call chain. There have been patches submitted which basically did a revert of the above patch leaving everything else unchanged as signed. Back to square one. This spawned a admittedly pointless discussion about potential users which rely on the unsigned behaviour until someone pointed out that it had been fixed before. The changelogs of said patches added further confusion as they made finally false claims about the consequences for eventual users which expect signed results. Despite delta being cycle_t, aka. u64, it's very well possible to hand in a signed negative value and the signed computation will happily return the correct result. But nobody actually sat down and analyzed the code which was added as user after the propably unintended signed conversion. Though in sensitive code like this it's better to analyze it proper and make sure that nothing relies on this than hunting the subtle wreckage half a year later. After analyzing all call chains it stands that no caller can hand in a negative value (which actually would work due to the s64 cast) and rely on the signed math to do the right thing. Change the conversion function to unsigned math. The conversion of all call chains is done in a follow up patch. This solves the starvation issue, which was caused by the negative result, but it does not solve the underlying problem. It merily procrastinates it. When the timekeeper update is deferred long enough that the unsigned multiplication overflows, then time going backwards is observable again. It does neither solve the issue of clocksources with a small counter width which will wrap around possibly several times and cause random time stamps to be generated. But those are usually not found on systems used for virtualization, so this is likely a non issue. I took the liberty to claim authorship for this simply because analyzing all callsites and writing the changelog took substantially more time than just making the simple s/s64/u64/ change and ignore the rest. Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation") Reported-by: David Gibson <david@gibson.dropbear.id.au> Reported-by: Liav Rehana <liavr@mellanox.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Parit Bhargava <prarit@redhat.com> Cc: Laurent Vivier <lvivier@redhat.com> Cc: "Christopher S. Hall" <christopher.s.hall@intel.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
948a5312 |
|
28-Nov-2016 |
Joel Fernandes <joelaf@google.com> |
timekeeping: Add a fast and NMI safe boot clock This boot clock can be used as a tracing clock and will account for suspend time. To keep it NMI safe since we're accessing from tracing, we're not using a separate timekeeper with updates to monotonic clock and boot offset protected with seqlocks. This has the following minor side effects: (1) Its possible that a timestamp be taken after the boot offset is updated but before the timekeeper is updated. If this happens, the new boot offset is added to the old timekeeping making the clock appear to update slightly earlier: CPU 0 CPU 1 timekeeping_inject_sleeptime64() __timekeeping_inject_sleeptime(tk, delta); timestamp(); timekeeping_update(tk, TK_CLEAR_NTP...); (2) On 32-bit systems, the 64-bit boot offset (tk->offs_boot) may be partially updated. Since the tk->offs_boot update is a rare event, this should be a rare occurrence which postprocessing should be able to handle. Signed-off-by: Joel Fernandes <joelaf@google.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1480372524-15181-6-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
58bfea95 |
|
04-Oct-2016 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Fix __ktime_get_fast_ns() regression In commit 27727df240c7 ("Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code the timekeeping_get_ns() function, but I forgot to include the unit conversion from cycles to nanoseconds, breaking the function's output, which impacts users like perf. This results in bogus perf timestamps like: swapper 0 [000] 253.427536: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.426573: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.426687: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.426800: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.426905: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.427022: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.427127: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.427239: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.427346: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.427463: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 255.426572: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) Instead of more reasonable expected timestamps like: swapper 0 [000] 39.953768: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.064839: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.175956: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.287103: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.398217: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.509324: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.620437: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.731546: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.842654: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.953772: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 41.064881: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) Add the proper use of timekeeping_delta_to_ns() to convert the cycle delta to nanoseconds as needed. Thanks to Brendan and Alexei for finding this quickly after the v4.8 release. Unfortunately the problematic commit has landed in some -stable trees so they'll need this fix as well. Many apologies for this mistake. I'll be looking to add a perf-clock sanity test to the kselftest timers tests soon. Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING" Reported-by: Brendan Gregg <bgregg@netflix.com> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Tested-and-reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable <stable@vger.kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1475636148-26539-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
27727df2 |
|
23-Aug-2016 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING When I added some extra sanity checking in timekeeping_get_ns() under CONFIG_DEBUG_TIMEKEEPING, I missed that the NMI safe __ktime_get_fast_ns() method was using timekeeping_get_ns(). Thus the locking added to the debug checks broke the NMI-safety of __ktime_get_fast_ns(). This patch open-codes the timekeeping_get_ns() logic for __ktime_get_fast_ns(), so can avoid any deadlocks in NMI. Fixes: 4ca22c2648f9 "timekeeping: Add warnings when overflows or underflows are observed" Reported-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: stable <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1471993702-29148-2-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
eaaa7ec7 |
|
09-Mar-2016 |
Gregor Boirie <gregor.boirie@parrot.com> |
timekeeping: export get_monotonic_coarse64 symbol EXPORT_SYMBOL() get_monotonic_coarse64 for new IIO timestamping clock selection usage. This provides user apps the ability to request a particular IIO device to timestamp samples using a monotonic coarse clock granularity. Signed-off-by: Gregor Boirie <gregor.boirie@parrot.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
|
#
0209b937 |
|
31-May-2016 |
Thomas Graziadei <thomas.graziadei@omicronenergy.com> |
timekeeping: Fix 1ns/tick drift with GENERIC_TIME_VSYSCALL_OLD The user notices the problem in a raw and real time drift, calling clock_gettime with CLOCK_REALTIME / CLOCK_MONOTONIC_RAW on a system with no ntp correction taking place (no ntpd or ptp stuff running). The problem is, that old_vsyscall_fixup adds an extra 1ns even though xtime_nsec is already held in full nsecs and the remainder in this case is 0. Do the rounding up buisness only if needed. Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Graziadei <thomas.graziadei@omicronenergy.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
6436257b |
|
08-Mar-2016 |
Ingo Molnar <mingo@kernel.org> |
time/timekeeping: Work around false positive GCC warning Newer GCC versions trigger the following warning: kernel/time/timekeeping.c: In function ‘get_device_system_crosststamp’: kernel/time/timekeeping.c:987:5: warning: ‘clock_was_set_seq’ may be used uninitialized in this function [-Wmaybe-uninitialized] if (discontinuity) { ^ kernel/time/timekeeping.c:1045:15: note: ‘clock_was_set_seq’ was declared here unsigned int clock_was_set_seq; ^ GCC clearly is unable to recognize that the 'do_interp' boolean tracks the initialization status of 'clock_was_set_seq'. The GCC version used was: gcc version 5.3.1 20151207 (Red Hat 5.3.1-2) (GCC) Work it around by initializing clock_was_set_seq to 0. Compilers that are able to recognize the code flow will eliminate the unnecessary initialization. Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
2c756feb |
|
22-Feb-2016 |
Christopher S. Hall <christopher.s.hall@intel.com> |
time: Add history to cross timestamp interface supporting slower devices Another representative use case of time sync and the correlated clocksource (in addition to PTP noted above) is PTP synchronized audio. In a streaming application, as an example, samples will be sent and/or received by multiple devices with a presentation time that is in terms of the PTP master clock. Synchronizing the audio output on these devices requires correlating the audio clock with the PTP master clock. The more precise this correlation is, the better the audio quality (i.e. out of sync audio sounds bad). From an application standpoint, to correlate the PTP master clock with the audio device clock, the system clock is used as a intermediate timebase. The transforms such an application would perform are: System Clock <-> Audio clock System Clock <-> Network Device Clock [<-> PTP Master Clock] Modern Intel platforms can perform a more accurate cross timestamp in hardware (ART,audio device clock). The audio driver requires ART->system time transforms -- the same as required for the network driver. These platforms offload audio processing (including cross-timestamps) to a DSP which to ensure uninterrupted audio processing, communicates and response to the host only once every millsecond. As a result is takes up to a millisecond for the DSP to receive a request, the request is processed by the DSP, the audio output hardware is polled for completion, the result is copied into shared memory, and the host is notified. All of these operation occur on a millisecond cadence. This transaction requires about 2 ms, but under heavier workloads it may take up to 4 ms. Adding a history allows these slow devices the option of providing an ART value outside of the current interval. In this case, the callback provided is an accessor function for the previously obtained counter value. If get_system_device_crosststamp() receives a counter value previous to cycle_last, it consults the history provided as an argument in history_ref and interpolates the realtime and monotonic raw system time using the provided counter value. If there are any clock discontinuities, e.g. from calling settimeofday(), the monotonic raw time is interpolated in the usual way, but the realtime clock time is adjusted by scaling the monotonic raw adjustment. When an accessor function is used a history argument *must* be provided. The history is initialized using ktime_get_snapshot() and must be called before the counter values are read. Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: kevin.b.stanton@intel.com Cc: kevin.j.clarke@intel.com Cc: hpa@zytor.com Cc: jeffrey.t.kirsher@intel.com Cc: netdev@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com> [jstultz: Fixed up cycles_t/cycle_t type confusion] Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
8006c245 |
|
22-Feb-2016 |
Christopher S. Hall <christopher.s.hall@intel.com> |
time: Add driver cross timestamp interface for higher precision time synchronization ACKNOWLEDGMENT: cross timestamp code was developed by Thomas Gleixner <tglx@linutronix.de>. It has changed considerably and any mistakes are mine. The precision with which events on multiple networked systems can be synchronized using, as an example, PTP (IEEE 1588, 802.1AS) is limited by the precision of the cross timestamps between the system clock and the device (timestamp) clock. Precision here is the degree of simultaneity when capturing the cross timestamp. Currently the PTP cross timestamp is captured in software using the PTP device driver ioctl PTP_SYS_OFFSET. Reads of the device clock are interleaved with reads of the realtime clock. At best, the precision of this cross timestamp is on the order of several microseconds due to software latencies. Sub-microsecond precision is required for industrial control and some media applications. To achieve this level of precision hardware supported cross timestamping is needed. The function get_device_system_crosstimestamp() allows device drivers to return a cross timestamp with system time properly scaled to nanoseconds. The realtime value is needed to discipline that clock using PTP and the monotonic raw value is used for applications that don't require a "real" time, but need an unadjusted clock time. The get_device_system_crosstimestamp() code calls back into the driver to ensure that the system counter is within the current timekeeping update interval. Modern Intel hardware provides an Always Running Timer (ART) which is exactly related to TSC through a known frequency ratio. The ART is routed to devices on the system and is used to precisely and simultaneously capture the device clock with the ART. Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: kevin.b.stanton@intel.com Cc: kevin.j.clarke@intel.com Cc: hpa@zytor.com Cc: jeffrey.t.kirsher@intel.com Cc: netdev@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com> [jstultz: Reworked to remove extra structures and simplify calling] Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
ba26621e |
|
22-Feb-2016 |
Christopher S. Hall <christopher.s.hall@intel.com> |
time: Remove duplicated code in ktime_get_raw_and_real() The code in ktime_get_snapshot() is a superset of the code in ktime_get_raw_and_real() code. Further, ktime_get_raw_and_real() is called only by the PPS code, pps_get_ts(). Consolidate the pps_get_ts() code into a single function calling ktime_get_snapshot() and eliminate ktime_get_raw_and_real(). A side effect of this is that the raw and real results of pps_get_ts() correspond to exactly the same clock cycle. Previously these values represented separate reads of the system clock. Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: kevin.b.stanton@intel.com Cc: kevin.j.clarke@intel.com Cc: hpa@zytor.com Cc: jeffrey.t.kirsher@intel.com Cc: netdev@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
9da0f49c |
|
22-Feb-2016 |
Christopher S. Hall <christopher.s.hall@intel.com> |
time: Add timekeeping snapshot code capturing system time and counter In the current timekeeping code there isn't any interface to atomically capture the current relationship between the system counter and system time. ktime_get_snapshot() returns this triple (counter, monotonic raw, realtime) in the system_time_snapshot struct. Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: kevin.b.stanton@intel.com Cc: kevin.j.clarke@intel.com Cc: hpa@zytor.com Cc: jeffrey.t.kirsher@intel.com Cc: netdev@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com> [jstultz: Moved structure definitions around to clean things up, fixed cycles_t/cycle_t confusion.] Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
6bd58f09 |
|
22-Feb-2016 |
Christopher S. Hall <christopher.s.hall@intel.com> |
time: Add cycles to nanoseconds translation The timekeeping code does not currently provide a way to translate externally provided clocksource cycles to system time. The cycle count is always provided by the result clocksource read() method internal to the timekeeping code. The added function timekeeping_cycles_to_ns() calculated a nanosecond value from a cycle count that can be added to tk_read_base.base value yielding the current system time. This allows clocksource cycle values external to the timekeeping code to provide a cycle count that can be transformed to system time. Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: kevin.b.stanton@intel.com Cc: kevin.j.clarke@intel.com Cc: hpa@zytor.com Cc: jeffrey.t.kirsher@intel.com Cc: netdev@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
fc4fa6e1 |
|
12-Dec-2015 |
Masanari Iida <standby24x7@gmail.com> |
treewide: Fix typo in printk This patch fix spelling typos found in printk and Kconfig. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
ec02b076 |
|
03-Dec-2015 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Cap adjustments so they don't exceed the maxadj value Thus its been occasionally noted that users have seen confusing warnings like: Adjusting tsc more than 11% (5941981 vs 7759439) We try to limit the maximum total adjustment to 11% (10% tick adjustment + 0.5% frequency adjustment). But this is done by bounding the requested adjustment values, and the internal steering that is done by tracking the error from what was requested and what was applied, does not have any such limits. This is usually not problematic, but in some cases has a risk that an adjustment could cause the clocksource mult value to overflow, so its an indication things are outside of what is expected. It ends up most of the reports of this 11% warning are on systems using chrony, which utilizes the adjtimex() ADJ_TICK interface (which allows a +-10% adjustment). The original rational for ADJ_TICK unclear to me but my assumption it was originally added to allow broken systems to get a big constant correction at boot (see adjtimex userspace package for an example) which would allow the system to work w/ ntpd's 0.5% adjustment limit. Chrony uses ADJ_TICK to make very aggressive short term corrections (usually right at startup). Which push us close enough to the max bound that a few late ticks can cause the internal steering to push past the max adjust value (tripping the warning). Thus this patch adds some extra logic to enforce the max adjustment cap in the internal steering. Note: This has the potential to slow corrections when the ADJ_TICK value is furthest away from the default value. So it would be good to get some testing from folks using chrony, to make sure we don't cause any troubles there. Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Tested-by: Miroslav Lichvar <mlichvar@redhat.com> Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
dee36654 |
|
12-Dec-2015 |
DengChao <chao.deng@linaro.org> |
timekeeping: Provide internal function __ktime_get_real_seconds In order to fix Y2038 issues in the ntp code we will need replace get_seconds() with ktime_get_real_seconds() but as the ntp code uses the timekeeping lock which is also used by ktime_get_real_seconds(), we need a version without locking. Add a new function __ktime_get_real_seconds() in timekeeping to do this. Reviewed-by: John Stultz <john.stultz@linaro.org> Signed-off-by: DengChao <chao.deng@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
37cf4dc3 |
|
03-Dec-2015 |
John Stultz <john.stultz@linaro.org> |
time: Verify time values in adjtimex ADJ_SETOFFSET to avoid overflow For adjtimex()'s ADJ_SETOFFSET, make sure the tv_usec value is sane. We might multiply them later which can cause an overflow and undefined behavior. This patch introduces new helper functions to simplify the checking code and adds comments to clarify Orginally this patch was by Sasha Levin, but I've basically rewritten it, so he should get credit for finding the issue and I should get the blame for any mistakes made since. Also, credit to Richard Cochran for the phrasing used in the comment for what is considered valid here. Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
35a4933a |
|
29-Nov-2015 |
David Gibson <david@gibson.dropbear.id.au> |
time: Avoid signed overflow in timekeeping_get_ns() 1e75fa8 "time: Condense timekeeper.xtime into xtime_sec" replaced a call to clocksource_cyc2ns() from timekeeping_get_ns() with an open-coded version of the same logic to avoid keeping a semi-redundant struct timespec in struct timekeeper. However, the commit also introduced a subtle semantic change - where clocksource_cyc2ns() uses purely unsigned math, the new version introduces a signed temporary, meaning that if (delta * tk->mult) has a 63-bit overflow the following shift will still give a negative result. The choice of 'maxsec' in __clocksource_updatefreq_scale() means this will generally happen if there's a ~10 minute pause in examining the clocksource. This can be triggered on a powerpc KVM guest by stopping it from qemu for a bit over 10 minutes. After resuming time has jumped backwards several minutes causing numerous problems (jiffies does not advance, msleep()s can be extended by minutes..). It doesn't happen on x86 KVM guests, because the guest TSC is effectively frozen while the guest is stopped, which is not the case for the powerpc timebase. Obviously an unsigned (64 bit) overflow will only take twice as long as a signed, 63-bit overflow. I don't know the time code well enough to know if that will still cause incorrect calculations, or if a 64-bit overflow is avoided elsewhere. Still, an incorrect forwards clock adjustment will cause less trouble than time going backwards. So, this patch removes the potential for intermediate signed overflow. Cc: stable@vger.kernel.org (3.7+) Suggested-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
79211c8e |
|
09-Nov-2015 |
Andrew Morton <akpm@linux-foundation.org> |
remove abs64() Switch everything to the new and more capable implementation of abs(). Mainly to give the new abs() a bit of a workout. Cc: Michal Nazarewicz <mina86@mina86.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
56fd16ca |
|
16-Oct-2015 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Increment clock_was_set_seq in timekeeping_init() timekeeping_init() can set the wall time offset, so we need to increment the clock_was_set_seq counter. That way hrtimers will pick up the early offset immediately. Otherwise on a machine which does not set wall time later in the boot process the hrtimer offset is stale at 0 and wall time timers are going to expire with a delay of 45 years. Fixes: 868a3e915f7f "hrtimer: Make offset update smarter" Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stefan Liebler <stli@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>
|
#
071eee45 |
|
28-Sep-2015 |
Arnd Bergmann <arnd@arndb.de> |
ntp/pps: replace getnstime_raw_and_real with 64-bit version There is exactly one caller of getnstime_raw_and_real in the kernel, which is the pps_get_ts function. This changes the caller and the implementation to work on timespec64 types rather than timespec, to avoid the time_t overflow on 32-bit architectures. For consistency with the other new functions (ktime_get_seconds, ktime_get_real_*, ...), I'm renaming the function to ktime_get_raw_and_real_ts64. We still need to convert from the internal 64-bit type to 32 bit types in the caller, but this conversion is now pushed out from getnstime_raw_and_real to pps_get_ts. A follow-up patch changes the remaining pps code to completely avoid the conversion. Acked-by: Richard Cochran <richardcochran@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
7ec88e4b |
|
28-Sep-2015 |
Arnd Bergmann <arnd@arndb.de> |
ntp/pps: use timespec64 for hardpps() There is only one user of the hardpps function in the kernel, so it makes sense to atomically change it over to using 64-bit timestamps for y2038 safety. In the hardpps implementation, we also need to change the pps_normtime structure, which is similar to struct timespec and also requires a 64-bit seconds portion. This introduces two temporary variables in pps_kc_event() to do the conversion, they will be removed again in the next step, which seemed preferable to having a larger patch changing it all at the same time. Acked-by: Richard Cochran <richardcochran@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
571af55a |
|
25-Aug-2015 |
Zhen Lei <thunder.leizhen@huawei.com> |
time: Fix spelling in comments Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tianhong Ding <dingtianhong@huawei.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Xinwei Hu <huxinwei@huawei.com> Cc: Xunlei Pang <pang.xunlei@linaro.org> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/1440484973-13892-1-git-send-email-thunder.leizhen@huawei.com [ Fixed yet another typo in one of the sentences fixed. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
2619d7e9 |
|
09-Sep-2015 |
John Stultz <john.stultz@linaro.org> |
time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64() The internal clocksteering done for fine-grained error correction uses a logarithmic approximation, so any time adjtimex() adjusts the clock steering, timekeeping_freqadjust() quickly approximates the correct clock frequency over a series of ticks. Unfortunately, the logic in timekeeping_freqadjust(), introduced in commit: dc491596f639 ("timekeeping: Rework frequency adjustments to work better w/ nohz") used the abs() function with a s64 error value to calculate the size of the approximated adjustment to be made. Per include/linux/kernel.h: "abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()". Thus on 32-bit platforms, this resulted in the clocksteering to take a quite dampended random walk trying to converge on the proper frequency, which caused the adjustments to be made much slower then intended (most easily observed when large adjustments are made). This patch fixes the issue by using abs64() instead. Reported-by: Nuno Gonçalves <nunojpg@gmail.com> Tested-by: Nuno Goncalves <nunojpg@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: <stable@vger.kernel.org> # v3.17+ Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
8758a240 |
|
29-Jul-2015 |
Baolin Wang <baolin.wang@linaro.org> |
time: Introduce current_kernel_time64() The current_kernel_time() is not year 2038 safe on 32bit systems since it returns a timespec value. Introduce current_kernel_time64() which returns a timespec64 value. Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
e1d7ba87 |
|
23-Jun-2015 |
Wang YanQing <udknight@gmail.com> |
time: Always make sure wall_to_monotonic isn't positive Two issues were found on an IMX6 development board without an enabled RTC device(resulting in the boot time and monotonic time being initialized to 0). Issue 1:exportfs -a generate: "exportfs: /opt/nfs/arm does not support NFS export" Issue 2:cat /proc/stat: "btime 4294967236" The same issues can be reproduced on x86 after running the following code: int main(void) { struct timeval val; int ret; val.tv_sec = 0; val.tv_usec = 0; ret = settimeofday(&val, NULL); return 0; } Two issues are different symptoms of same problem: The reason is a positive wall_to_monotonic pushes boot time back to the time before Epoch, and getboottime will return negative value. In symptom 1: negative boot time cause get_expiry() to overflow time_t when input expire time is 2147483647, then cache_flush() always clears entries just added in ip_map_parse. In symptom 2: show_stat() uses "unsigned long" to print negative btime value returned by getboottime. This patch fix the problem by prohibiting time from being set to a value which would cause a negative boot time. As a result one can't set the CLOCK_REALTIME time prior to (1970 + system uptime). Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Wang YanQing <udknight@gmail.com> [jstultz: reworded commit message] Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
906c5557 |
|
17-Jun-2015 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Copy the shadow-timekeeper over the real timekeeper last The fix in d151832650ed9 (time: Move clock_was_set_seq update before updating shadow-timekeeper) was unfortunately incomplete. The main gist of that change was to do the shadow-copy update last, so that any state changes were properly duplicated, and we wouldn't accidentally have stale data in the shadow. Unfortunately in the main update_wall_time() logic, we update use the shadow-timekeeper to calculate the next update values, then while holding the lock, copy the shadow-timekeeper over, then call timekeeping_update() to do some additional bookkeeping, (skipping the shadow mirror). The bug with this is the additional bookkeeping isn't all read-only, and some changes timkeeper state. Thus we might then overwrite this state change on the next update. To avoid this problem, do the timekeeping_update() on the shadow-timekeeper prior to copying the full state over to the real-timekeeper. This avoids problems with both the clock_was_set_seq and next_leap_ktime being overwritten and possibly the fast-timekeepers as well. Many thanks to Prarit for his rigorous testing, which discovered this problem, along with Prarit and Daniel's work validating this fix. Reported-by: Prarit Bhargava <prarit@redhat.com> Tested-by: Prarit Bhargava <prarit@redhat.com> Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1434560753-7441-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
833f32d7 |
|
11-Jun-2015 |
John Stultz <john.stultz@linaro.org> |
time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge Currently, leapsecond adjustments are done at tick time. As a result, the leapsecond was applied at the first timer tick *after* the leapsecond (~1-10ms late depending on HZ), rather then exactly on the second edge. This was in part historical from back when we were always tick based, but correcting this since has been avoided since it adds extra conditional checks in the gettime fastpath, which has performance overhead. However, it was recently pointed out that ABS_TIME CLOCK_REALTIME timers set for right after the leapsecond could fire a second early, since some timers may be expired before we trigger the timekeeping timer, which then applies the leapsecond. This isn't quite as bad as it sounds, since behaviorally it is similar to what is possible w/ ntpd made leapsecond adjustments done w/o using the kernel discipline. Where due to latencies, timers may fire just prior to the settimeofday call. (Also, one should note that all applications using CLOCK_REALTIME timers should always be careful, since they are prone to quirks from settimeofday() disturbances.) However, the purpose of having the kernel do the leap adjustment is to avoid such latencies, so I think this is worth fixing. So in order to properly keep those timers from firing a second early, this patch modifies the ntp and timekeeping logic so that we keep enough state so that the update_base_offsets_now accessor, which provides the hrtimer core the current time, can check and apply the leapsecond adjustment on the second edge. This prevents the hrtimer core from expiring timers too early. This patch does not modify any other time read path, so no additional overhead is incurred. However, this also means that the leap-second continues to be applied at tick time for all other read-paths. Apologies to Richard Cochran, who pushed for similar changes years ago, which I resisted due to the concerns about the performance overhead. While I suspect this isn't extremely critical, folks who care about strict leap-second correctness will likely want to watch this. Potentially a -stable candidate eventually. Originally-suggested-by: Richard Cochran <richardcochran@gmail.com> Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com> Reported-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1434063297-28657-4-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
d1518326 |
|
11-Jun-2015 |
John Stultz <john.stultz@linaro.org> |
time: Move clock_was_set_seq update before updating shadow-timekeeper It was reported that 868a3e915f7f5eba (hrtimer: Make offset update smarter) was causing timer problems after suspend/resume. The problem with that change is the modification to clock_was_set_seq in timekeeping_update is done prior to mirroring the time state to the shadow-timekeeper. Thus the next time we do update_wall_time() the updated sequence is overwritten by whats in the shadow copy. This patch moves the shadow-timekeeper mirroring to the end of the function, after all updates have been made, so all data is kept in sync. (This patch also affects the update_fast_timekeeper calls which were also problematically done prior to the mirroring). Reported-and-tested-by: Jeremiah Mahler <jmmahler@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/1434063297-28657-2-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
7fc26327 |
|
26-May-2015 |
Peter Zijlstra <peterz@infradead.org> |
seqlock: Introduce raw_read_seqcount_latch() Because with latches there is a strict data dependency on the seq load we can avoid the rmb in favour of a read_barrier_depends. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
6695b92a |
|
26-May-2015 |
Peter Zijlstra <peterz@infradead.org> |
seqlock: Better document raw_write_seqcount_latch() Improve the documentation of the latch technique as used in the current timekeeping code, such that it can be readily employed elsewhere. Borrow from the comments in timekeeping and replace those with a reference to this more generic comment. Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <David.Woodhouse@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Michel Lespinasse <walken@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
e83d0a41 |
|
08-Apr-2015 |
Xunlei Pang <pang.xunlei@linaro.org> |
time: Remove read_boot_clock() Now that we have a read_boot_clock64() function available on every architecture, and converted all the users to it, it's time to remove the (now unused) read_boot_clock() completely from the kernel. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> [jstultz: Minor commit message tweak suggested by Ingo] Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
57d05a93 |
|
13-May-2015 |
John Stultz <john.stultz@linaro.org> |
time: Rework debugging variables so they aren't global Ingo suggested that the timekeeping debugging variables recently added should not be global, and should be tied to the timekeeper's read_base. Thus this patch implements that suggestion. This version is different from the earlier versions as it keeps the variables in the timekeeper structure rather then in the tkr. Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
6374f912 |
|
07-Apr-2015 |
Harald Geyer <harald@ccbib.org> |
timekeeping: Provide new API to get the current time resolution This patch series introduces a new function u32 ktime_get_resolution_ns(void) which allows to clean up some driver code. In particular the IIO subsystem has a function to provide timestamps for events but no means to get their resolution. So currently the dht11 driver tries to guess the resolution in a rather messy and convoluted way. We can do much better with the new code. This API is not designed to be exposed to user space. This has been tested on i386, sunxi and mxs. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Harald Geyer <harald@ccbib.org> [jstultz: Tweaked to make it build after upstream changes] Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
868a3e91 |
|
14-Apr-2015 |
Thomas Gleixner <tglx@linutronix.de> |
hrtimer: Make offset update smarter On every tick/hrtimer interrupt we update the offset variables of the clock bases. That's silly because these offsets change very seldom. Add a sequence counter to the time keeping code which keeps track of the offset updates (clock_was_set()). Have a sequence cache in the hrtimer cpu bases to evaluate whether the offsets must be updated or not. This allows us later to avoid pointless cacheline pollution. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20150414203501.132820245@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org>
|
#
21d6d52a |
|
14-Apr-2015 |
Thomas Gleixner <tglx@linutronix.de> |
hrtimer: Get rid of softirq time The softirq time field in the clock bases is an optimization from the early days of hrtimers. It provides a coarse "jiffies" like time mostly for self rearming timers. But that comes with a price: - Larger code size - Extra storage space - Duplicated functions with really small differences The benefit of this is optimization is marginal for contemporary systems. Consolidate everything on the high resolution timer implementation. This makes further optimizations possible. Text size reduction: x8664 -95, i386 -356, ARM -148, ARM64 -40, power64 -16 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20150414203501.039977424@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
347c6f6d |
|
02-Apr-2015 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Get rid of stale comment Arch specific management of xtime/jiffies/wall_to_monotonic is gone for quite a while. Zap the stale comment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/2422730.dmO29q661S@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
0fa88cb4 |
|
01-Apr-2015 |
Xunlei Pang <pang.xunlei@linaro.org> |
time, drivers/rtc: Don't bother with rtc_resume() for the nonstop clocksource If a system does not provide a persistent_clock(), the time will be updated on resume by rtc_resume(). With the addition of the non-stop clocksources for suspend timing, those systems set the time on resume in timekeeping_resume(), but may not provide a valid persistent_clock(). This results in the rtc_resume() logic thinking no one has set the time and it then will over-write the suspend time again, which is not necessary and only increases clock error. So, fix this for rtc_resume(). This patch also improves the name of persistent_clock_exist to make it more grammatical. Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1427945681-29972-19-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
264bb3f7 |
|
01-Apr-2015 |
Xunlei Pang <pang.xunlei@linaro.org> |
time: Fix a bug in timekeeping_suspend() with no persistent clock When there's no persistent clock, normally timekeeping_suspend_time should always be zero, but this can break in timekeeping_suspend(). At T1, there was a system suspend, so old_delta was assigned T1. After some time, one time adjustment happened, and xtime got the value of T1-dt(0s<dt<2s). Then, there comes another system suspend soon after this adjustment, obviously we will get a small negative delta_delta, resulting in a negative timekeeping_suspend_time. This is problematic, when doing timekeeping_resume() if there is no nonstop clocksource for example, it will hit the else leg and inject the improper sleeptime which is the wrong logic. So, we can solve this problem by only doing delta related code when the persistent clock is existent. Actually the code only makes sense for persistent clock cases. Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1427945681-29972-18-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
7f298139 |
|
01-Apr-2015 |
Xunlei Pang <pang.xunlei@linaro.org> |
time: Don't build timekeeping_inject_sleeptime64() if no one uses it timekeeping_inject_sleeptime64() is only used by RTC suspend/resume, so add build dependencies on the necessary RTC related macros. Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> [ Improve commit message clarity. ] Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1427945681-29972-16-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
2ee96632 |
|
01-Apr-2015 |
Xunlei Pang <pang.xunlei@linaro.org> |
time: Add y2038 safe read_persistent_clock64() As part of addressing in-kernel y2038 issues, this patch adds read_persistent_clock64() and replaces all the call sites of read_persistent_clock() with this function. This is a __weak implementation, which simply calls the existing y2038 unsafe read_persistent_clock(). This allows architecture specific implementations to be converted independently, and eventually the y2038 unsafe read_persistent_clock() can be removed after all its architecture specific implementations have been converted to read_persistent_clock64(). Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1427945681-29972-3-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
9a806ddb |
|
01-Apr-2015 |
Xunlei Pang <pang.xunlei@linaro.org> |
time: Add y2038 safe read_boot_clock64() As part of addressing in-kernel y2038 issues, this patch adds read_boot_clock64() and replaces all the call sites of read_boot_clock() with this function. This is a __weak implementation, which simply calls the existing y2038 unsafe read_boot_clock(). This allows architecture specific implementations to be converted independently, and eventually the y2038 unsafe read_boot_clock() can be removed after all its architecture specific implementations have been converted to read_boot_clock64(). Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1427945681-29972-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
4ffee521 |
|
25-Mar-2015 |
Thomas Gleixner <tglx@linutronix.de> |
clockevents: Make suspend/resume calls explicit clockevents_notify() is a leftover from the early design of the clockevents facility. It's really not a notification mechanism, it's a multiplex call. We are way better off to have explicit calls instead of this monstrosity. Split out the suspend/resume() calls and invoke them directly from the call sites. No locking required at this point because these calls happen with interrupts disabled and a single cpu online. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ Rebased on top of 4.0-rc5. ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/713674030.jVm1qaHuPf@vostro.rjw.lan [ Rebased on top of latest timers/core. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
f09cb9a1 |
|
19-Mar-2015 |
Peter Zijlstra <peterz@infradead.org> |
time: Introduce tk_fast_raw Add the NMI safe CLOCK_MONOTONIC_RAW accessor.. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150319093400.562746929@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
4498e746 |
|
19-Mar-2015 |
Peter Zijlstra <peterz@infradead.org> |
time: Parametrize all tk_fast_mono users In preparation for more tk_fast instances, remove all hard-coded tk_fast_mono references. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150319093400.484279927@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
4a4ad80d |
|
19-Mar-2015 |
Peter Zijlstra <peterz@infradead.org> |
time: Add timerkeeper::tkr_raw Introduce tkr_raw and make use of it. base_raw -> tkr_raw.base clock->{mult,shift} -> tkr_raw.{mult.shift} Kill timekeeping_get_ns_raw() in favour of timekeeping_get_ns(&tkr_raw), this removes all mono_raw special casing. Duplicate the updates to tkr_mono.cycle_last into tkr_raw.cycle_last, both need the same value. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150319093400.422589590@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
876e7881 |
|
19-Mar-2015 |
Peter Zijlstra <peterz@infradead.org> |
time: Rename timekeeper::tkr to timekeeper::tkr_mono In preparation of adding another tkr field, rename this one to tkr_mono. Also rename tk_read_base::base_mono to tk_read_base::base, since the structure is not specific to CLOCK_MONOTONIC and the mono name got added to the tk_read_base instance. Lots of trivial churn. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150319093400.344679419@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
4ca22c26 |
|
11-Mar-2015 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Add warnings when overflows or underflows are observed It was suggested that the underflow/overflow protection should probably throw some sort of warning out, rather than just silently fixing the issue. So this patch adds some warnings here. The flag variables used are not protected by locks, but since we can't print from the reading functions, just being able to say we saw an issue in the update interval is useful enough, and can be slightly racy without real consequence. The big complication is that we're only under a read seqlock, so the data could shift under us during our calculation to see if there was a problem. This patch avoids this issue by nesting another seqlock which allows us to snapshot the just required values atomically. So we shouldn't see false positives. I also added some basic rate-limiting here, since on one build machine w/ skewed TSCs it was fairly noisy at bootup. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426133800-29329-8-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
057b87e3 |
|
11-Mar-2015 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Try to catch clocksource delta underflows In the case where there is a broken clocksource where there are multiple actual clocks that aren't perfectly aligned, we may see small "negative" deltas when we subtract 'now' from 'cycle_last'. The values are actually negative with respect to the clocksource mask value, not necessarily negative if cast to a s64, but we can check by checking the delta to see if it is a small (relative to the mask) negative value (again negative relative to the mask). If so, we assume we jumped backwards somehow and instead use zero for our delta. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426133800-29329-7-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
a558cd02 |
|
11-Mar-2015 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Add checks to cap clocksource reads to the 'max_cycles' value When calculating the current delta since the last tick, we currently have no hard protections to prevent a multiplication overflow from occuring. This patch introduces infrastructure to allow a cap that limits the clocksource read delta value to the 'max_cycles' value, which is where an overflow would occur. Since this is in the hotpath, it adds the extra checking under CONFIG_DEBUG_TIMEKEEPING=y. There was some concern that capping time like this could cause problems as we may stop expiring timers, which could go circular if the timer that triggers time accumulation were mis-scheduled too far in the future, which would cause time to stop. However, since the mult overflow would result in a smaller time value, we would effectively have the same problem there. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426133800-29329-6-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
3c17ad19 |
|
11-Mar-2015 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Add debugging checks to warn if we see delays Recently there's been requests for better sanity checking in the time code, so that it's more clear when something is going wrong, since timekeeping issues could manifest in a large number of strange ways in various subsystems. Thus, this patch adds some extra infrastructure to add a check to update_wall_time() to print two new warnings: 1) if we see the call delayed beyond the 'max_cycles' overflow point, 2) or if we see the call delayed beyond the clocksource's 'max_idle_ns' value, which is currently 50% of the overflow point. This extra infrastructure is conditional on a new CONFIG_DEBUG_TIMEKEEPING option, also added in this patch - default off. Tested this a bit by halting qemu for specified lengths of time to trigger the warnings. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426133800-29329-5-git-send-email-john.stultz@linaro.org [ Improved the changelog and the messages a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
124cf911 |
|
13-Feb-2015 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
PM / sleep: Make it possible to quiesce timers during suspend-to-idle The efficiency of suspend-to-idle depends on being able to keep CPUs in the deepest available idle states for as much time as possible. Ideally, they should only be brought out of idle by system wakeup interrupts. However, timer interrupts occurring periodically prevent that from happening and it is not practical to chase all of the "misbehaving" timers in a whack-a-mole fashion. A much more effective approach is to suspend the local ticks for all CPUs and the entire timekeeping along the lines of what is done during full suspend, which also helps to keep suspend-to-idle and full suspend reasonably similar. The idea is to suspend the local tick on each CPU executing cpuidle_enter_freeze() and to make the last of them suspend the entire timekeeping. That should prevent timer interrupts from triggering until an IO interrupt wakes up one of the CPUs. It needs to be done with interrupts disabled on all of the CPUs, though, because otherwise the suspended clocksource might be accessed by an interrupt handler which might lead to fatal consequences. Unfortunately, the existing ->enter callbacks provided by cpuidle drivers generally cannot be used for implementing that, because some of them re-enable interrupts temporarily and some idle entry methods cause interrupts to be re-enabled automatically on exit. Also some of these callbacks manipulate local clock event devices of the CPUs which really shouldn't be done after suspending their ticks. To overcome that difficulty, introduce a new cpuidle state callback, ->enter_freeze, that will be guaranteed (1) to keep interrupts disabled all the time (and return with interrupts disabled) and (2) not to touch the CPU timer devices. Modify cpuidle_enter_freeze() to look for the deepest available idle state with ->enter_freeze present and to make the CPU execute that callback with suspended tick (and the last of the online CPUs to execute it with suspended timekeeping). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
#
060407ae |
|
13-Feb-2015 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
timekeeping: Make it safe to use the fast timekeeper while suspended Theoretically, ktime_get_mono_fast_ns() may be executed after timekeeping has been suspended (or before it is resumed) which in turn may lead to undefined behavior, for example, when the clocksource read from timekeeping_get_ns() called by it is not accessible at that time. Prevent that from happening by setting up a dummy readout base for the fast timekeeper during timekeeping_suspend() such that it will always return the same number of cycles. After the last timekeeping_update() in timekeeping_suspend() the clocksource is read and the result is stored as cycles_at_suspend. The readout base from the current timekeeper is copied onto the dummy and the ->read pointer of the dummy is set to a routine unconditionally returning cycles_at_suspend. Next, the dummy is passed to update_fast_timekeeper(). Then, ktime_get_mono_fast_ns() will work until the subsequent timekeeping_resume() and the proper readout base for the fast timekeeper will be restored by the timekeeping_update() called right after clearing timekeeping_suspended. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
#
affe3e85 |
|
10-Feb-2015 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
timekeeping: Pass readout base to update_fast_timekeeper() Modify update_fast_timekeeper() to take a struct tk_read_base pointer as its argument (instead of a struct timekeeper pointer) and update its kerneldoc comment to reflect that. That will allow a struct tk_read_base that is not part of a struct timekeeper to be passed to it in the next patch. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Stultz <john.stultz@linaro.org>
|
#
d08c0cdd |
|
08-Dec-2014 |
John Stultz <john.stultz@linaro.org> |
time: Expose getboottime64 for in-kernel uses Adds a timespec64 based getboottime64() implementation that can be used as we convert internal users of getboottime away from using timespecs. Cc: pang.xunlei <pang.xunlei@linaro.org> Cc: Arnd Bergmann <arnd.bergmann@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
cb2aa634 |
|
24-Nov-2014 |
John Stultz <john.stultz@linaro.org> |
time: Fix sign bug in NTP mult overflow warning In commit 6067dc5a8c2b ("time: Avoid possible NTP adjustment mult overflow") a new check was added to watch for adjustments that could cause a mult overflow. Unfortunately the check compares a signed with unsigned value and ignored the case where the adjustment was negative, which causes spurious warn-ons on some systems (and seems like it would result in problematic time adjustments there as well, due to the early return). Thus this patch adds a check to make sure the adjustment is positive before we check for an overflow, and resovles the issue in my testing. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Debugged-by: pang.xunlei <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1416890145-30048-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
5322e4c2 |
|
07-Nov-2014 |
John Stultz <john.stultz@linaro.org> |
time: Fixup comments to reflect usage of timespec64 Fix up a few comments that weren't updated when the functions were converted to use timespec64 structures. Acked-by: Arnd Bergmann <arnd.bergmann@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
334334b5 |
|
07-Nov-2014 |
John Stultz <john.stultz@linaro.org> |
time: Expose get_monotonic_coarse64() for in-kernel uses Adds a timespec64 based get_monotonic_coarse64() implementation that can be used as we convert internal users of get_monotonic_coarse away from using timespecs. Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
cdba2ec5 |
|
07-Nov-2014 |
John Stultz <john.stultz@linaro.org> |
time: Expose getrawmonotonic64 for in-kernel uses Adds a timespec64 based getrawmonotonic64() implementation that can be used as we convert internal users of getrawmonotonic away from using timespecs. Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
04d90890 |
|
18-Nov-2014 |
pang.xunlei <pang.xunlei@linaro.org> |
time: Provide y2038 safe timekeeping_inject_sleeptime() replacement As part of addressing "y2038 problem" for in-kernel uses, this patch adds timekeeping_inject_sleeptime64() using timespec64. After this patch, timekeeping_inject_sleeptime() is deprecated and all its call sites will be fixed using the new interface, after that it can be removed. NOTE: timekeeping_inject_sleeptime() is safe actually, but we want to eliminate timespec eventually, so comes this patch. Signed-off-by: pang.xunlei <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
21f7eca5 |
|
18-Nov-2014 |
pang.xunlei <pang.xunlei@linaro.org> |
time: Provide y2038 safe do_settimeofday() replacement The kernel uses 32-bit signed value(time_t) for seconds elapsed 1970-01-01:00:00:00, thus it will overflow at 2038-01-19 03:14:08 on 32-bit systems. This is widely known as the y2038 problem. As part of addressing "y2038 problem" for in-kernel uses, this patch adds safe do_settimeofday64() using timespec64. After this patch, do_settimeofday() is deprecated and all its call sites will be fixed using do_settimeofday64(), after that it can be removed. Signed-off-by: pang.xunlei <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
659bc17b |
|
09-Oct-2014 |
pang.xunlei <pang.xunlei@linaro.org> |
time: Complete NTP adjustment threshold judging conditions The clocksource mult-adjustment threshold is [mult-maxadj, mult+maxadj], timekeeping_adjust() only deals with the upper threshold, but misses the lower threshold. This patch adds the lower threshold judging condition. Signed-off-by: pang.xunlei <pang.xunlei@linaro.org> [jstultz: Minor fix for > 80 char line] Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
6067dc5a |
|
08-Oct-2014 |
pang.xunlei <pang.xunlei@linaro.org> |
time: Avoid possible NTP adjustment mult overflow. Ideally, __clocksource_updatefreq_scale, selects the largest shift value possible for a clocksource. This results in the mult memember of struct clocksource being particularly large, although not so large that NTP would adjust the clock to cause it to overflow. That said, nothing actually prohibits an overflow from occuring, its just that it "shouldn't" occur. So while very unlikely, and so far never observed, the value of (cs->mult+cs->maxadj) may have a chance to reach very near 0xFFFFFFFF, so there is a possibility it may overflow when doing NTP positive adjustment See the following detail: When NTP slewes the clock, kernel goes through update_wall_time()->...->timekeeping_apply_adjustment(): tk->tkr.mult += mult_adj; Since there is no guard against it, its possible tk->tkr.mult may overflow during this operation. This patch avoids any possible mult overflow by judging the overflow case before adding mult_adj to mult, also adds the WARNING message when capturing such case. Signed-off-by: pang.xunlei <pang.xunlei@linaro.org> [jstultz: Reworded commit message] Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
dbe7aa62 |
|
29-Oct-2014 |
Heena Sirwani <heenasirwani@gmail.com> |
timekeeping: Provide y2038 safe accessor to the seconds portion of CLOCK_REALTIME ktime_get_real_seconds() is the replacement function for get_seconds() returning the seconds portion of CLOCK_REALTIME in a time64_t. For 64bit the function is equivivalent to get_seconds(), but for 32bit it protects the readout with the timekeeper sequence count. This is required because 32-bit machines cannot access 64-bit tk->xtime_sec variable atomically. [tglx: Massaged changelog and added docbook comment ] Signed-off-by: Heena Sirwani <heenasirwani@gmail.com> Reviewed-by: Arnd Bergman <arnd@arndb.de> Cc: John Stultz <john.stultz@linaro.org> Cc: opw-kernel@googlegroups.com Link: http://lkml.kernel.org/r/7adcfaa8962b8ad58785d9a2456c3f77d93c0ffb.1414578445.git.heenasirwani@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
9e3680b1 |
|
29-Oct-2014 |
Heena Sirwani <heenasirwani@gmail.com> |
timekeeping: Provide fast accessor to the seconds part of CLOCK_MONOTONIC This is the counterpart to get_seconds() based on CLOCK_MONOTONIC. The use case for this interface are kernel internal coarse grained timestamps which do neither require the nanoseconds fraction of current time nor the CLOCK_REALTIME properties. Such timestamps can currently only retrieved by calling ktime_get_ts64() and using the tv_sec field of the returned timespec64. That's inefficient as it involves the read of the clocksource, math operations and must be protected by the timekeeper sequence counter. To avoid the sequence counter protection we restrict the return value to unsigned 32bit on 32bit machines. This covers ~136 years of uptime and therefor an overflow is not expected to hit anytime soon. To avoid math in the function we calculate the current seconds portion of CLOCK_MONOTONIC when the timekeeper gets updated in tk_update_ktime_data() similar to the CLOCK_REALTIME counterpart xtime_sec. [ tglx: Massaged changelog, simplified and commented the update function, added docbook comment ] Signed-off-by: Heena Sirwani <heenasirwani@gmail.com> Reviewed-by: Arnd Bergman <arnd@arndb.de> Cc: John Stultz <john.stultz@linaro.org> Cc: opw-kernel@googlegroups.com Link: http://lkml.kernel.org/r/da0b63f4bdf3478909f92becb35861197da3a905.1414578445.git.heenasirwani@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
9bf2419f |
|
05-Sep-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Update timekeeper before updating vsyscall and pvclock The update_walltime() code works on the shadow timekeeper to make the seqcount protected region as short as possible. But that update to the shadow timekeeper does not update all timekeeper fields because it's sufficient to do that once before it becomes life. One of these fields is tkr.base_mono. That stays stale in the shadow timekeeper unless an operation happens which copies the real timekeeper to the shadow. The update function is called after the update calls to vsyscall and pvclock. While not correct, it did not cause any problems because none of the invoked update functions used base_mono. commit cbcf2dd3b3d4 (x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based) changed that in the kvm pvclock update function, so the stale mono_base value got used and caused kvm-clock to malfunction. Put the update where it belongs and fix the issue. Reported-by: Chris J Arges <chris.j.arges@canonical.com> Reported-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409050000570.3333@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
0680eb1f |
|
13-Aug-2014 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Another fix to the VSYSCALL_OLD update_vsyscall Benjamin Herrenschmidt pointed out that I further missed modifying update_vsyscall after the wall_to_mono value was changed to a timespec64. This causes issues on powerpc32, which expects a 32bit timespec. This patch fixes the problem by properly converting from a timespec64 to a timespec before passing the value on to the arch-specific vsyscall logic. [ Thomas is currently on vacation, but reviewed it and wanted me to send this fix on to you directly. ] Cc: LKML <linux-kernel@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
375f45b5 |
|
23-Apr-2014 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Use cached ntp_tick_length when accumulating error By caching the ntp_tick_length() when we correct the frequency error, and then using that cached value to accumulate error, we avoid large initial errors when the tick length is changed. This makes convergence happen much faster in the simulator, since the initial error doesn't have to be slowly whittled away. This initially seems like an accounting error, but Miroslav pointed out that ntp_tick_length() can change mid-tick, so when we apply it in the error accumulation, we are applying any recent change to the entire tick. This approach chooses to apply changes in the ntp_tick_length() only to the next tick, which allows us to calculate the freq correction before using the new tick length, which avoids accummulating error. Credit to Miroslav for pointing this out and providing the original patch this functionality has been pulled out from, along with the rational. Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Reported-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
dc491596 |
|
06-Dec-2013 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Rework frequency adjustments to work better w/ nohz The existing timekeeping_adjust logic has always been complicated to understand. Further, since it was developed prior to NOHZ becoming common, its not surprising it performs poorly when NOHZ is enabled. Since Miroslav pointed out the problematic nature of the existing code in the NOHZ case, I've tried to refactor the code to perform better. The problem with the previous approach was that it tried to adjust for the total cumulative error using a scaled dampening factor. This resulted in large errors to be corrected slowly, while small errors were corrected quickly. With NOHZ the timekeeping code doesn't know how far out the next tick will be, so this results in bad over-correction to small errors, and insufficient correction to large errors. Inspired by Miroslav's patch, I've refactored the code to try to address the correction in two steps. 1) Check the future freq error for the next tick, and if the frequency error is large, try to make sure we correct it so it doesn't cause much accumulated error. 2) Then make a small single unit adjustment to correct any cumulative error that has collected over time. This method performs fairly well in the simulator Miroslav created. Major credit to Miroslav for pointing out the issue, providing the original patch to resolve this, a simulator for testing, as well as helping debug and resolve issues in my implementation so that it performed closer to his original implementation. Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Reported-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
e2dff1ec |
|
23-Jul-2014 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Minor fixup for timespec64->timespec assignment In the GENERIC_TIME_VSYSCALL_OLD update_vsyscall implementation, we take the tk_xtime() value, which returns a timespec64, and store it in a timespec. This luckily is ok, since the only architectures that use GENERIC_TIME_VSYSCALL_OLD are ia64 and ppc64, which are both 64 bit systems where timespec64 is the same as a timespec. Even so, for cleanliness reasons, use the conversion function to assign the proper type. Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
4396e058 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC Tracers want a correlated time between the kernel instrumentation and user space. We really do not want to export sched_clock() to user space, so we need to provide something sensible for this. Using separate data structures with an non blocking sequence count based update mechanism allows us to do that. The data structure required for the readout has a sequence counter and two copies of the timekeeping data. On the update side: smp_wmb(); tkf->seq++; smp_wmb(); update(tkf->base[0], tk); smp_wmb(); tkf->seq++; smp_wmb(); update(tkf->base[1], tk); On the reader side: do { seq = tkf->seq; smp_rmb(); idx = seq & 0x01; now = now(tkf->base[idx]); smp_rmb(); } while (seq != tkf->seq) So if a NMI hits the update of base[0] it will use base[1] which is still consistent, but this timestamp is not guaranteed to be monotonic across an update. The timestamp is calculated by: now = base_mono + clock_delta * slope So if the update lowers the slope, readers who are forced to the not yet updated second array are still using the old steeper slope. tmono ^ | o n | o n | u | o |o |12345678---> reader order o = old slope u = update n = new slope So reader 6 will observe time going backwards versus reader 5. While other CPUs are likely to be able observe that, the only way for a CPU local observation is when an NMI hits in the middle of the update. Timestamps taken from that NMI context might be ahead of the following timestamps. Callers need to be aware of that and deal with it. V2: Got rid of clock monotonic raw and reorganized the data structures. Folded in the barrier fix from Mathieu. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
0e5ac3a8 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Use tk_read_base as argument for timekeeping_get_ns() All the function needs is in the tk_read_base struct. No functional change for the current code, just a preparatory patch for the NMI safe accessor to clock monotonic which will use struct tk_read_base as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
d28ede83 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Create struct tk_read_base and use it in struct timekeeper The members of the new struct are the required ones for the new NMI safe accessor to clcok monotonic. In order to reuse the existing timekeeping code and to make the update of the fast NMI safe timekeepers a simple memcpy use the struct for the timekeeper as well and convert all users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
6d3aadf3 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Restructure the timekeeper some more Access to time requires to touch two cachelines at minimum 1) The timekeeper data structure 2) The clocksource data structure The access to the clocksource data structure can be avoided as almost all clocksource implementations ignore the argument to the read callback, which is a pointer to the clocksource. But the core needs to touch it to access the members @read and @mask. So we are better off by copying the @read function pointer and the @mask from the clocksource to the core data structure itself. For the most used ktime_get() access all required data including the @read and @mask copies fits together with the sequence counter into a single 64 byte cacheline. For the other time access functions we touch in the current code three cache lines in the worst case. But with the clocksource data copies we can reduce that to two adjacent cachelines, which is more efficient than disjunct cache lines. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
4a0e6377 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
clocksource: Get rid of cycle_last cycle_last was added to the clocksource to support the TSC validation. We moved that to the core code, so we can get rid of the extra copy. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
3a978377 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
clocksource: Make delta calculation a function We want to move the TSC sanity check into core code to make NMI safe accessors to clock monotonic[_raw] possible. For this we need to sanity check the delta calculation. Create a helper function and convert all sites to use it. [ Build fix from jstultz ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
f519b1a2 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Provide ktime_get_raw() Provide a ktime_t based interface for raw monotonic time. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
61edec81 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Simplify timekeeping_clocktai() timekeeping_clocktai() is not used in fast pathes, so the extra timespec conversion is not problematic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
47da70d3 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Remove timekeeper.total_sleep_time No more users. Remove it Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
02cba159 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Simplify getboottime() Subtracting plain nsec values and converting to timespec is simpler than the whole timespec math. Not really fastpath code, so the division is not an issue. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
48f18fd6 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Use ktime_get_boottime() for get_monotonic_boottime() get_monotonic_boottime() is not used in fast pathes, so the extra timespec conversion is not problematic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
250fade8 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Remove monotonic_to_bootbased No more users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
dcaab54e |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Remove ktime_get_monotonic_offset() No more users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
9a6b5197 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Provide ktime_mono_to_any() ktime based conversion function to map a monotonic time stamp to a different CLOCK. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
48064f5f |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping; Use ktime based data for ktime_get_update_offsets_tick() No need to juggle with timespecs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
a37c0aad |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Use ktime_t data for ktime_get_update_offsets_now() No need to juggle with timespecs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
afab07c0 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Use ktime_t based data for ktime_get_clocktai() Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
b82c817e |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping; Use ktime_t based data for ktime_get_boottime() Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
f5264d5d |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Use ktime_t based data for ktime_get_real() Speed up the readout. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
0077dc60 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Provide ktime_get_with_offset() Provide a helper function which lets us implement ktime_t based interfaces for real, boot and tai clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
a016a5bd |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Use ktime_t based data for ktime_get() Speed up ktime_get() by using ktime_t based data. Text size shrinks by 64 bytes on x8664. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
7c032df5 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Provide internal ktime_t based data The ktime_t based interfaces are used a lot in performance critical code pathes. Add ktime_t based data so the interfaces don't have to convert from the xtime/timespec based data. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
f111adfd |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Use timekeeping_update() instead of memcpy() We already have a function which does the right thing, that also makes sure that the coming ktime_t based cached values are getting updated. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
3fdb14fd |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Cache optimize struct timekeeper struct timekeeper is quite badly sorted for the hot readout path. Most time access functions need to load two cache lines. Rearrange it so ktime_get() and getnstimeofday() are happy with a single cache line. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
c905fae4 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeper: Move tk_xtime to core code No users outside of the core. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
d6d29896 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Provide timespec64 based interfaces To convert callers of the core code to timespec64 we need to provide the proper interfaces. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
7d489d15 |
|
16-Jul-2014 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Convert timekeeping core to use timespec64s Convert the core timekeeping logic to use timespec64s. This moves the 2038 issues out of the core logic and into all of the accessor functions. Future changes will need to push the timespec64s out to all timekeeping users, but that can be done interface by interface. Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
24e4a8c3 |
|
16-Jul-2014 |
John Stultz <john.stultz@linaro.org> |
ktime: Kill non-scalar ktime_t implementation for 2038 The non-scalar ktime_t implementation is basically a timespec which has to be changed to support dates past 2038 on 32bit systems. This patch removes the non-scalar ktime_t implementation, forcing the scalar s64 nanosecond version on all architectures. This may have additional performance overhead on some 32bit systems when converting between ktime_t and timespec structures, however the majority of 32bit systems (arm and i386) were already using scalar ktime_t, so no performance regressions will be seen on those platforms. On affected platforms, I'm open to finding optimizations, including avoiding converting to timespecs where possible. [ tglx: We can now cleanup the ktime_t.tv64 mess, but thats a different issue and we can throw a coccinelle script at it ] Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
76f41088 |
|
16-Jul-2014 |
John Stultz <john.stultz@linaro.org> |
hrtimer: Cleanup hrtimer accessors to the timekepeing state Rather then having two similar but totally different implementations that provide timekeeping state to the hrtimer code, try to unify the two implementations to be more simliar. Thus this clarifies ktime_get_update_offsets to ktime_get_update_offsets_now and changes get_xtime... to ktime_get_update_offsets_tick. Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
e06fde37 |
|
16-Jul-2014 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Simplify arch_gettimeoffset() Provide a default stub function instead of having the extra conditional. Cuts binary size on a m68k build by ~100 bytes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
6d9bcb62 |
|
04-Jun-2014 |
John Stultz <john.stultz@linaro.org> |
timekeeping: use printk_deferred when holding timekeeping seqlock Jiri Bohac pointed out that there are rare but potential deadlock possibilities when calling printk while holding the timekeeping seqlock. This is due to printk() triggering console sem wakeup, which can cause scheduling code to trigger hrtimers which may try to read the time. Specifically, as Jiri pointed out, that path is: printk vprintk_emit console_unlock up(&console_sem) __up wake_up_process try_to_wake_up ttwu_do_activate ttwu_activate activate_task enqueue_task enqueue_task_fair hrtick_update hrtick_start_fair hrtick_start_fair get_time ktime_get --> endless loop on read_seqcount_retry(&timekeeper_seq, ...) This patch tries to avoid this issue by using printk_deferred (previously named printk_sched) which should defer printing via a irq_work_queue. Signed-off-by: John Stultz <john.stultz@linaro.org> Reported-by: Jiri Bohac <jbohac@suse.cz> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Jan Kara <jack@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
52f5684c |
|
07-Apr-2014 |
Gideon Israel Dsouza <gidisrael@gmail.com> |
kernel: use macros from compiler.h instead of __attribute__((...)) To increase compiler portability there is <linux/compiler.h> which provides convenience macros for various gcc constructs. Eg: __weak for __attribute__((weak)). I've replaced all instances of gcc attributes with the right macro in the kernel subsystem. Signed-off-by: Gideon Israel Dsouza <gidisrael@gmail.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cab5e127 |
|
27-Mar-2014 |
John Stultz <john.stultz@linaro.org> |
time: Revert to calling clock_was_set_delayed() while in irq context In commit 47a1b796306356f35 ("tick/timekeeping: Call update_wall_time outside the jiffies lock"), we moved to calling clock_was_set() due to the fact that we were no longer holding the timekeeping or jiffies lock. However, there is still the problem that clock_was_set() triggers an IPI, which cannot be done from the timer's hard irq context, and will generate WARN_ON warnings. Apparently in my earlier testing, I'm guessing I didn't bump the dmesg log level, so I somehow missed the WARN_ONs. Thus we need to revert back to calling clock_was_set_delayed(). Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1395963049-11923-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
38aef31c |
|
23-Dec-2013 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Remove comment that's mostly out of date Prior to 92bb1fcf57a0c2e45f7e67fbf0a8ed475a749236 (Only do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD systems), the comment here was accuate, but now we can mostly avoid the extra rounding which causes the unlikey to be actually likely here. So remove the out of date comment. Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
d26e4fe0 |
|
28-Nov-2013 |
Yijing Wang <wangyijing@huawei.com> |
timekeeper: fix comment typo for tk_setup_internals() Fix trivial comment typo for tk_setup_internals(). Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
330a1617 |
|
11-Dec-2013 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Fix missing timekeeping_update in suspend path Since 48cdc135d4840 (Implement a shadow timekeeper), we have to call timekeeping_update() after any adjustment to the timekeeping structure in order to make sure that any adjustments to the structure persist. In the timekeeping suspend path, we udpate the timekeeper structure, so we should be sure to update the shadow-timekeeper before releasing the timekeeping locks. Currently this isn't done. In most cases, the next time related code to run would be timekeeping_resume, which does update the shadow-timekeeper, but in an abundence of caution, this patch adds the call to timekeeping_update() in the suspend path. Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable <stable@vger.kernel.org> #3.10+ Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
04005f60 |
|
10-Dec-2013 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Fix CLOCK_TAI timer/nanosleep delays A think-o in the calculation of the monotonic -> tai time offset results in CLOCK_TAI timers and nanosleeps to expire late (the latency is ~2x the tai offset). Fix this by adding the tai offset from the realtime offset instead of subtracting. Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable <stable@vger.kernel.org> #3.10+ Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
47a1b796 |
|
12-Dec-2013 |
John Stultz <john.stultz@linaro.org> |
tick/timekeeping: Call update_wall_time outside the jiffies lock Since the xtime lock was split into the timekeeping lock and the jiffies lock, we no longer need to call update_wall_time() while holding the jiffies lock. Thus, this patch splits update_wall_time() out from do_timer(). This allows us to get away from calling clock_was_set_delayed() in update_wall_time() and instead use the standard clock_was_set() call that previously would deadlock, as it causes the jiffies lock to be acquired. Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
6fdda9a9 |
|
10-Dec-2013 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Avoid possible deadlock from clock_was_set_delayed As part of normal operaions, the hrtimer subsystem frequently calls into the timekeeping code, creating a locking order of hrtimer locks -> timekeeping locks clock_was_set_delayed() was suppoed to allow us to avoid deadlocks between the timekeeping the hrtimer subsystem, so that we could notify the hrtimer subsytem the time had changed while holding the timekeeping locks. This was done by scheduling delayed work that would run later once we were out of the timekeeing code. But unfortunately the lock chains are complex enoguh that in scheduling delayed work, we end up eventually trying to grab an hrtimer lock. Sasha Levin noticed this in testing when the new seqlock lockdep enablement triggered the following (somewhat abrieviated) message: [ 251.100221] ====================================================== [ 251.100221] [ INFO: possible circular locking dependency detected ] [ 251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted [ 251.101967] ------------------------------------------------------- [ 251.101967] kworker/10:1/4506 is trying to acquire lock: [ 251.101967] (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70 [ 251.101967] [ 251.101967] but task is already holding lock: [ 251.101967] (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70 [ 251.101967] [ 251.101967] which lock already depends on the new lock. [ 251.101967] [ 251.101967] [ 251.101967] the existing dependency chain (in reverse order) is: [ 251.101967] -> #5 (hrtimer_bases.lock#11){-.-...}: [snipped] -> #4 (&rt_b->rt_runtime_lock){-.-...}: [snipped] -> #3 (&rq->lock){-.-.-.}: [snipped] -> #2 (&p->pi_lock){-.-.-.}: [snipped] -> #1 (&(&pool->lock)->rlock){-.-...}: [ 251.101967] [<ffffffff81194803>] validate_chain+0x6c3/0x7b0 [ 251.101967] [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580 [ 251.101967] [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0 [ 251.101967] [<ffffffff84398500>] _raw_spin_lock+0x40/0x80 [ 251.101967] [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0 [ 251.101967] [<ffffffff81154168>] queue_work_on+0x98/0x120 [ 251.101967] [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30 [ 251.101967] [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160 [ 251.101967] [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70 [ 251.101967] [<ffffffff843a4b49>] ia32_sysret+0x0/0x5 [ 251.101967] -> #0 (timekeeper_seq){----..}: [snipped] [ 251.101967] other info that might help us debug this: [ 251.101967] [ 251.101967] Chain exists of: timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11 [ 251.101967] Possible unsafe locking scenario: [ 251.101967] [ 251.101967] CPU0 CPU1 [ 251.101967] ---- ---- [ 251.101967] lock(hrtimer_bases.lock#11); [ 251.101967] lock(&rt_b->rt_runtime_lock); [ 251.101967] lock(hrtimer_bases.lock#11); [ 251.101967] lock(timekeeper_seq); [ 251.101967] [ 251.101967] *** DEADLOCK *** [ 251.101967] [ 251.101967] 3 locks held by kworker/10:1/4506: [ 251.101967] #0: (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530 [ 251.101967] #1: (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530 [ 251.101967] #2: (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70 [ 251.101967] [ 251.101967] stack backtrace: [ 251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 [ 251.101967] Workqueue: events clock_was_set_work So the best solution is to avoid calling clock_was_set_delayed() while holding the timekeeping lock, and instead using a flag variable to decide if we should call clock_was_set() once we've released the locks. This works for the case here, where the do_adjtimex() was the deadlock trigger point. Unfortuantely, in update_wall_time() we still hold the jiffies lock, which would deadlock with the ipi triggered by clock_was_set(), preventing us from calling it even after we drop the timekeeping lock. So instead call clock_was_set_delayed() at that point. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: stable <stable@vger.kernel.org> #3.10+ Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
5258d3f2 |
|
11-Dec-2013 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Fix potential lost pv notification of time change In 780427f0e11 (Indicate that clock was set in the pvclock gtod notifier), logic was added to pass a CLOCK_WAS_SET notification to the pvclock notifier chain. While that patch added a action flag returned from accumulate_nsecs_to_secs(), it only uses the returned value in one location, and not in the logarithmic accumulation. This means if a leap second triggered during the logarithmic accumulation (which is most likely where it would happen), the notification that the clock was set would not make it to the pv notifiers. This patch extends the logarithmic_accumulation pass down that action flag so proper notification will occur. This patch also changes the varialbe action -> clock_set per Ingo's suggestion. Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: <xen-devel@lists.xen.org> Cc: stable <stable@vger.kernel.org> #3.11+ Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
f55c0760 |
|
11-Dec-2013 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Fix lost updates to tai adjustment Since 48cdc135d4840 (Implement a shadow timekeeper), we have to call timekeeping_update() after any adjustment to the timekeeping structure in order to make sure that any adjustments to the structure persist. Unfortunately, the updates to the tai offset via adjtimex do not trigger this update, causing adjustments to the tai offset to be made and then over-written by the previous value at the next update_wall_time() call. This patch resovles the issue by calling timekeeping_update() right after setting the tai offset. Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable <stable@vger.kernel.org> #3.10+ Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
4be77398 |
|
22-Nov-2013 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
time: Fix 1ns/tick drift w/ GENERIC_TIME_VSYSCALL_OLD Since commit 1e75fa8be9f (time: Condense timekeeper.xtime into xtime_sec - merged in v3.6), there has been an problem with the error accounting in the timekeeping code, such that when truncating to nanoseconds, we round up to the next nsec, but the balancing adjustment to the ntp_error value was dropped. This causes 1ns per tick drift forward of the clock. In 3.7, this logic was isolated to only GENERIC_TIME_VSYSCALL_OLD architectures (s390, ia64, powerpc). The fix is simply to balance the accounting and to subtract the added nanosecond from ntp_error. This allows the internal long-term clock steering to keep the clock accurate. While this fix removes the regression added in 1e75fa8be9f, the ideal solution is to move away from GENERIC_TIME_VSYSCALL_OLD and use the new VSYSCALL method, which avoids entirely the nanosecond granular rounding, and the resulting short-term clock adjustment oscillation needed to keep long term accurate time. [ jstultz: Many thanks to Martin for his efforts identifying this subtle bug, and providing the fix. ] Originally-from: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable <stable@vger.kernel.org> #v3.6+ Link: http://lkml.kernel.org/r/1385149491-20307-1-git-send-email-john.stultz@linaro.org Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
b7bc50e4 |
|
17-Oct-2013 |
Xie XiuQi <xiexiuqi@huawei.com> |
timekeeping: Fix some trivial typos in comments Fix some typos in timekeeping comments. Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> [jstultz: Commit message tweaks] Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
1e4cfed1 |
|
17-Oct-2013 |
Xie XiuQi <xiexiuqi@huawei.com> |
timekeeping: Fix some trivial typos in comments Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
7bd36014 |
|
11-Sep-2013 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Fix HRTICK related deadlock from ntp lock changes Gerlando Falauto reported that when HRTICK is enabled, it is possible to trigger system deadlocks. These were hard to reproduce, as HRTICK has been broken in the past, but seemed to be connected to the timekeeping_seq lock. Since seqlock/seqcount's aren't supported w/ lockdep, I added some extra spinlock based locking and triggered the following lockdep output: [ 15.849182] ntpd/4062 is trying to acquire lock: [ 15.849765] (&(&pool->lock)->rlock){..-...}, at: [<ffffffff810aa9b5>] __queue_work+0x145/0x480 [ 15.850051] [ 15.850051] but task is already holding lock: [ 15.850051] (timekeeper_lock){-.-.-.}, at: [<ffffffff810df6df>] do_adjtimex+0x7f/0x100 <snip> [ 15.850051] Chain exists of: &(&pool->lock)->rlock --> &p->pi_lock --> timekeeper_lock [ 15.850051] Possible unsafe locking scenario: [ 15.850051] [ 15.850051] CPU0 CPU1 [ 15.850051] ---- ---- [ 15.850051] lock(timekeeper_lock); [ 15.850051] lock(&p->pi_lock); [ 15.850051] lock(timekeeper_lock); [ 15.850051] lock(&(&pool->lock)->rlock); [ 15.850051] [ 15.850051] *** DEADLOCK *** The deadlock was introduced by 06c017fdd4dc48451a ("timekeeping: Hold timekeepering locks in do_adjtimex and hardpps") in 3.10 This patch avoids this deadlock, by moving the call to schedule_delayed_work() outside of the timekeeper lock critical section. Reported-by: Gerlando Falauto <gerlando.falauto@keymile.com> Tested-by: Lin Ming <minggr@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: stable <stable@vger.kernel.org> #3.11, 3.10 Link: http://lkml.kernel.org/r/1378943457-27314-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
780427f0 |
|
27-Jun-2013 |
David Vrabel <david.vrabel@citrix.com> |
timekeeping: Indicate that clock was set in the pvclock gtod notifier If the clock was set (stepped), set the action parameter to functions in the pvclock gtod notifier chain to non-zero. This allows the callee to only do work if the clock was stepped. This will be used on Xen as the synchronization of the Xen wallclock to the control domain's (dom0) system time will be done with this notifier and updating on every timer tick is unnecessary and too expensive. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: John Stultz <john.stultz@linaro.org> Cc: <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/1372329348-20841-4-git-send-email-david.vrabel@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
04397fe9 |
|
27-Jun-2013 |
David Vrabel <david.vrabel@citrix.com> |
timekeeping: Pass flags instead of multiple bools to timekeeping_update() Instead of passing multiple bools to timekeeping_updated(), define flags and use a single 'action' parameter. It is then more obvious what each timekeeping_update() call does. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: John Stultz <john.stultz@linaro.org> Cc: <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/1372329348-20841-3-git-send-email-david.vrabel@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
5c83545f |
|
21-May-2013 |
Colin Cross <ccross@android.com> |
power: Add option to log time spent in suspend Below is a patch from android kernel that maintains a histogram of suspend times. Please review and provide feedback. Statistices on the time spent in suspend are kept in /sys/kernel/debug/sleep_time. Cc: Android Kernel Team <kernel-team@android.com> Cc: Colin Cross <ccross@android.com> Cc: Todd Poynor <toddpoynor@google.com> Cc: San Mehat <san@google.com> Cc: Benoit Goby <benoit@android.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Todd Poynor <toddpoynor@google.com> [zoran.markovic@linaro.org: Re-formatted suspend time table to better fit expected values. Moved accounting of suspend time into timekeeping core. Removed CONFIG_SUSPEND_TIME flag and made the feature conditional on CONFIG_DEBUG_FS. Changed the file name to sleep_time to better fit terminology in timekeeping core. Changed seq_printf to seq_puts. Tweaked commit message] Signed-off-by: Zoran Markovic <zoran.markovic@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
0d6bd995 |
|
17-May-2013 |
Zoran Markovic <zoran.markovic@linaro.org> |
timekeeping: Correct run-time detection of persistent_clock. Since commit 31ade30692dc9680bfc95700d794818fa3f754ac, timekeeping_init() checks for presence of persistent clock by attempting to read a non-zero time value. This is an issue on platforms where persistent_clock (instead is implemented as a free-running counter (instead of an RTC) starting from zero on each boot and running during suspend. Examples are some ARM platforms (e.g. PandaBoard). An attempt to read such a clock during timekeeping_init() may return zero value and falsely declare persistent clock as missing. Additionally, in the above case suspend times may be accounted twice (once from timekeeping_resume() and once from rtc_resume()), resulting in a gradual drift of system time. This patch does a run-time correction of the issue by doing the same check during timekeeping_suspend(). A better long-term solution would have to return error when trying to read non-existing clock and zero when trying to read an uninitialized clock, but that would require changing all persistent_clock implementations. This patch addresses the immediate breakage, for now. Cc: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Feng Tang <feng.tang@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Zoran Markovic <zoran.markovic@linaro.org> [jstultz: Tweaked commit message and subject] Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
09ac369c |
|
25-Apr-2013 |
Thomas Gleixner <tglx@linutronix.de> |
clocksource: Add module refcount Add a module refcount, so the current clocksource cannot be removed unconditionally. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130425143435.762417789@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
ba919d1c |
|
25-Apr-2013 |
Thomas Gleixner <tglx@linutronix.de> |
clocksource: Let timekeeping_notify return success/error timekeeping_notify() can fail due cs->enable() failure. Though the caller does not notice and happily keeps the wrong clocksource as the current one. Let the caller know about failure, so the current clocksource will be shown correctly in sysfs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130425143435.696321912@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
77c675ba |
|
22-Apr-2013 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Update tk->cycle_last in resume commit 7ec98e15aa (timekeeping: Delay update of clock->cycle_last) forgot to update tk->cycle_last in the resume path. This results in a stale value versus clock->cycle_last and prevents resume in the worst case. Reported-by: Jiri Slaby <jslaby@suse.cz> Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Linux-pm mailing list <linux-pm@lists.linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304211648150.21884@ionos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
4e8f8b34 |
|
10-Apr-2013 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Make sure to notify hrtimers when TAI offset changes Now that we have CLOCK_TAI timers, make sure we notify hrtimer code when TAI offset is changed. Signed-off-by: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1365622909-953-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
ca4523cd |
|
21-Feb-2013 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Shorten seq_count region Shorten the seqcount write hold region to the actual update of the timekeeper and the related data (e.g vsyscall). On a contemporary x86 system this reduces the maximum latencies on Preempt-RT from 8us to 4us on the non-timekeeping cores. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
48cdc135 |
|
21-Feb-2013 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Implement a shadow timekeeper Use the shadow timekeeper to do the update_wall_time() adjustments and then copy it over to the real timekeeper. Keep the shadow timekeeper in sync when updating stuff outside of update_wall_time(). This allows us to limit the timekeeper_seq hold time to the update of the real timekeeper and the vsyscall data in the next patch. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
7ec98e15 |
|
21-Feb-2013 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Delay update of clock->cycle_last For calculating the new timekeeper values store the new cycle_last value in the timekeeper and update the clock->cycle_last just when we actually update the new values. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
14a3b6ab |
|
21-Feb-2013 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Store cycle_last value in timekeeper struct as well For implementing a shadow timekeeper and a split calculation/update region we need to store the cycle_last value in the timekeeper and update the value in the clocksource struct only in the update region. Add the extra storage to the timekeeper. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
0b5154fb |
|
22-Mar-2013 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Simplify tai updating from do_adjtimex Since we are taking the timekeeping locks, just go ahead and update any tai change directly, rather then dropping the lock and calling a function that will just take it again. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
06c017fd |
|
22-Mar-2013 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Hold timekeepering locks in do_adjtimex and hardpps In moving the NTP state to be protected by the timekeeping locks, be sure to acquire the timekeeping locks prior to calling ntp functions. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
cef90377 |
|
22-Mar-2013 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Move ADJ_SETOFFSET to top level do_adjtimex() Since ADJ_SETOFFSET adjusts the timekeeping state, process it as part of the top level do_adjtimex() function in timekeeping.c. This avoids deadlocks that could occur once we change the ntp locking rules. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
87ace39b |
|
22-Mar-2013 |
John Stultz <john.stultz@linaro.org> |
ntp: Rework do_adjtimex to take timespec and tai arguments In order to change the locking rules, we need to provide the timespec and tai values rather then having the ntp logic acquire these values itself. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
e4085693 |
|
22-Mar-2013 |
John Stultz <john.stultz@linaro.org> |
ntp: Move timex validation to timekeeping do_adjtimex call. Move logic that does not need the ntp state to be done in the timekeeping do_adjtimex() call. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
aa6f9c59 |
|
22-Mar-2013 |
John Stultz <john.stultz@linaro.org> |
ntp: Move do_adjtimex() and hardpps() functions to timekeeping.c In preparation for changing the ntp locking rules, move do_adjtimex and hardpps accessor functions to timekeeping.c, but keep the code logic in ntp.c. This patch also introduces a ntp_internal.h file so timekeeping specific interfaces of ntp.c can be more limitedly shared with timekeeping.c. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
dd5d70e8 |
|
25-Mar-2013 |
Fengguang Wu <fengguang.wu@intel.com> |
timekeeping: __timekeeping_set_tai_offset can be static Yet again, the kbuild test robot saves the day, noting I left out defining __timekeeping_set_tai_offset as static. It even sent me this patch. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
9a7a71b1 |
|
21-Feb-2013 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Split timekeeper_lock into lock and seqcount We want to shorten the seqcount write hold time. So split the seqlock into a lock and a seqcount. Open code the seqwrite_lock in the places which matter and drop the sequence counter update where it's pointless. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [jstultz: Merge fixups from CLOCK_TAI collisions] Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
7e40672d |
|
21-Feb-2013 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Move lock out of timekeeper struct Make the lock a separate entity. Preparatory patch for shadow timekeeper structure. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [Merged with CLOCK_TAI changes] Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
eb93e4d9 |
|
21-Feb-2013 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Make jiffies_lock internal Nothing outside of the timekeeping core needs that lock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
23a9537a |
|
21-Feb-2013 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Calc stuff once Calculate the cycle interval shifted value once. No functional change, just makes the code more readable. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
90adda98 |
|
21-Jan-2013 |
John Stultz <john.stultz@linaro.org> |
hrtimer: Add hrtimer support for CLOCK_TAI Add hrtimer support for CLOCK_TAI, as well as posix timer interfaces. Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
1ff3c967 |
|
03-May-2012 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Add CLOCK_TAI clockid This add a CLOCK_TAI clockid and the needed accessors. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
cc244dda |
|
03-May-2012 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Move TAI managment into timekeeping core from ntp Currently NTP manages the TAI offset. Since there's plans for a CLOCK_TAI clockid, push the TAI management into the timekeeping core. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
e445cf1c |
|
11-Mar-2013 |
Feng Tang <feng.tang@intel.com> |
timekeeping: utilize the suspend-nonstop clocksource to count suspended time There are some new processors whose TSC clocksource won't stop during suspend. Currently, after system resumes, kernel will use persistent clock or RTC to compensate the sleep time, but with these nonstop clocksources, we could skip the special compensation from external sources, and just use current clocksource for time recounting. This can solve some time drift bugs caused by some not-so-accurate or error-prone RTC devices. The current way to count suspended time is first try to use the persistent clock, and then try the RTC if persistent clock can't be used. This patch will change the trying order to: suspend-nonstop clocksource -> persistent clock -> RTC When counting the sleep time with nonstop clocksource, use an accurate way suggested by Jason Gunthorpe to cover very large delta cycles. Signed-off-by: Feng Tang <feng.tang@intel.com> [jstultz: Small optimization, avoiding re-reading the clocksource] Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
31ade306 |
|
15-Jan-2013 |
Feng Tang <feng.tang@intel.com> |
timekeeping: Add persistent_clock_exist flag In current kernel, there are several places which need to check whether there is a persistent clock for the platform. Current check is done by calling the read_persistent_clock() and validating its return value. So one optimization is to do the check only once in timekeeping_init(), and use a flag persistent_clock_exist to record it. v2: Add a has_persistent_clock() helper function, as suggested by John. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
1e817fb6 |
|
19-Nov-2012 |
Kees Cook <keescook@chromium.org> |
time: create __getnstimeofday for WARNless calls The pstore RAM backend can get called during resume, and must be defensive against a suspended time source. Expose getnstimeofday logic that returns an error instead of a WARN. This can be detected and the timestamp can be zeroed out. Reported-by: Doug Anderson <dianders@chromium.org> Cc: John Stultz <johnstul@us.ibm.com> Cc: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
7b1f6207 |
|
07-Nov-2012 |
Stephen Warren <swarren@nvidia.com> |
time: convert arch_gettimeoffset to a pointer Currently, whenever CONFIG_ARCH_USES_GETTIMEOFFSET is enabled, each arch core provides a single implementation of arch_gettimeoffset(). In many cases, different sub-architectures, different machines, or different timer providers exist, and so the arch ends up implementing arch_gettimeoffset() as a call-through-pointer anyway. Examples are ARM, Cris, M68K, and it's arguable that the remaining architectures, M32R and Blackfin, should be doing this anyway. Modify arch_gettimeoffset so that it itself is a function pointer, which the arch initializes. This will allow later changes to move the initialization of this function into individual machine support or timer drivers. This is particularly useful for code in drivers/clocksource which should rely on an arch-independant mechanism to register their implementation of arch_gettimeoffset(). This patch also converts the Cris architecture to set arch_gettimeoffset directly to the final implementation in time_init(), because Cris already had separate time_init() functions per sub-architecture. M68K and ARM are converted to set arch_gettimeoffset to the final implementation in later patches, because they already have function pointers in place for this purpose. Cc: Russell King <linux@arm.linux.org.uk> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Mikael Starvik <starvik@axis.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Stephen Warren <swarren@nvidia.com>
|
#
e0b306fe |
|
27-Nov-2012 |
Marcelo Tosatti <mtosatti@redhat.com> |
time: export time information for KVM pvclock As suggested by John, export time data similarly to how its done by vsyscall support. This allows KVM to retrieve necessary information to implement vsyscall support in KVM guests. Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
d6ad4187 |
|
28-Feb-2012 |
John Stultz <john.stultz@linaro.org> |
time: Kill xtime_lock, replacing it with jiffies_lock Now that timekeeping is protected by its own locks, rename the xtime_lock to jifffies_lock to better describe what it protects. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
5b3900cd |
|
09-Oct-2012 |
Dan Carpenter <dan.carpenter@oracle.com> |
timekeeping: Cast raw_interval to u64 to avoid shift overflow We fixed a bunch of integer overflows in timekeeping code during the 3.6 cycle. I did an audit based on that and found this potential overflow. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: John Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/20121009071823.GA19159@elgon.mountain Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
|
#
92bb1fcf |
|
04-Sep-2012 |
John Stultz <john.stultz@linaro.org> |
time: Only do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD systems We only do rounding to the next nanosecond so we don't see minor 1ns inconsistencies in the vsyscall implementations. Since we're changing the vsyscall implementations to avoid this, conditionalize the rounding only to the GENERIC_TIME_VSYSCALL_OLD architectures. Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
576094b7 |
|
11-Sep-2012 |
John Stultz <john.stultz@linaro.org> |
time: Introduce new GENERIC_TIME_VSYSCALL Now that we moved everyone over to GENERIC_TIME_VSYSCALL_OLD, introduce the new declaration and config option for the new update_vsyscall method. Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
70639421 |
|
04-Sep-2012 |
John Stultz <john.stultz@linaro.org> |
time: Convert CONFIG_GENERIC_TIME_VSYSCALL to CONFIG_GENERIC_TIME_VSYSCALL_OLD To help migrate archtectures over to the new update_vsyscall method, redfine CONFIG_GENERIC_TIME_VSYSCALL as CONFIG_GENERIC_TIME_VSYSCALL_OLD Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
d7b4202e |
|
04-Sep-2012 |
John Stultz <john.stultz@linaro.org> |
time: Move timekeeper structure to timekeeper_internal.h for vsyscall changes We're going to need to access the timekeeper in update_vsyscall, so make the structure available for those who need it. Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
ec145bab |
|
11-Sep-2012 |
John Stultz <john.stultz@linaro.org> |
time: Fix timeekeping_get_ns overflow on 32bit systems Daniel Lezcano reported seeing multi-second stalls from keyboard input on his T61 laptop when NOHZ and CPU_IDLE were enabled on a 32bit kernel. He bisected the problem down to commit 1e75fa8be9fb6 ("time: Condense timekeeper.xtime into xtime_sec"). After reproducing this issue, I narrowed the problem down to the fact that timekeeping_get_ns() returns a 64bit nsec value that hasn't been accumulated. In some cases this value was being then stored in timespec.tv_nsec (which is a long). On 32bit systems, with idle times larger then 4 seconds (or less, depending on the value of xtime_nsec), the returned nsec value would overflow 32bits. This limited kept time from increasing, causing timers to not expire. The fix is to make sure we don't directly store the result of timekeeping_get_ns() into a tv_nsec field, instead using a 64bit nsec value which can then be added into the timespec via timespec_add_ns(). Reported-and-bisected-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Link: http://lkml.kernel.org/r/1347405963-35715-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
adc78e6b |
|
05-Aug-2012 |
Rafael J. Wysocki <rjw@rjwysocki.net> |
timekeeping: Add suspend and resume of clock event devices Some clock event devices, for example such that belong to PM domains, need to be handled in a spcial way during the timekeeping suspend and resume (which takes place in the system core, or "syscore", stages of system power transitions) in analogy with clock sources. Introduce .suspend() and .resume() callbacks for clock event devices that will be executed by timekeeping_suspend/_resume(), respectively, next the the clock sources' .suspend() and .resume() callbacks. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
#
cee58483 |
|
31-Aug-2012 |
John Stultz <john.stultz@linaro.org> |
time: Move ktime_t overflow checking into timespec_valid_strict Andreas Bombe reported that the added ktime_t overflow checking added to timespec_valid in commit 4e8b14526ca7 ("time: Improve sanity checking of timekeeping inputs") was causing problems with X.org because it caused timeouts larger then KTIME_T to be invalid. Previously, these large timeouts would be clamped to KTIME_MAX and would never expire, which is valid. This patch splits the ktime_t overflow checking into a new timespec_valid_strict function, and converts the timekeeping codes internal checking to use this more strict function. Reported-and-tested-by: Andreas Bombe <aeb@debian.org> Cc: Zhouping Liu <zliu@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
bf2ac312 |
|
21-Aug-2012 |
John Stultz <john.stultz@linaro.org> |
time: Avoid making adjustments if we haven't accumulated anything If update_wall_time() is called and the current offset isn't large enough to accumulate, avoid re-calling timekeeping_adjust which may change the clock freq and can cause 1ns inconsistencies with CLOCK_REALTIME_COARSE/CLOCK_MONOTONIC_COARSE. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1345595449-34965-5-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
6ea565a9 |
|
21-Aug-2012 |
John Stultz <john.stultz@linaro.org> |
time: Avoid potential shift overflow with large shift values Andreas Schwab noticed that the 1 << tk->shift could overflow if the shift value was greater than 30, since 1 would be a 32bit long on 32bit architectures. This issue was introduced by 1e75fa8be (time: Condense timekeeper.xtime into xtime_sec) Use 1ULL instead to ensure we don't overflow on the shift. Reported-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1345595449-34965-4-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
85dc8f05 |
|
21-Aug-2012 |
Andreas Schwab <schwab@linux-m68k.org> |
time: Fix casting issue in timekeeping_forward_now arch_gettimeoffset returns a u32 value which when shifted by tk->shift can overflow. This issue was introduced with 1e75fa8be (time: Condense timekeeper.xtime into xtime_sec) Cast it to u64 first. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1345595449-34965-3-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
784ffcbb |
|
21-Aug-2012 |
John Stultz <john.stultz@linaro.org> |
time: Ensure we normalize the timekeeper in tk_xtime_add Andreas noticed problems with resume on specific hardware after commit 1e75fa8b (time: Condense timekeeper.xtime into xtime_sec) combined with commit b44d50dca (time: Fix casting issue in tk_set_xtime and tk_xtime_add) After some digging I realized we aren't normalizing the timekeeper after the add. Add the missing normalize call. Reported-by: Andreas Schwab <schwab@linux-m68k.org> Tested-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1345595449-34965-2-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
4e8b1452 |
|
08-Aug-2012 |
John Stultz <john.stultz@linaro.org> |
time: Improve sanity checking of timekeeping inputs Unexpected behavior could occur if the time is set to a value large enough to overflow a 64bit ktime_t (which is something larger then the year 2262). Also unexpected behavior could occur if large negative offsets are injected via adjtimex. So this patch improves the sanity check timekeeping inputs by improving the timespec_valid() check, and then makes better use of timespec_valid() to make sure we don't set the time to an invalid negative value or one that overflows ktime_t. Note: This does not protect from setting the time close to overflowing ktime_t and then letting natural accumulation cause the overflow. Reported-by: CAI Qian <caiqian@redhat.com> Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Zhouping Liu <zliu@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1344454580-17031-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
1d17d174 |
|
04-Aug-2012 |
Ingo Molnar <mingo@kernel.org> |
time: Fix adjustment cleanup bug in timekeeping_adjust() Tetsuo Handa reported that sporadically the system clock starts counting up too quickly which is enough to confuse the hangcheck timer to print a bogus stall warning. Commit 2a8c0883 "time: Move xtime_nsec adjustment underflow handling timekeeping_adjust" overlooked this exit path: } else return; which should really be a proper exit sequence, fixing the bug as a side effect. Also make the flow more readable by properly balancing curly braces. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: john.stultz@linaro.org Cc: a.p.zijlstra@chello.nl Cc: richardcochran@gmail.com Cc: prarit@redhat.com Link: http://lkml.kernel.org/r/20120804192114.GA28347@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
4e250fdd |
|
27-Jul-2012 |
John Stultz <john.stultz@linaro.org> |
time: Remove all direct references to timekeeper Ingo noted that the numerous timekeeper.value references made the timekeeping code ugly and caused many long lines that had to be broken up. He recommended replacing timekeeper.value references with tk->value. This patch provides a local tk value for all top level time functions and sets it to &timekeeper. Then all timekeeper access is done via a tk pointer. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1343414893-45779-6-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
6d0ef903 |
|
27-Jul-2012 |
John Stultz <john.stultz@linaro.org> |
time: Clean up offs_real/wall_to_mono and offs_boot/total_sleep_time updates For performance reasons, we maintain ktime_t based duplicates of wall_to_monotonic (offs_real) and total_sleep_time (offs_boot). Since large problems could occur (such as the resume regression on 3.5-rc7, or the leapsecond hrtimer issue) if these value pairs were to be inconsistently updated, this patch this cleans up how we modify these value pairs to ensure we are always consistent. As a side-effect this is also more efficient as we only caulculate the duplicate values when they are changed, rather then every update_wall_time call. This also provides WARN_ONs to detect if future changes break the invariants. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1343414893-45779-5-git-send-email-john.stultz@linaro.org [ Cleaned up minor style issues. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
d4e3ab38 |
|
27-Jul-2012 |
John Stultz <john.stultz@linaro.org> |
time: Clean up stray newlines Ingo noted inconsistent newline usage between functions. This patch cleans those up. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1343414893-45779-4-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
b44d50dc |
|
23-Jul-2012 |
John Stultz <john.stultz@linaro.org> |
time: Fix casting issue in tk_set_xtime and tk_xtime_add commit 1e75fa8b (time: Condense timekeeper.xtime into xtime_sec) introduced helper functions which apply a timespec to the core internal timekeeper data. The internal storage type is u64. The timespec tv_nsec value must be shifted before set or added to the internal value. tv_nsec is a long, which is 32bit on a 32bit system, so without casting tv_nsec to u64 we lose the bits which are shifted over the 32bit boundary. Add the proper typecasts. Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1343074957-16541-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
3e997130 |
|
15-Jul-2012 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Add missing update call in timekeeping_resume() The leap second rework unearthed another issue of inconsistent data. On timekeeping_resume() the timekeeper data is updated, but nothing calls timekeeping_update(), so now the update code in the timer interrupt sees stale values. This has been the case before those changes, but then the timer interrupt was using stale data as well so this went unnoticed for quite some time. Add the missing update call, so all the data is consistent everywhere. Reported-by: Andreas Schwab <schwab@linux-m68k.org> Reported-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl> Reported-and-tested-by: Martin Steigerwald <Martin@lichtvoll.de> Cc: LKML <linux-kernel@vger.kernel.org> Cc: Linux PM list <linux-pm@vger.kernel.org> Cc: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, Cc: Prarit Bhargava <prarit@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f726a697 |
|
12-Jul-2012 |
John Stultz <john.stultz@linaro.org> |
time: Rework timekeeping functions to take timekeeper ptr as argument As part of cleaning up the timekeeping code, this patch converts a number of internal functions to takei a timekeeper ptr as an argument, so that the internal functions don't access the global timekeeper structure directly. This allows for further optimizations to reduce lock hold time later. This patch has been updated to include more consistent usage of the timekeeper value, by making sure it is always passed as a argument to non top-level functions. Signed-off-by: John Stultz <john.stultz@linaro.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1342156917-25092-9-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
2a8c0883 |
|
12-Jul-2012 |
John Stultz <john.stultz@linaro.org> |
time: Move xtime_nsec adjustment underflow handling timekeeping_adjust When we make adjustments speeding up the clock, its possible for xtime_nsec to underflow. We already handle this properly, but we do so from update_wall_time() instead of the more logical timekeeping_adjust(), where the possible underflow actually occurs. Thus, move the correction logic to the timekeeping_adjust, which is the function that causes the issue. Making update_wall_time() more readable. Signed-off-by: John Stultz <johnstul@us.ibm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1342156917-25092-8-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
f2a5a085 |
|
12-Jul-2012 |
John Stultz <john.stultz@linaro.org> |
time: Move arch_gettimeoffset() usage into timekeeping_get_ns() Since we call arch_gettimeoffset() in all the accessor functions, move arch_gettimeoffset() calls into timekeeping_get_ns() and timekeeping_get_ns_raw() to simplify the code. This also makes the code easier to maintain as we don't have to worry about forgetting the arch_gettimeoffset() as has happened in the past. Signed-off-by: John Stultz <johnstul@us.ibm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1342156917-25092-7-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
1f4f9487 |
|
12-Jul-2012 |
John Stultz <john.stultz@linaro.org> |
time: Refactor accumulation of nsecs to secs We do the exact same logic moving nsecs to secs in the timekeeper in multiple places, so condense this into a single function. Signed-off-by: John Stultz <john.stultz@linaro.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1342156917-25092-6-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
1e75fa8b |
|
12-Jul-2012 |
John Stultz <john.stultz@linaro.org> |
time: Condense timekeeper.xtime into xtime_sec The timekeeper struct has a xtime_nsec, which keeps the sub-nanosecond remainder. This ends up being somewhat duplicative of the timekeeper.xtime.tv_nsec value, and we have to do extra work to keep them apart, copying the full nsec portion out and back in over and over. This patch simplifies some of the logic by taking the timekeeper xtime value and splitting it into timekeeper.xtime_sec and reuses the timekeeper.xtime_nsec for the sub-second portion (stored in higher res shifted nanoseconds). This simplifies some of the accumulation logic. And will allow for more accurate timekeeping once the vsyscall code is updated to use the shifted nanosecond remainder. Signed-off-by: John Stultz <john.stultz@linaro.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1342156917-25092-5-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
fee84c43 |
|
12-Jul-2012 |
John Stultz <john.stultz@linaro.org> |
time: Explicitly use u32 instead of int for shift values Ingo noted that using a u32 instead of int for shift values would be better to make sure the compiler doesn't unnecessarily use complex signed arithmetic. Signed-off-by: John Stultz <john.stultz@linaro.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1342156917-25092-4-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
42e71e81 |
|
12-Jul-2012 |
John Stultz <john.stultz@linaro.org> |
time: Whitespace cleanups per Ingo%27s requests Ingo noted a number of places where there is inconsistent use of whitespace. This patch tries to address the main culprits. Signed-off-by: John Stultz <john.stultz@linaro.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1342156917-25092-3-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
f6c06abf |
|
10-Jul-2012 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Provide hrtimer update function To finally fix the infamous leap second issue and other race windows caused by functions which change the offsets between the various time bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a function which atomically gets the current monotonic time and updates the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic overhead. The previous patch which provides ktime_t offsets allows us to make this function almost as cheap as ktime_get() which is going to be replaced in hrtimer_interrupt(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: John Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/1341960205-56738-7-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
5b9fe759 |
|
10-Jul-2012 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Maintain ktime_t based offsets for hrtimers We need to update the hrtimer clock offsets from the hrtimer interrupt context. To avoid conversions from timespec to ktime_t maintain a ktime_t based representation of those offsets in the timekeeper. This puts the conversion overhead into the code which updates the underlying offsets and provides fast accessible values in the hrtimer interrupt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1341960205-56738-4-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
4873fa07 |
|
10-Jul-2012 |
John Stultz <johnstul@us.ibm.com> |
timekeeping: Fix leapsecond triggered load spike issue The timekeeping code misses an update of the hrtimer subsystem after a leap second happened. Due to that timers based on CLOCK_REALTIME are either expiring a second early or late depending on whether a leap second has been inserted or deleted until an operation is initiated which causes that update. Unless the update happens by some other means this discrepancy between the timekeeping and the hrtimer data stays forever and timers are expired either early or late. The reported immediate workaround - $ data -s "`date`" - is causing a call to clock_was_set() which updates the hrtimer data structures. See: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix Add the missing clock_was_set() call to update_wall_time() in case of a leap second event. The actual update is deferred to softirq context as the necessary smp function call cannot be invoked from hard interrupt context. Signed-off-by: John Stultz <johnstul@us.ibm.com> Reported-by: Jan Engelhardt <jengelh@inai.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1341960205-56738-3-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
fad0c66c |
|
30-May-2012 |
John Stultz <john.stultz@linaro.org> |
timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond Commit 6b43ae8a61 (ntp: Fix leap-second hrtimer livelock) broke the leapsecond update of CLOCK_MONOTONIC. The missing leapsecond update to wall_to_monotonic causes discontinuities in CLOCK_MONOTONIC. Adjust wall_to_monotonic when NTP inserted a leapsecond. Reported-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Tested-by: Richard Cochran <richardcochran@gmail.com> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1338400497-12420-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
d239f49d |
|
27-Apr-2012 |
Richard Cochran <richardcochran@gmail.com> |
timekeeping: Fix a few minor newline issues. Fix a few minor newline issues. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
88b28adf |
|
14-Mar-2012 |
Jim Cromie <jim.cromie@gmail.com> |
kernel-time: fix s/then/than/ spelling errors Use than for comparisons, like more than. CC: John Stultz <john.stultz@linaro.org> Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
e919cfd4 |
|
22-Mar-2012 |
John Stultz <john.stultz@linaro.org> |
time: Avoid scary backtraces when warning of > 11% adj Folks have been getting a number of warnings about time adjustments > 11%. The WARN_ON leaves a big useless backtrace so this patch removes it for a printk_once(). I'm still working to narrow down the cause of the > 11% adjustment. Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
6b43ae8a |
|
15-Mar-2012 |
John Stultz <john.stultz@linaro.org> |
ntp: Fix leap-second hrtimer livelock Since commit 7dffa3c673fbcf835cd7be80bb4aec8ad3f51168 the ntp subsystem has used an hrtimer for triggering the leapsecond adjustment. However, this can cause a potential livelock. Thomas diagnosed this as the following pattern: CPU 0 CPU 1 do_adjtimex() spin_lock_irq(&ntp_lock); process_adjtimex_modes(); timer_interrupt() process_adj_status(); do_timer() ntp_start_leap_timer(); write_lock(&xtime_lock); hrtimer_start(); update_wall_time(); hrtimer_reprogram(); ntp_tick_length() tick_program_event() spin_lock(&ntp_lock); clockevents_program_event() ktime_get() seq = req_seqbegin(xtime_lock); This patch tries to avoid the problem by reverting back to not using an hrtimer to inject leapseconds, and instead we handle the leapsecond processing in the second_overflow() function. The downside to this change is that on systems that support highres timers, the leap second processing will occur on a HZ tick boundary, (ie: ~1-10ms, depending on HZ) after the leap second instead of possibly sooner (~34us in my tests w/ x86_64 lapic). This patch applies on top of tip/timers/core. CC: Sasha Levin <levinsasha928@gmail.com> CC: Thomas Gleixner <tglx@linutronix.de> Reported-by: Sasha Levin <levinsasha928@gmail.com> Diagnoised-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
f695cf94 |
|
14-Mar-2012 |
John Stultz <john.stultz@linaro.org> |
time: Fix change_clocksource locking change_clocksource() fails to grab locks or call timekeeping_update(), which leaves a race window for time inconsistencies. This adds proper locking and a call to timekeeping_update() to fix this. CC: Andy Lutomirski <luto@amacapital.net> CC: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
a80b83b7 |
|
03-Feb-2012 |
John Stultz <john.stultz@linaro.org> |
Input: add infrastructure for selecting clockid for event time stamps As noted by Arve and others, since wall time can jump backwards, it is difficult to use for input because one cannot determine if one event occurred before another or for how long a key was pressed. However, the timestamp field is part of the kernel ABI, and cannot be changed without possibly breaking existing users. This patch adds a new IOCTL that allows a clockid to be set in the evdev_client struct that will specify which time base to use for event timestamps (ie: CLOCK_MONOTONIC instead of CLOCK_REALTIME). For now we only support CLOCK_MONOTONIC and CLOCK_REALTIME, but in the future we could support other clockids if appropriate. The default remains CLOCK_REALTIME, so we don't change the ABI. Signed-off-by: John Stultz <john.stultz@linaro.org> Reviewed-by: Daniel Kurtz <djkurtz@google.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
|
#
cc06268c |
|
13-Nov-2011 |
Thomas Gleixner <tglx@linutronix.de> |
time: Move common updates to a function CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
058892e6 |
|
13-Nov-2011 |
Thomas Gleixner <tglx@linutronix.de> |
time: Reorder so the hot data is together Keep all the interesting data in a single cache line. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
92c1d3ed |
|
14-Nov-2011 |
John Stultz <john.stultz@linaro.org> |
time: Remove most of xtime_lock usage in timekeeping.c Now that ntp.c's locking is reworked, we can remove most of the xtime_lock usage in timekeeping.c The remaining xtime_lock presence is really for jiffies access and the global load calculation. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
ea7cf49a |
|
14-Nov-2011 |
John Stultz <john.stultz@linaro.org> |
ntp: Access tick_length variable via ntp_tick_length() Currently the NTP managed tick_length value is accessed globally, in preparations for locking cleanups, make sure it is accessed via a function and mark it as static. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
70471f2f |
|
14-Nov-2011 |
John Stultz <john.stultz@linaro.org> |
time: Add timekeeper lock Now that all the timekeeping variables are stored in the timekeeper structure, add a new lock to protect the structure. For now, this lock nests under the xtime_lock for writes. For readers, we don't need to take xtime_lock anymore. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
8fcce546 |
|
14-Nov-2011 |
John Stultz <john.stultz@linaro.org> |
time: Cleanup global variables and move them to the top Move global xtime_lock and timekeeping_suspended values up to the top of timekeeping.c CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
01f71b47 |
|
14-Nov-2011 |
John Stultz <john.stultz@linaro.org> |
time: Move raw_time into timekeeper structure In preparation for locking cleanups, move raw_time into timekeeper structure. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
8ff2cb92 |
|
14-Nov-2011 |
John Stultz <john.stultz@linaro.org> |
time: Move xtime into timekeeeper structure In preparation for locking cleanups, move xtime into timekeeper structure. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
d9f7217a |
|
14-Nov-2011 |
John Stultz <john.stultz@linaro.org> |
time: Move wall_to_monotonic into the timekeeper structure In preparation for locking cleanups, move wall_to_monotonic into the timekeeper structure. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
00c5fb77 |
|
14-Nov-2011 |
John Stultz <john.stultz@linaro.org> |
time: Move total_sleep_time into the timekeeper structure Move total_sleep_time into the timekeeper structure in preparation for locking cleanups CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
3f86f28f |
|
27-Oct-2011 |
John Stultz <john.stultz@linaro.org> |
time: Fix spelling mistakes in new comments Fixup spelling issues caught by Richard CC: Richard Cochran <richardcochran@gmail.com> CC: Chen Jie <chenj@lemote.com> CC: Steven Rostedt <rostedt@goodmis.org> CC: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
c9fad429 |
|
17-Oct-2011 |
Dan McGee <dpmcgee@gmail.com> |
time: fix bogus comment in timekeeping_get_ns_raw The whole point of this function is to return a value not touched by NTP; unfortunately the comment got copied wholesale without adjustment from the timekeeping_get_ns function above. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
d004e024 |
|
14-Nov-2011 |
Hector Palacios <hector.palacios@digi.com> |
timekeeping: add arch_offset hook to ktime_get functions ktime_get and ktime_get_ts were calling timekeeping_get_ns() but later they were not calling arch_gettimeoffset() so architectures using this mechanism returned 0 ns when calling these functions. This happened for example when running Busybox's ping which calls syscall(__NR_clock_gettime, CLOCK_MONOTONIC, ts) which eventually calls ktime_get. As a result the returned ping travel time was zero. CC: stable@kernel.org Signed-off-by: Hector Palacios <hector.palacios@digi.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
d65670a7 |
|
31-Oct-2011 |
John Stultz <john.stultz@linaro.org> |
clocksource: Avoid selecting mult values that might overflow when adjusted For some frequencies, the clocks_calc_mult_shift() function will unfortunately select mult values very close to 0xffffffff. This has the potential to overflow when NTP adjusts the clock, adding to the mult value. This patch adds a clocksource.maxadj value, which provides an approximation of an 11% adjustment(NTP limits adjustments to 500ppm and the tick adjustment is limited to 10%), which could be made to the clocksource.mult value. This is then used to both check that the current mult value won't overflow/underflow, as well as warning us if the timekeeping_adjust() code pushes over that 11% boundary. v2: Fix max_adjustment calculation, and improve WARN_ONCE messages. v3: Don't warn before maxadj has actually been set CC: Yong Zhang <yong.zhang0@gmail.com> CC: David Daney <ddaney.cavm@gmail.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Chen Jie <chenj@lemote.com> CC: zhangfx <zhangfx@lemote.com> CC: stable@kernel.org Reported-by: Chen Jie <chenj@lemote.com> Reported-by: zhangfx <zhangfx@lemote.com> Tested-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
c2bc1111 |
|
27-Oct-2011 |
John Stultz <john.stultz@linaro.org> |
time: Improve documentation of timekeeeping_adjust() After getting a number of questions in private emails about the math around admittedly very complex timekeeping_adjust() and timekeeping_big_adjust(), I figure the code needs some better comments. Hopefully the explanations are clear enough and don't muddy the water any worse. Still needs documentation for ntp_error, but I couldn't recall exactly the full explanation behind the code that's there (although I do recall once working it out when Roman first proposed it). Given a bit more time I can probably work it out, but I don't want to hold back this documentation until then. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Chen Jie <chenj@lemote.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1319764362-32367-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
cbaa5152 |
|
20-Jul-2011 |
John Stultz <john.stultz@linaro.org> |
time: Fix stupid KERN_WARN compile issue Terribly embarassing. Don't know how I committed this, but its KERN_WARNING not KERN_WARN. This fixes the following compile error: kernel/time/timekeeping.c: In function ‘__timekeeping_inject_sleeptime’: kernel/time/timekeeping.c:608: error: ‘KERN_WARN’ undeclared (first use in this function) kernel/time/timekeeping.c:608: error: (Each undeclared identifier is reported only once kernel/time/timekeeping.c:608: error: for each function it appears in.) kernel/time/timekeeping.c:608: error: expected ‘)’ before string constant make[2]: *** [kernel/time/timekeeping.o] Error 1 Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
cb33217b |
|
31-May-2011 |
John Stultz <john.stultz@linaro.org> |
time: Avoid accumulating time drift in suspend/resume Because the read_persistent_clock interface is usually backed by only a second granular interface, each time we read from the persistent clock for suspend/resume, we introduce a half second (on average) of error. In order to avoid this error accumulating as the system is suspended over and over, this patch measures the time delta between the persistent clock and the system CLOCK_REALTIME. If the delta is less then 2 seconds from the last suspend, we compensate by using the previous time delta (keeping it close). If it is larger then 2 seconds, we assume the clock was set or has been changed, so we do no correction and update the delta. Note: If NTP is running, ths could seem to "fight" with the NTP corrected time, where as if the system time was off by 1 second, and NTP slewed the value in, a suspend/resume cycle could undo this correction, by trying to restore the previous offset from the persistent clock. However, without this patch, since each read could cause almost a full second worth of error, its possible to get almost 2 seconds of error just from the suspend/resume cycle alone, so this about equal to any offset added by the compensation. Further on systems that suspend/resume frequently, this should keep time closer then NTP could compensate for if the errors were allowed to accumulate. Credits to Arve Hjønnevåg for suggesting this solution. CC: Arve Hjønnevåg <arve@android.com> CC: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
cb5de2f8d |
|
01-Jun-2011 |
John Stultz <john.stultz@linaro.org> |
time: Catch invalid timespec sleep values in __timekeeping_inject_sleeptime Arve suggested making sure we catch possible negative sleep time intervals that could be passed into timekeeping_inject_sleeptime. CC: Arve Hjønnevåg <arve@android.com> CC: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
99ee5315 |
|
27-Apr-2011 |
Thomas Gleixner <tglx@linutronix.de> |
timerfd: Allow timers to be cancelled when clock was set Some applications must be aware of clock realtime being set backward. A simple example is a clock applet which arms a timer for the next minute display. If clock realtime is set backward then the applet displays a stale time for the amount of time which the clock was set backwards. Due to that applications poll the time because we don't have an interface. Extend the timerfd interface by adding a flag which puts the timer onto a different internal realtime clock. All timers on this clock are expired whenever the clock was set. The timerfd core records the monotonic offset when the timer is created. When the timer is armed, then the current offset is compared to the previous recorded offset. When it has changed, then timerfd_settime returns -ECANCELED. When a timer is read the offset is compared and if it changed -ECANCELED returned to user space. Periodic timers are not rearmed in the cancelation case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Chris Friesen <chris.friesen@genband.com> Tested-by: Kay Sievers <kay.sievers@vrfy.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davide Libenzi <davidel@xmailserver.org> Reviewed-by: Alexander Shishkin <virtuoso@slind.org> Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1104271359580.3323%40ionos%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
b12a03ce |
|
02-May-2011 |
Thomas Gleixner <tglx@linutronix.de> |
hrtimers: Prepare for cancel on clock was set timers Make clock_was_set() unconditional and rename hres_timers_resume to hrtimers_resume. This is a preparatory patch for hrtimers which are cancelled when clock realtime was set. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
304529b1 |
|
01-Apr-2011 |
John Stultz <john.stultz@linaro.org> |
time: Add timekeeping_inject_sleeptime Some platforms cannot implement read_persistent_clock, as their RTC devices are only accessible when interrupts are enabled. This keeps them from being used by the timekeeping code on resume to measure the time in suspend. The RTC layer tries to work around this, by calling do_settimeofday on resume after irqs are reenabled to set the time properly. However, this only corrects CLOCK_REALTIME, and does not properly adjust the sleep time value. This causes btime in /proc/stat to be incorrect as well as making the new CLOCK_BOTTTIME inaccurate. This patch resolves the issue by introducing a new timekeeping hook to allow the RTC layer to inject the sleep time on resume. The code also checks to make sure that read_persistent_clock is nonfunctional before setting the sleep time, so that should the RTC's HCTOSYS option be configured in on a system that does support read_persistent_clock we will not increase the total_sleep_time twice. CC: Arve Hjønnevåg <arve@android.com> CC: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
e1a85b2c |
|
23-Mar-2011 |
Rafael J. Wysocki <rjw@rjwysocki.net> |
timekeeping: Use syscore_ops instead of sysdev class and sysdev The timekeeping subsystem uses a sysdev class and a sysdev for executing timekeeping_suspend() after interrupts have been turned off on the boot CPU (during system suspend) and for executing timekeeping_resume() before turning on interrupts on the boot CPU (during system resume). However, since both of these functions ignore their arguments, the entire mechanism may be replaced with a struct syscore_ops object which is simpler. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
|
#
314ac371 |
|
14-Feb-2011 |
John Stultz <john.stultz@linaro.org> |
time: Extend get_xtime_and_monotonic_offset() to also return sleep Extend get_xtime_and_monotonic_offset to get_xtime_and_monotonic_and_sleep_offset(). CC: Jamie Lokier <jamie@shareable.org> CC: Thomas Gleixner <tglx@linutronix.de> CC: Alexander Shishkin <virtuoso@slind.org> CC: Arve Hjønnevåg <arve@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
abb3a4ea |
|
14-Feb-2011 |
John Stultz <john.stultz@linaro.org> |
time: Introduce get_monotonic_boottime and ktime_get_boottime This adds new functions that return the monotonic time since boot (in other words, CLOCK_MONOTONIC + suspend time). CC: Jamie Lokier <jamie@shareable.org> CC: Thomas Gleixner <tglx@linutronix.de> CC: Alexander Shishkin <virtuoso@slind.org> CC: Arve Hjønnevåg <arve@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
c528f7c6 |
|
01-Feb-2011 |
John Stultz <john.stultz@linaro.org> |
time: Introduce timekeeping_inject_offset This adds a kernel-internal timekeeping interface to add or subtract a fixed amount from CLOCK_REALTIME. This makes it so kernel users or interfaces trying to do so do not have to read the time, then add an offset and then call settimeofday(), which adds some extra error in comparision to just simply adding the offset in the kernel timekeeping core. Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Richard Cochran <richard.cochran@omicron.at> LKML-Reference: <20110201134419.584311693@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
1e6d7679 |
|
01-Feb-2011 |
Richard Cochran <richard.cochran@omicron.at> |
time: Correct the *settime* parameters Both settimeofday() and clock_settime() promise with a 'const' attribute not to alter the arguments passed in. This patch adds the missing 'const' attribute into the various kernel functions implementing these calls. Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Acked-by: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20110201134417.545698637@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
f0af911a9 |
|
27-Jan-2011 |
Torben Hohn <torbenh@gmx.de> |
time: Provide xtime_update() xtime_update() takes xtime_lock write locked and calls do_timer(). Provided to replace the do_timer() calls in the architecture code. Signed-off-by: Torben Hohn <torbenh@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: johnstul@us.ibm.com Cc: yong.zhang0@gmail.com Cc: hch@infradead.org LKML-Reference: <20110127145910.23248.21379.stgit@localhost> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
79ecaf0d |
|
31-Jan-2011 |
Thomas Gleixner <tglx@linutronix.de> |
time: Remove unused __get_wall_to_monotonic() No users left. Remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
48cf76f7 |
|
27-Jan-2011 |
Torben Hohn <torbenh@gmx.de> |
time: Provide get_xtime_and_monotonic_offset() The hrtimer code accesses timekeeping variables under xtime_lock. Provide a sensible accessor function and use it. [ tglx: Removed the conditionals, unused variable, fixed codingstyle and massaged changelog ] Signed-off-by: Torben Hohn <torbenh@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: johnstul@us.ibm.com Cc: yong.zhang0@gmail.com Cc: hch@infradead.org LKML-Reference: <20110127145905.23248.30458.stgit@localhost> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
871cf1e5 |
|
27-Jan-2011 |
Torben Hohn <torbenh@gmx.de> |
time: Move do_timer() to kernel/time/timekeeping.c do_timer() is primary timekeeping related. calc_global_load() is called from do_timer() as well, but that's more for historical reasons. [ tglx: Fixed up the calc_global_load() reject andmassaged changelog ] Signed-off-by: Torben Hohn <torbenh@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: johnstul@us.ibm.com Cc: yong.zhang0@gmail.com Cc: hch@infradead.org LKML-Reference: <20110127145855.23248.56933.stgit@localhost> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
e2c18e49 |
|
12-Jan-2011 |
Alexander Gordeev <lasaine@lvk.cs.msu.su> |
pps: capture MONOTONIC_RAW timestamps as well MONOTONIC_RAW clock timestamps are ideally suited for frequency calculation and also fit well into the original NTP hardpps design. Now phase and frequency can be adjusted separately: the former based on REALTIME clock and the latter based on MONOTONIC_RAW clock. A new function getnstime_raw_and_real is added to timekeeping subsystem to capture both timestamps at the same time and atomically. Signed-off-by: Alexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Rodolfo Giometti <giometti@enneenne.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
afa14e7c |
|
11-Jan-2011 |
H Hartley Sweeten <hartleys@visionengravers.com> |
timekeeping: Make local variables static Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <0D753D10438DA54287A00B027084269764CE0E54B7@AUSP01VMBX24.collaborationhost.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
a386b5af |
|
20-Oct-2010 |
Kasper Pedersen <kkp2010@kasperkp.dk> |
time: Compensate for rounding on odd-frequency clocksources When the clocksource is not a multiple of HZ, the clock will be off. For acpi_pm, HZ=1000 the error is 127.111 ppm: The rounding of cycle_interval ends up generating a false error term in ntp_error accumulation since xtime_interval is not exactly 1/HZ. So, we subtract out the error caused by the rounding. This has been visible since 2.6.32-rc2 commit a092ff0f90cae22b2ac8028ecd2c6f6c1a9e4601 time: Implement logarithmic time accumulation That commit raised NTP_INTERVAL_FREQ and exposed the rounding error. testing tool: http://n1.taur.dk/permanent/testpmt.c Also tested with ntpd and a frequency counter. Signed-off-by: Kasper Pedersen <kkp2010@kasperkp.dk> Acked-by: john stultz <johnstul@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
c7dcf87a |
|
13-Aug-2010 |
John Stultz <johnstul@us.ibm.com> |
time: Workaround gcc loop optimization that causes 64bit div errors Early 4.3 versions of gcc apparently aggressively optimize the raw time accumulation loop, replacing it with a divide. On 32bit systems, this causes the following link errors: undefined reference to `__umoddi3' undefined reference to `__udivdi3' The gcc issue has been fixed in 4.4 and greater. This patch replaces the accumulation loop with a do_div, as suggested by Linus. Signed-off-by: John Stultz <johnstul@us.ibm.com> CC: Jason Wessel <jason.wessel@windriver.com> CC: Larry Finger <Larry.Finger@lwfinger.net> CC: Ingo Molnar <mingo@elte.hu> CC: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
deda2e81 |
|
09-Aug-2010 |
Jason Wessel <jason.wessel@windriver.com> |
timekeeping: Fix overflow in rawtime tv_nsec on 32 bit archs The tv_nsec is a long and when added to the shifted interval it can wrap and become negative which later causes looping problems in the getrawmonotonic(). The edge case occurs when the system has slept for a short period of time of ~2 seconds. A trace printk of the values in this patch illustrate the problem: ftrace time stamp: log 43.716079: logarithmic_accumulation: raw: 3d0913 tv_nsec d687faa 43.718513: logarithmic_accumulation: raw: 3d0913 tv_nsec da588bd 43.722161: logarithmic_accumulation: raw: 3d0913 tv_nsec de291d0 46.349925: logarithmic_accumulation: raw: 7a122600 tv_nsec e1f9ae3 46.349930: logarithmic_accumulation: raw: 1e848980 tv_nsec 8831c0e3 The kernel starts looping at 46.349925 in the getrawmonotonic() due to the negative value from adding the raw value to tv_nsec. A simple solution is to accumulate into a u64, and then normalize it to a timespec_t. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> [ Reworked variable names and simplified some of the code. - John ] Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0fb86b06 |
|
13-Jul-2010 |
John Stultz <johnstul@us.ibm.com> |
timekeeping: Make xtime and wall_to_monotonic static This patch makes xtime and wall_to_monotonic static, as planned in Documentation/feature-removal-schedule.txt. This will allow for further cleanups to the timekeeping core. Signed-off-by: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1279068988-21864-10-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
8ab4351a |
|
13-Jul-2010 |
John Stultz <johnstul@us.ibm.com> |
hrtimer: Cleanup direct access to wall_to_monotonic Provides an accessor function to replace hrtimer.c's direct access of wall_to_monotonic. This will allow wall_to_monotonic to be made static as planned in Documentation/feature-removal-schedule.txt Signed-off-by: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1279068988-21864-9-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
7615856e |
|
13-Jul-2010 |
John Stultz <johnstul@us.ibm.com> |
timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset update_vsyscall() did not provide the wall_to_monotoinc offset, so arch specific implementations tend to reference wall_to_monotonic directly. This limits future cleanups in the timekeeping core, so this patch fixes the update_vsyscall interface to provide wall_to_monotonic, allowing wall_to_monotonic to be made static as planned in Documentation/feature-removal-schedule.txt Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Anton Blanchard <anton@samba.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Tony Luck <tony.luck@intel.com> LKML-Reference: <1279068988-21864-7-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
592913ec |
|
13-Jul-2010 |
John Stultz <johnstul@us.ibm.com> |
time: Kill off CONFIG_GENERIC_TIME Now that all arches have been converted over to use generic time via clocksources or arch_gettimeoffset(), we can remove the GENERIC_TIME config option and simplify the generic code. Signed-off-by: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1279068988-21864-4-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
ce3bf7ab |
|
13-Jul-2010 |
John Stultz <johnstul@us.ibm.com> |
time: Implement timespec_add After accidentally misusing timespec_add_safe, I wanted to make sure we don't accidently trip over that issue again, so I created a simple timespec_add() function which we can use to replace the instances of timespec_add_safe() that don't want the overflow detection. Signed-off-by: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1279068988-21864-3-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
6a867a39 |
|
06-Apr-2010 |
John Stultz <johnstul@us.ibm.com> |
time: Remove xtime_cache With the earlier logarithmic time accumulation patch, xtime will now always be within one "tick" of the current time, instead of possibly half a second off. This removes the need for the xtime_cache value, which always stored the time at the last interrupt, so this patch cleans that up removing the xtime_cache related code. This patch also addresses an issue with an earlier version of this change, where xtime_cache was normalizing xtime, which could in some cases be not valid (ie: tv_nsec == NSEC_PER_SEC). This is fixed by handling the edge case in update_wall_time(). Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Petr Titěra <P.Titera@century.cz> LKML-Reference: <1270589451-30773-1-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
830ec045 |
|
18-Mar-2010 |
John Stultz <johnstul@us.ibm.com> |
time: Fix accumulation bug triggered by long delay. The logarithmic accumulation done in the timekeeping has some overflow protection that limits the max shift value. That means it will take more then shift loops to accumulate all of the cycles. This causes the shift decrement to underflow, which causes the loop to never exit. The simplest fix would be simply to do a: if (shift) shift--; However that is not optimal, as we know the cycle offset is larger then the interval << shift, the above would make shift drop to zero, then we would be spinning for quite awhile accumulating at interval chunks at a time. Instead, this patch only decreases shift if the offset is smaller then cycle_interval << shift. This makes sure we accumulate using the largest chunks possible without overflowing tick_length, and limits the number of iterations through the loop. This issue was found and reported by Sonic Zhang, who also tested the fix. Many thanks your explanation and testing! Reported-by: Sonic Zhang <sonic.adi@gmail.com> Signed-off-by: John Stultz <johnstul@us.ibm.com> Tested-by: Sonic Zhang <sonic.adi@gmail.com> LKML-Reference: <1268948850-5225-1-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
c93d89f3 |
|
27-Jan-2010 |
Jason Wang <jasowang@redhat.com> |
Export the symbol of getboottime and mmonotonic_to_bootbased Export getboottime and monotonic_to_bootbased in order to let them could be used by following patch. Cc: stable@kernel.org Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
c54a42b1 |
|
02-Feb-2010 |
Magnus Damm <damm@opensource.se> |
clocksource: add suspend callback Add a clocksource suspend callback. This callback can be used by the clocksource driver to shutdown and perform any kind of late suspend activities even though the clocksource driver itself is a non-sysdev driver. One example where this is useful is to fix the sh_cmt.c platform driver that today suspends using the platform bus and shuts down the clocksource too early. With this callback in place the sh_cmt driver will suspend using the clocksource and clockevent hooks and leave the platform device pm callbacks unused. Signed-off-by: Magnus Damm <damm@opensource.se> Cc: Paul Mundt <lethal@linux-sh.org> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
83f57a11 |
|
22-Dec-2009 |
Linus Torvalds <torvalds@linux-foundation.org> |
Revert "time: Remove xtime_cache" This reverts commit 7bc7d637452383d56ba4368d4336b0dde1bb476d, as requested by John Stultz. Quoting John: "Petr Titěra reported an issue where he saw odd atime regressions with 2.6.33 where there were a full second worth of nanoseconds in the nanoseconds field. He also reviewed the time code and narrowed down the problem: unhandled overflow of the nanosecond field caused by rounding up the sub-nanosecond accumulated time. Details: * At the end of update_wall_time(), we currently round up the sub-nanosecond portion of accumulated time when storing it into xtime. This was added to avoid time inconsistencies caused when the sub-nanosecond portion was truncated when storing into xtime. Unfortunately we don't handle the possible second overflow caused by that rounding. * Previously the xtime_cache code hid this overflow by normalizing the xtime value when storing into the xtime_cache. * We could try to handle the second overflow after the rounding up, but since this affects the timekeeping's internal state, this would further complicate the next accumulation cycle, causing small errors in ntp steering. As much as I'd like to get rid of it, the xtime_cache code is known to work. * The correct fix is really to include the sub-nanosecond portion in the timekeeping accessor function, so we don't need to round up at during accumulation. This would greatly simplify the accumulation code. Unfortunately, we can't do this safely until the last three non-GENERIC_TIME arches (sparc32, arm, cris) are converted (those patches are in -mm) and we kill off the spots where arches set xtime directly. This is all 2.6.34 material, so I think reverting the xtime_cache change is the best approach for now. Many thanks to Petr for both reporting and finding the issue!" Reported-by: Petr Titěra <P.Titera@century.cz> Requested-by: john stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0696b711 |
|
16-Nov-2009 |
Lin Ming <ming.m.lin@intel.com> |
timekeeping: Fix clock_gettime vsyscall time warp Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier to struct timekeeper" the clock multiplier of vsyscall is updated with the unmodified clock multiplier of the clock source and not with the NTP adjusted multiplier of the timekeeper. This causes user space observerable time warps: new CLOCK-warp maximum: 120 nsecs, 00000025c337c537 -> 00000025c337c4bf Add a new argument "mult" to update_vsyscall() and hand in the timekeeping internal NTP adjusted multiplier. Signed-off-by: Lin Ming <ming.m.lin@intel.com> Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
98962465 |
|
17-Aug-2009 |
Jon Hunter <jon-hunter@ti.com> |
nohz: Prevent clocksource wrapping during idle The dynamic tick allows the kernel to sleep for periods longer than a single tick, but it does not limit the sleep time currently. In the worst case the kernel could sleep longer than the wrap around time of the time keeping clock source which would result in losing track of time. Prevent this by limiting it to the safe maximum sleep time of the current time keeping clock source. The value is calculated when the clock source is registered. [ tglx: simplified the code a bit and massaged the commit msg ] Signed-off-by: Jon Hunter <jon-hunter@ti.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1250617512-23567-2-git-send-email-jon-hunter@ti.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
d43c36dc |
|
07-Oct-2009 |
Alexey Dobriyan <adobriyan@gmail.com> |
headers: remove sched.h from interrupt.h After m68k's task_thread_info() doesn't refer to current, it's possible to remove sched.h from interrupt.h and not break m68k! Many thanks to Heiko Carstens for allowing this. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
|
#
7bc7d637 |
|
02-Oct-2009 |
John Stultz <johnstul@us.ibm.com> |
time: Remove xtime_cache With the prior logarithmic time accumulation patch, xtime will now always be within one "tick" of the current time, instead of possibly half a second off. This removes the need for the xtime_cache value, which always stored the time at the last interrupt, so this patch cleans that up removing the xtime_cache related code. This is a bit simpler, but still could use some wider testing. Signed-off-by: John Stultz <johnstul@us.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: John Kacur <jkacur@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <1254525855.7741.95.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
a092ff0f |
|
02-Oct-2009 |
John Stultz <johnstul@us.ibm.com> |
time: Implement logarithmic time accumulation Accumulating one tick at a time works well unless we're using NOHZ. Then it can be an issue, since we may have to run through the loop a few thousand times, which can increase timer interrupt caused latency. The current solution was to accumulate in half-second intervals with NOHZ. This kept the number of loops down, however it did slightly change how we make NTP adjustments. While not an issue with NTPd users, as NTPd makes adjustments over a longer period of time, other adjtimex() users have noticed the half-second granularity with which we can apply frequency changes to the clock. For instance, if a application tries to apply a 100ppm frequency correction for 20ms to correct a 2us offset, with NOHZ they either get no correction, or a 50us correction. Now, there will always be some granularity error for applying frequency corrections. However with users sensitive to this error have seen a 50-500x increase with NOHZ compared to running without NOHZ. So I figured I'd try another approach then just simply increasing the interval. My approach is to consume the time interval logarithmically. This reduces the number of times through the loop needed keeping latency down, while still preserving the original granularity error for adjtimex() changes. Further, this change allows us to remove the xtime_cache code (patch to follow), as xtime is always within one tick of the current time, instead of the half-second updates it saw before. An earlier version of this patch has been shipping to x86 users in the RedHat MRG releases for awhile without issue, but I've reworked this version to be even more careful about avoiding possible overflows if the shift value gets too large. Signed-off-by: John Stultz <johnstul@us.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: John Kacur <jkacur@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <1254525473.7741.88.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
36d47481 |
|
25-Aug-2009 |
Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> |
timekeeping: Fix invalid getboottime() value Don't use timespec_add_safe() with wall_to_monotonic, because wall_to_monotonic has negative values which will cause overflow in timespec_add_safe(). That makes btime in /proc/stat invalid. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <4A937FDE.4050506@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
da15cfda |
|
19-Aug-2009 |
John Stultz <johnstul@us.ibm.com> |
time: Introduce CLOCK_REALTIME_COARSE After talking with some application writers who want very fast, but not fine-grained timestamps, I decided to try to implement new clock_ids to clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE which returns the time at the last tick. This is very fast as we don't have to access any hardware (which can be very painful if you're using something like the acpi_pm clocksource), and we can even use the vdso clock_gettime() method to avoid the syscall. The only trade off is you only get low-res tick grained time resolution. This isn't a new idea, I know Ingo has a patch in the -rt tree that made the vsyscall gettimeofday() return coarse grained time when the vsyscall64 sysctrl was set to 2. However this affects all applications on a system. With this method, applications can choose the proper speed/granularity trade-off for themselves. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: nikolag@ca.ibm.com Cc: Darren Hart <dvhltc@us.ibm.com> Cc: arjan@infradead.org Cc: jonathan@jonmasters.org LKML-Reference: <1250734414.6897.5.camel@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
23970e38 |
|
14-Aug-2009 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
timekeeping: Introduce read_boot_clock Add the new function read_boot_clock to get the exact time the system has been started. For architectures without support for exact boot time a new weak function is added that returns 0. Use the exact boot time to initialize wall_to_monotonic, or xtime if the read_boot_clock returned 0. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134811.296703241@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
d4f587c6 |
|
14-Aug-2009 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
timekeeping: Increase granularity of read_persistent_clock() The persistent clock of some architectures (e.g. s390) have a better granularity than seconds. To reduce the delta between the host clock and the guest clock in a virtualized system change the read_persistent_clock function to return a struct timespec. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134811.013873340@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
75c5158f |
|
14-Aug-2009 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
timekeeping: Update clocksource with stop_machine update_wall_time calls change_clocksource HZ times per second to check if a new clock source is available. In close to 100% of all calls there is no new clock. Replace the tick based check by an update done with stop_machine. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134810.711836357@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
2ba2a305 |
|
14-Aug-2009 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
timekeeping: Add timekeeper read_clock helper functions Add timekeeper_read_clock_ntp and timekeeper_read_clock_raw and use them for getnstimeofday, ktime_get, ktime_get_ts and getrawmonotonic. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134810.435105711@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
0a544198 |
|
14-Aug-2009 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
timekeeping: Move NTP adjusted clock multiplier to struct timekeeper The clocksource structure has two multipliers, the unmodified multiplier clock->mult_orig and the NTP corrected multiplier clock->mult. The NTP multiplier is misplaced in the struct clocksource, this is private information of the timekeeping code. Add the mult field to the struct timekeeper to contain the NTP corrected value, keep the unmodifed multiplier in clock->mult and remove clock->mult_orig. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134810.149047645@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
23ce7211 |
|
14-Aug-2009 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
timekeeping: Add xtime_shift and ntp_error_shift to struct timekeeper The xtime_nsec value in the timekeeper structure is shifted by a few bits to improve precision. This happens to be the same value as the clock->shift. To improve readability add xtime_shift to the timekeeper and use it instead of the clock->shift. Likewise add ntp_error_shift and replace all (NTP_SCALE_SHIFT - clock->shift) expressions. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134809.871899606@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
155ec602 |
|
14-Aug-2009 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
timekeeping: Introduce struct timekeeper Add struct timekeeper to keep the internal values timekeeping.c needs in regard to the currently selected clock source. This moves the timekeeping intervals, xtime_nsec and the ntp error value from struct clocksource to struct timekeeper. The raw_time is removed from the clocksource as well. It gets treated like xtime as a global variable. Eventually xtime raw_time should be moved to struct timekeeper. [ tglx: minor cleanup ] Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134809.613209842@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
f1b82746 |
|
14-Aug-2009 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
clocksource: Cleanup clocksource selection If a non high-resolution clocksource is first set as override clock and then registered it becomes active even if the system is in one-shot mode. Move the override check from sysfs_override_clocksource to the clocksource selection. That fixes the bug and simplifies the code. The check in clocksource_register for double registration of the same clocksource is removed without replacement. To find the initial clocksource a new weak function in jiffies.c is defined that returns the jiffies clocksource. The architecture code can then override the weak function with a more suitable clocksource, e.g. the TOD clock on s390. [ tglx: Folded in a fix from John Stultz ] Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134808.388024160@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
1be39679 |
|
14-Aug-2009 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
timekeeping: Move reset of cycle_last for tsc clocksource to tsc change_clocksource resets the cycle_last value to zero then sets it to a value read from the clocksource. The reset to zero is required only for the TSC clocksource to make the read_tsc function work after a resume. The reason is that the TSC read function uses cycle_last to detect backwards going TSCs. In the resume case cycle_last contains the TSC value from the last update before the suspend. On resume the TSC starts counting from 0 again and would trip over the cycle_last comparison. This is subtle and surprising. Move the reset to a resume function in the tsc code. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134808.142191175@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
a0f7d48b |
|
14-Aug-2009 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
timekeeping: Remove clocksource inline functions The three inline functions clocksource_read, clocksource_enable and clocksource_disable are simple wrappers of an indirect call plus the copy from and to the mult_orig value. The functions are exclusively used by the timekeeping code which has intimate knowledge of the clocksource anyway. Therefore remove the inline functions. No functional change. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134807.903108946@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
31089c13 |
|
14-Aug-2009 |
John Stultz <johnstul@us.ibm.com> |
timekeeping: Introduce timekeeping_leap_insert Move the adjustment of xtime, wall_to_monotonic and the update of the vsyscall variables to the timekeeping code. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> LKML-Reference: <20090814134807.609730216@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
a40f262c |
|
07-Jul-2009 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Move ktime_get() functions to timekeeping.c The ktime_get() functions for GENERIC_TIME=n are still located in hrtimer.c. Move them to time/timekeeping.c where they belong. LKML-Reference: <new-submission> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
951ed4d3 |
|
07-Jul-2009 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
timekeeping: optimized ktime_get[_ts] for GENERIC_TIME=y The generic ktime_get function defined in kernel/hrtimer.c is suboptimial for GENERIC_TIME=y: 0) | ktime_get() { 0) | ktime_get_ts() { 0) | getnstimeofday() { 0) | read_tod_clock() { 0) 0.601 us | } 0) 1.938 us | } 0) | set_normalized_timespec() { 0) 0.602 us | } 0) 4.375 us | } 0) 5.523 us | } Overall there are two read_seqbegin/read_seqretry loops and a lot of unnecessary struct timespec calculations. ktime_get returns a nano second value which is the sum of xtime, wall_to_monotonic and the nano second delta from the clock source. ktime_get can be optimized for GENERIC_TIME=y. The new version only calls clocksource_read: 0) | ktime_get() { 0) | read_tod_clock() { 0) 0.610 us | } 0) 1.977 us | } It uses a single read_seqbegin/readseqretry loop and just adds everthing to a nano second value. ktime_get_ts is optimized in a similar fashion. [ tglx: added WARN_ON(timekeeping_suspended) as in getnstimeofday() ] Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: john stultz <johnstul@us.ibm.com> LKML-Reference: <20090707112728.3005244d@skybase> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
dce48a84 |
|
11-Apr-2009 |
Thomas Gleixner <tglx@linutronix.de> |
sched, timers: move calc_load() to scheduler Dimitri Sivanich noticed that xtime_lock is held write locked across calc_load() which iterates over all online CPUs. That can cause long latencies for xtime_lock readers on large SMP systems. The load average calculation is an rough estimate anyway so there is no real need to protect the readers vs. the update. It's not a problem when the avenrun array is updated while a reader copies the values. Instead of iterating over all online CPUs let the scheduler_tick code update the number of active tasks shortly before the avenrun update happens. The avenrun update itself is handled by the CPU which calls do_timer(). [ Impact: reduce xtime_lock write locked section ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org>
|
#
7d27558c |
|
01-May-2009 |
John Stultz <johnstul@us.ibm.com> |
timekeeping: create arch_gettimeoffset infrastructure Some arches don't supply their own clocksource. This is mainly the case in architectures that get their inter-tick times by reading the counter on their interval timer. Since these timers wrap every tick, they're not really useful as clocksources. Wrapping them to act like one is possible but not very efficient. So we provide a callout these arches can implement for use with the jiffies clocksource to provide finer then tick granular time. [ Impact: ease the migration to generic time keeping ] Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
4614e6ad |
|
21-Apr-2009 |
Magnus Damm <damm@igel.co.jp> |
clocksource: add enable() and disable() callbacks Add enable() and disable() callbacks for clocksources. This allows us to put unused clocksources in power save mode. The functions clocksource_enable() and clocksource_disable() wrap the callbacks and are inserted in the timekeeping code to enable before use and disable after switching to a new clocksource. Signed-off-by: Magnus Damm <damm@igel.co.jp> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1c5745aa |
|
22-Dec-2008 |
Thomas Gleixner <tglx@linutronix.de> |
sched_clock: prevent scd->clock from moving backwards, take #2 Redo: 5b7dba4: sched_clock: prevent scd->clock from moving backwards which had to be reverted due to s2ram hangs: ca7e716: Revert "sched_clock: prevent scd->clock from moving backwards" ... this time with resume restoring GTOD later in the sequence taken into account as well. The "timekeeping_suspended" flag is not very nice but we cannot call into GTOD before it has been properly resumed and the scheduler will run very early in the resume sequence. Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
6c9bacb4 |
|
01-Dec-2008 |
John Stultz <johnstul@us.ibm.com> |
time: catch xtime_nsec underflows and fix them Impact: fix time warp bug Alex Shi, along with Yanmin Zhang have been noticing occasional time inconsistencies recently. Through their great diagnosis, they found that the xtime_nsec value used in update_wall_time was occasionally going negative. After looking through the code for awhile, I realized we have the possibility for an underflow when three conditions are met in update_wall_time(): 1) We have accumulated a second's worth of nanoseconds, so we incremented xtime.tv_sec and appropriately decrement xtime_nsec. (This doesn't cause xtime_nsec to go negative, but it can cause it to be small). 2) The remaining offset value is large, but just slightly less then cycle_interval. 3) clocksource_adjust() is speeding up the clock, causing a corrective amount (compensating for the increase in the multiplier being multiplied against the unaccumulated offset value) to be subtracted from xtime_nsec. This can cause xtime_nsec to underflow. Unfortunately, since we notify the NTP subsystem via second_overflow() whenever we accumulate a full second, and this effects the error accumulation that has already occured, we cannot simply revert the accumulated second from xtime nor move the second accumulation to after the clocksource_adjust call without a change in behavior. This leaves us with (at least) two options: 1) Simply return from clocksource_adjust() without making a change if we notice the adjustment would cause xtime_nsec to go negative. This would work, but I'm concerned that if a large adjustment was needed (due to the error being large), it may be possible to get stuck with an ever increasing error that becomes too large to correct (since it may always force xtime_nsec negative). This may just be paranoia on my part. 2) Catch xtime_nsec if it is negative, then add back the amount its negative to both xtime_nsec and the error. This second method is consistent with how we've handled earlier rounding issues, and also has the benefit that the error being added is always in the oposite direction also always equal or smaller then the correction being applied. So the risk of a corner case where things get out of control is lessened. This patch fixes bug 11970, as tested by Yanmin Zhang http://bugzilla.kernel.org/show_bug.cgi?id=11970 Reported-by: alex.shi@intel.com Signed-off-by: John Stultz <johnstul@us.ibm.com> Acked-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
5cd1c9c5 |
|
22-Sep-2008 |
Roman Zippel <zippel@linux-m68k.org> |
timekeeping: fix rounding problem during clock update Due to a rounding problem during a clock update it's possible for readers to observe the clock jumping back by 1nsec. The following simplified example demonstrates the problem: cycle xtime 0 0 1000 999999.6 2000 1999999.2 3000 2999998.8 ... 1500 = 1499999.4 = 0.0 + 1499999.4 = 999999.6 + 499999.8 When reading the clock only the full nanosecond part is used, while timekeeping internally keeps nanosecond fractions. If the clock is now updated at cycle 1500 here, a nanosecond is missing due to the truncation. The simple fix is to round up the xtime value during the update, this also changes the distance to the reference time, but the adjustment will automatically take care that it stays under control. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
2d42244a |
|
20-Aug-2008 |
John Stultz <johnstul@us.ibm.com> |
clocksource: introduce CLOCK_MONOTONIC_RAW In talking with Josip Loncaric, and his work on clock synchronization (see btime.sf.net), he mentioned that for really close synchronization, it is useful to have access to "hardware time", that is a notion of time that is not in any way adjusted by the clock slewing done to keep close time sync. Part of the issue is if we are using the kernel's ntp adjusted representation of time in order to measure how we should correct time, we can run into what Paul McKenney aptly described as "Painting a road using the lines we're painting as the guide". I had been thinking of a similar problem, and was trying to come up with a way to give users access to a purely hardware based time representation that avoided users having to know the underlying frequency and mask values needed to deal with the wide variety of possible underlying hardware counters. My solution is to introduce CLOCK_MONOTONIC_RAW. This exposes a nanosecond based time value, that increments starting at bootup and has no frequency adjustments made to it what so ever. The time is accessed from userspace via the posix_clock_gettime() syscall, passing CLOCK_MONOTONIC_RAW as the clock_id. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
9a055117 |
|
20-Aug-2008 |
Roman Zippel <zippel@linux-m68k.org> |
clocksource: introduce clocksource_forward_now() To keep the raw monotonic patch simple first introduce clocksource_forward_now(), which takes care of the offset since the last update_wall_time() call and adds it to the clock, so there is no need anymore to deal with it explicitly at various places, which need to make significant changes to the clock. This is also gets rid of the timekeeping_suspend_nsecs, instead of waiting until resume, the value is accumulated during suspend. In the end there is only a single user of __get_nsec_offset() left, so I integrated it back to getnstimeofday(). Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
7dffa3c6 |
|
01-May-2008 |
Roman Zippel <zippel@linux-m68k.org> |
ntp: handle leap second via timer Remove the leap second handling from second_overflow(), which doesn't have to check for it every second anymore. With CONFIG_NO_HZ this also makes sure the leap second is handled close to the full second. Additionally this makes it possible to abort a leap second properly by resetting the STA_INS/STA_DEL status bits. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8383c423 |
|
01-May-2008 |
Roman Zippel <zippel@linux-m68k.org> |
ntp: remove current_tick_length() current_tick_length used to do a little more, but now it just returns tick_length, which we can also access directly at the few places, where it's needed. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7fc5c784 |
|
01-May-2008 |
Roman Zippel <zippel@linux-m68k.org> |
ntp: rename TICK_LENGTH_SHIFT to NTP_SCALE_SHIFT As TICK_LENGTH_SHIFT is used for more than just the tick length, the name isn't quite approriate anymore, so this renames it to NTP_SCALE_SHIFT. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d8bb6f4c |
|
01-Apr-2008 |
Thomas Gleixner <tglx@linutronix.de> |
x86: tsc prevent time going backwards We already catch most of the TSC problems by sanity checks, but there is a subtle bug which has been in the code forever. This can cause time jumps in the range of hours. This was reported in: http://lkml.org/lkml/2007/8/23/96 and http://lkml.org/lkml/2008/3/31/23 I was able to reproduce the problem with a gettimeofday loop test on a dual core and a quad core machine which both have sychronized TSCs. The TSCs seems not to be perfectly in sync though, but the kernel is not able to detect the slight delta in the sync check. Still there exists an extremly small window where this delta can be observed with a real big time jump. So far I was only able to reproduce this with the vsyscall gettimeofday implementation, but in theory this might be observable with the syscall based version as well. CPU 0 updates the clock source variables under xtime/vyscall lock and CPU1, where the TSC is slighty behind CPU0, is reading the time right after the seqlock was unlocked. The clocksource reference data was updated with the TSC from CPU0 and the value which is read from TSC on CPU1 is less than the reference data. This results in a huge delta value due to the unsigned subtraction of the TSC value and the reference value. This algorithm can not be changed due to the support of wrapping clock sources like pm timer. The huge delta is converted to nanoseconds and added to xtime, which is then observable by the caller. The next gettimeofday call on CPU1 will show the correct time again as now the TSC has advanced above the reference value. To prevent this TSC specific wreckage we need to compare the TSC value against the reference value and return the latter when it is larger than the actual TSC value. I pondered to mark the TSC unstable when the readout is smaller than the reference value, but this would render an otherwise good and fast clocksource unusable without a real good reason. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
92896bd9 |
|
24-Mar-2008 |
Linus Torvalds <torvalds@linux-foundation.org> |
Don't 'printk()' while holding xtime lock for writing The printk() can deadlock because it can wake up klogd(), and task enqueueing will try to read the time in order to set a hrtimer. Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com> Debugged-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
10a398d0 |
|
04-Mar-2008 |
Roman Zippel <zippel@linux-m68k.org> |
time: remove obsolete CLOCK_TICK_ADJUST The first version of the ntp_interval/tick_length inconsistent usage patch was recently merged as bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 While the fix did greatly improve the situation, it was correctly pointed out by Roman that it does have a small bug: If the users change clocksources after the system has been running and NTP has made corrections, the correctoins made against the old clocksource will be applied against the new clocksource, causing error. The second attempt, which corrects the issue in the NTP_INTERVAL_LENGTH definition has also made it up-stream as commit e13a2e61dd5152f5499d2003470acf9c838eab84 http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e13a2e61dd5152f5499d2003470acf9c838eab84 Roman has correctly pointed out that CLOCK_TICK_ADJUST is calculated based on the PIT's frequency, and isn't really relevant to non-PIT driven clocksources (that is, clocksources other then jiffies and pit). This patch reverts both of those changes, and simply removes CLOCK_TICK_ADJUST. This does remove the granularity error correction for users of PIT and Jiffies clocksource users, but the granularity error but for the majority of users, it should be within the 500ppm range NTP can accommodate for. For systems that have granularity errors greater then 500ppm, the "ntp_tick_adj=" boot option can be used to compensate. [johnstul@us.ibm.com: provided changelog] [mattilinnanvuori@yahoo.com: maek ntp_tick_adj static] Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Acked-by: john stultz <johnstul@us.ibm.com> Signed-off-by: Matti Linnanvuori <mattilinnanvuori@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: mingo@elte.hu Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
3eb05676 |
|
08-Feb-2008 |
Li Zefan <lizf@cn.fujitsu.com> |
time: fix typo in comments Fix typo in comments. BTW: I have to fix coding style in arch/ia64/kernel/time.c also, otherwise checkpatch.pl will be complaining. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cf4fc6cb |
|
08-Feb-2008 |
Li Zefan <lizf@cn.fujitsu.com> |
timekeeping: rename timekeeping_is_continuous to timekeeping_valid_for_hres Function timekeeping_is_continuous() no longer checks flag CLOCK_IS_CONTINUOUS, and it checks CLOCK_SOURCE_VALID_FOR_HRES now. So rename the function accordingly. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1001d0a9 |
|
01-Feb-2008 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: update xtime_cache when time(zone) changes xtime_cache needs to be updated whenever xtime and or wall_to_monotic are changed. Otherwise users of xtime_cache might see a stale (and in the case of timezone changes utterly wrong) value until the next update happens. Fixup the obvious places, which miss this update. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <johnstul@us.ibm.com> Tested-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
bbe4d18a |
|
30-Jan-2008 |
John Stultz <johnstul@us.ibm.com> |
NTP: correct inconsistent ntp interval/tick_length usage I recently noticed on one of my boxes that when synched with an NTP server, the drift value reported for the system was ~283ppm. While in some cases, clock hardware can be that bad, it struck me as unusual as the system was using the acpi_pm clocksource, which is one of the more trustworthy and accurate clocksources on x86 hardware. I brought up another system and let it sync to the same NTP server, and I noticed a similar 280some ppm drift. In looking at the code, I found that the acpi_pm's constant frequency was being computed correctly at boot-up, however once the system was up, even without the ntp daemon running, the clocksource's frequency was being modified by the clocksource_adjust() function. Digging deeper, I realized that in the code that keeps track of how much the clocksource is skewing from the ntp desired time, we were using different lengths to establish how long an time interval was. The clocksource was being setup with the following interval: NTP_INTERVAL_LENGTH = NSEC_PER_SEC/NTP_INTERVAL_FREQ While the ntp code was using the tick_length_base value: tick_length_base ~= (tick_usec * NSEC_PER_USEC * USER_HZ) /NTP_INTERVAL_FREQ The subtle difference is: (tick_usec * NSEC_PER_USEC * USER_HZ) != NSEC_PER_SEC This difference in calculation was causing the clocksource correction code to apply a correction factor to the clocksource so the two intervals were the same, however this results in the actual frequency of the clocksource to be made incorrect. I believe this difference would affect all clocksources, although to differing degrees depending on the clocksource resolution. The issue was introduced when my HZ free ntp patch landed in 2.6.21-rc1, so my apologies for the mistake, and for not noticing it until now. The following patch, corrects the clocksource's initialization code so it uses the same interval length as the code in ntp.c. After applying this patch, the drift value for the same system went from ~283ppm to only 2.635ppm. I believe this patch to be good, however it does affect all arches and I've only tested on x86, so some caution is advised. I do think it would be a likely candidate for a stable 2.6.24.x release. Any thoughts or feedback would be appreciated. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
efd9ac86 |
|
30-Jan-2008 |
Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> |
time: fold __get_realtime_clock_ts() into getnstimeofday() - getnstimeofday() was just a wrapper around __get_realtime_clock_ts() - Replace calls to __get_realtime_clock_ts() by calls to getnstimeofday() - Fix bogus reference to get_realtime_clock_ts(), which never existed Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
af5ca3f4 |
|
19-Dec-2007 |
Kay Sievers <kay.sievers@vrfy.org> |
Driver core: change sysdev classes to use dynamic kobject names All kobjects require a dynamically allocated name now. We no longer need to keep track if the name is statically assigned, we can just unconditionally free() all kobject names on cleanup. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
ba2a631b |
|
17-Oct-2007 |
Adrian Bunk <bunk@kernel.org> |
kernel/time/timekeeping.c: cleanups - remove the no longer required __attribute__((weak)) of xtime_lock - remove the following no longer used EXPORT_SYMBOL's: - xtime - xtime_lock Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f20bf612 |
|
16-Oct-2007 |
Ingo Molnar <mingo@elte.hu> |
time: introduce xtime_seconds improve performance of sys_time(). sys_time() returns time in seconds, but it does so by calling do_gettimeofday() and then returning the tv_sec portion of the GTOD time. But the data structure "xtime", which is updated by every timer/scheduler tick, already offers HZ granularity time. the patch improves the sysbench oltp macrobenchmark by 4-5% on an AMD dual-core system: v2.6.23: #threads 1: transactions: 4073 (407.23 per sec.) 2: transactions: 8530 (852.81 per sec.) 3: transactions: 8321 (831.88 per sec.) 4: transactions: 8407 (840.58 per sec.) 5: transactions: 8070 (806.74 per sec.) v2.6.23 + sys_time-speedup.patch: 1: transactions: 4281 (428.09 per sec.) 2: transactions: 8910 (890.85 per sec.) 3: transactions: 8659 (865.79 per sec.) 4: transactions: 8676 (867.34 per sec.) 5: transactions: 8532 (852.91 per sec.) and by 4-5% on an Intel dual-core system too: 2.6.23: 1: transactions: 4560 (455.94 per sec.) 2: transactions: 10094 (1009.30 per sec.) 3: transactions: 9755 (975.36 per sec.) 4: transactions: 9859 (985.78 per sec.) 5: transactions: 9701 (969.72 per sec.) 2.6.23 + sys_time-speedup.patch: 1: transactions: 4779 (477.84 per sec.) 2: transactions: 10103 (1010.14 per sec.) 3: transactions: 10141 (1013.93 per sec.) 4: transactions: 10371 (1036.89 per sec.) 5: transactions: 10178 (1017.50 per sec.) (the more CPUs the system has, the more speedup this patch gives for this particular workload.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6a669ee8 |
|
16-Sep-2007 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: Prevent time going backwards on resume Timekeeping resume adjusts xtime by adding the slept time in seconds and resets the reference value of the clock source (clock->cycle_last). clock->cycle last is used to calculate the delta between the last xtime update and the readout of the clock source in __get_nsec_offset(). xtime plus the offset is the current time. The resume code ignores the delta which had already elapsed between the last xtime update and the actual time of suspend. If the suspend time is short, then we can see time going backwards on resume. Suspend: offs_s = clock->read() - clock->cycle_last; now = xtime + offs_s; timekeeping_suspend_time = read_rtc(); Resume: sleep_time = read_rtc() - timekeeping_suspend_time; xtime.tv_sec += sleep_time; clock->cycle_last = clock->read(); offs_r = clock->read() - clock->cycle_last; now = xtime + offs_r; if sleep_time_seconds == 0 and offs_r < offs_s, then time goes backwards. Fix this by storing the offset from the last xtime update and add it to xtime during resume, when we reset clock->cycle_last: sleep_time = read_rtc() - timekeeping_suspend_time; xtime.tv_sec += sleep_time; xtime += offs_s; /* Fixup xtime offset at suspend time */ clock->cycle_last = clock->read(); offs_r = clock->read() - clock->cycle_last; now = xtime + offs_r; Thanks to Marcelo for tracking this down on the OLPC and providing the necessary details to analyze the root cause. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <johnstul@us.ibm.com> Cc: Tosatti <marcelo@kvack.org>
|
#
3be90950 |
|
16-Sep-2007 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: access rtc outside of xtime lock Lockdep complains about the access of rtc in timekeeping_suspend inside the interrupt disabled region of the write locked xtime lock. Move the access outside. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <johnstul@us.ibm.com>
|
#
17c38b74 |
|
24-Jul-2007 |
John Stultz <johnstul@us.ibm.com> |
Cache xtime every call to update_wall_time This avoids xtime lag seen with dynticks, because while 'xtime' itself is still not updated often, we keep a 'xtime_cache' variable around that contains the approximate real-time that _is_ updated each time we do a 'update_wall_time()', and is thus never off by more than one tick. IOW, this restores the original semantics for 'xtime' users, as long as you use the proper abstraction functions (ie 'current_kernel_time()' or 'get_seconds()' depending on whether you want a timespec or just the seconds field). [ Updated Patch. As penance for my sins I've also yanked another #ifdef that was added to avoid the xtime lag w/ hrtimers. ] Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2c6b47de |
|
24-Jul-2007 |
John Stultz <johnstul@us.ibm.com> |
Cleanup non-arch xtime uses, use get_seconds() or current_kernel_time(). This avoids use of the kernel-internal "xtime" variable directly outside of the actual time-related functions. Instead, use the helper functions that we already have available to us. This doesn't actually change any behaviour, but this will allow us to fix the fact that "xtime" isn't updated very often with CONFIG_NO_HZ (because much of the realtime information is maintained as separate offsets to 'xtime'), which has caused interfaces that use xtime directly to get a time that is out of sync with the real-time clock by up to a third of a second or so. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1f564ad6 |
|
18-Jul-2007 |
Bob Picco <bob.picco@hp.com> |
[IA64] remove time interpolator Remove time_interpolator code (This is generic code, but only user was ia64. It has been superseded by the CONFIG_GENERIC_TIME code). Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Peter Keilty <peter.keilty@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
|
#
71120f18 |
|
19-Jul-2007 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping: fixup shadow variable argument clocksource_adjust() has a clock argument, which shadows the file global clock variable. Fix this up. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7c3f1a57 |
|
16-Jul-2007 |
Tomas Janousek <tjanouse@redhat.com> |
Introduce boot based time The commits 411187fb05cd11676b0979d9fbf3291db69dbce2 (GTOD: persistent clock support) c1d370e167d66b10bca3b602d3740405469383de (i386: use GTOD persistent clock support) changed the monotonic time so that it no longer jumps after resume, but it's not possible to use it for boot time and process start time calculations then. Also, the uptime no longer increases during suspend. I add a variable to track the wall_to_monotonic changes, a function to get the real boot time and a function to get the boot based time from the monotonic one. [akpm@linux-foundation.org: remove exports, add comment] Signed-off-by: Tomas Janousek <tjanouse@redhat.com> Cc: Tomas Smetana <tsmetana@redhat.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d10ff3fb |
|
14-May-2007 |
Thomas Gleixner <tglx@linutronix.de> |
timekeeping fix patch got mis-applied The time keeping code move to kernel/time/timekeeping.c broke the clocksource resume logic patch, which got applied to the old file by a fuzzy application. Fix it up and move the clocksource_resume() call to the appropriate place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ tssk, tssk, everybody should use --fuzz=0 ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8524070b |
|
08-May-2007 |
John Stultz <johnstul@us.ibm.com> |
Move timekeeping code to timekeeping.c Move the timekeeping code out of kernel/timer.c and into kernel/time/timekeeping.c. I made no cleanups or other changes in transit. [akpm@linux-foundation.org: build fix] Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|