History log of /linux-master/kernel/time/clockevents.c
Revision Date Author Comments
# 49f1ff50 04-Feb-2024 Ricardo B. Marliere <ricardo@marliere.net>

clockevents: Make clockevents_subsys const

Now that the driver core can properly handle constant struct bus_type,
move the clockevents_subsys variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.

Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20240204-bus_cleanup-time-v1-2-207ec18e24b8@marliere.net


# ebe11732 02-Nov-2022 Lukas Bulwahn <lukas.bulwahn@gmail.com>

clockevents: Repair kernel-doc for clockevent_delta2ns()

Since the introduction of clockevents, i.e., commit d316c57ff6bf
("clockevents: add core functionality"), there has been a mismatch between
the function and the kernel-doc comment for clockevent_delta2ns().

Hence, ./scripts/kernel-doc -none kernel/time/clockevents.c warns about it.

Adjust the kernel-doc comment for clockevent_delta2ns() for make W=1
happiness.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221102091048.15068-1-lukas.bulwahn@gmail.com


# 2966a991 31-Mar-2022 Jakob Koschel <jakobkoschel@gmail.com>

clockevents: Use dedicated list iterator variable

To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable.

Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/
Link: https://lore.kernel.org/r/20220331215707.883957-1-jakobkoschel@gmail.com


# 4e82d2e2 09-Jun-2021 Baokun Li <libaokun1@huawei.com>

clockevents: Use list_move() instead of list_del()/list_add()

Simplify the code.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210609070242.1322450-1-libaokun1@huawei.com


# 64ab7071 07-Jun-2021 Baokun Li <libaokun1@huawei.com>

clockevents: Add missing parameter documentation

Add the missing documentation for the @cpu parameter of
tick_cleanup_dead_cpu().

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210608024305.2750999-1-libaokun1@huawei.com


# 1fa98d96 23-May-2021 YueHaibing <yuehaibing@huawei.com>

clockevents: Use DEVICE_ATTR_[RO|WO] macros

Use the DEVICE_ATTR_[RO|WO] helpers instead of plain DEVICE_ATTR, which
makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210523065825.19684-1-yuehaibing@huawei.com


# 1b72d432 21-Mar-2019 Thomas Gleixner <tglx@linutronix.de>

tick: Remove outgoing CPU from broadcast masks

Valentin reported that unplugging a CPU occasionally results in a warning
in the tick broadcast code which is triggered when an offline CPU is in the
broadcast mask.

This happens because the outgoing CPU is not removing itself from the
broadcast masks, especially not from the broadcast_force_mask. The removal
happens on the control CPU after the outgoing CPU is dead. It's a long
standing issue, but the warning is harmless.

Rework the hotplug mechanism so that the outgoing CPU removes itself from
the broadcast masks after disabling interrupts and removing itself from the
online mask.

Reported-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1903211540180.1784@nanos.tec.linutronix.de


# f49c174b 31-Oct-2018 Thomas Gleixner <tglx@linutronix.de>

hrtimers/tick/clockevents: Remove sloppy license references

"For licencing details see kernel-base/COPYING" and similar license
references have no value over the SPDX identifier. Remove them.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: David Riley <davidriley@chromium.org>
Cc: Colin Cross <ccross@android.com>
Cc: Mark Brown <broonie@kernel.org>
Link: https://lkml.kernel.org/r/20181031182252.963632760@linutronix.de


# 35728b82 31-Oct-2018 Thomas Gleixner <tglx@linutronix.de>

time: Add SPDX license identifiers

Update the time(r) core files files with the correct SPDX license
identifier based on the license text in the file itself. The SPDX
identifier is a legally binding shorthand, which can be used instead of the
full boiler plate text.

This work is based on a script and data from Philippe Ombredanne, Kate
Stewart and myself. The data has been created with two independent license
scanners and manual inspection.

The following files do not contain any direct license information and have
been omitted from the big initial SPDX changes:

timeconst.bc: The .bc files were not touched
time.c, timer.c, timekeeping.c: Licence was deduced from EXPORT_SYMBOL_GPL

As those files do not contain direct license references they fall under the
project license, i.e. GPL V2 only.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: David Riley <davidriley@chromium.org>
Cc: Colin Cross <ccross@android.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: https://lkml.kernel.org/r/20181031182252.879109557@linutronix.de


# 58c5fc2b 31-Oct-2018 Thomas Gleixner <tglx@linutronix.de>

time: Remove useless filenames in top level comments

Remove the pointless filenames in the top level comments. They have no
value at all and just occupy space. While at it tidy up some of the
comments and remove a stale one.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Riley <davidriley@chromium.org>
Cc: Colin Cross <ccross@android.com>
Cc: Mark Brown <broonie@kernel.org>
Link: https://lkml.kernel.org/r/20181031182252.794898238@linutronix.de


# 7d9df98b 03-Nov-2018 Yangtao Li <tiny.windzz@gmail.com>

clockevents: Remove unnecessary unlikely()

WARN_ON() and WARN_ON_ONCE() already contains an unlikely(), so it's not
necessary to use unlikely.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20181104023104.2572-1-tiny.windzz@gmail.com


# fbfa9260 10-Jul-2018 Sudeep Holla <sudeep.holla@arm.com>

clockevents: Warn if cpu_all_mask is used as cpumask

Using cpu_all_mask in clockevents cpumask may result in issues while
comparing multiple clockevent devices to choose the preferred one.

On one of the platforms with 2 system (i.e. non per-CPU) timers with
different ratings, having cpu_all_mask for one of the device resulted in a
boot hang due to a endless loop in clockevents_notify_released() as both
were clocksources were selected as preferred.

In order to prevent such issues in the future, warn if any clockevent
driver sets cpu_all_mask as it's cpumask and just override it to use
cpu_possible_mask. All the existing occurrences of cpu_all_mask are already
replaced with cpu_possible_mask.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lkml.kernel.org/r/1531308264-24220-3-git-send-email-sudeep.holla@arm.com


# 3a29ddb1 19-Oct-2017 James Hogan <jhogan@kernel.org>

clockevents: Retry programming min delta up to 10 times

When CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=n, the call path
hrtimer_reprogram -> clockevents_program_event ->
clockevents_program_min_delta will not retry if the clock event driver
returns -ETIME.

If the driver could not satisfy the program_min_delta for any reason, the
lack of a retry means the CPU may not receive a tick interrupt, potentially
until the counter does a full period. This leads to rcu_sched timeout
messages as the stalled CPU is detected by other CPUs, and other issues if
the CPU is holding locks or other resources at the point at which it
stalls.

There have been a couple of observed mechanisms through which a clock event
driver could not satisfy the requested min_delta and return -ETIME.

With the MIPS GIC driver, the shared execution resource within MT cores
means inconventient latency due to execution of instructions from other
hardware threads in the core, within gic_next_event, can result in an event
being set in the past.

Additionally under virtualisation it is possible to get unexpected latency
during a clockevent device's set_next_event() callback which can make it
return -ETIME even for a delta based on min_delta_ns.

It isn't appropriate to use MIN_ADJUST in the virtualisation case as
occasional hypervisor induced high latency will cause min_delta_ns to
quickly increase to the maximum.

Instead, borrow the retry pattern from the MIN_ADJUST case, but without
making adjustments. Retry up to 10 times, each time increasing the
attempted delta by min_delta, before giving up.

[ Matt: Reworked the loop and made retry increase the delta. ]

Signed-off-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: "Martin Schwidefsky" <schwidefsky@de.ibm.com>
Cc: James Hogan <james.hogan@mips.com>
Link: https://lkml.kernel.org/r/1508422643-6075-1-git-send-email-matt.redfearn@mips.com


# 0695bd99 06-Feb-2017 Nicolai Stange <nicstange@gmail.com>

clockevents: Make clockevents_config() static

A clockevent device's rate should be configured before or at registration
and changed afterwards through clockevents_update_freq() only.

For the configuration at registration, we already have
clockevents_config_and_register().

Right now, there are no clockevents_config() users outside of the
clockevents core.

To mitigiate the risk of drivers errorneously reconfiguring their rates
through clockevents_config() *after* device registration, make
clockevents_config() static.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>


# 2456e855 25-Dec-2016 Thomas Gleixner <tglx@linutronix.de>

ktime: Get rid of the union

ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.

Get rid of the union and just keep ktime_t as simple typedef of type s64.

The conversion was done with coccinelle and some manual mopping up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>


# 775be506 17-Jun-2016 Ben Dooks <ben.dooks@codethink.co.uk>

clockevents: Make clockevents_subsys static

The clockevents_subsys struct is used for sysfs support and
is not declared or used outside the file it is defined in.
Fix the following warning by making it static:

kernel/time/clockevents.c:648:17: warning: symbol 'clockevents_subsys' was not declared. Should it be static?

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: linux-kernel@lists.codethink.co.uk
Link: http://lkml.kernel.org/r/1466178974-7105-1-git-send-email-ben.dooks@codethink.co.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# eef7635a 10-Sep-2015 Viresh Kumar <viresh.kumar@linaro.org>

clockevents: Remove unused set_mode() callback

All users are migrated to the per-state callbacks, get rid of the
unused interface and the core support code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linaro-kernel@lists.linaro.org
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/fd60de14cf6d125489c031207567bb255ad946f6.1441943991.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 7c4a976c 07-Jul-2015 Viresh Kumar <viresh.kumar@linaro.org>

clockevents: Allow set-state callbacks to be optional

Its mandatory for the drivers to provide set_state_{oneshot|periodic}()
(only if related modes are supported) and set_state_shutdown() callbacks
today, if they are implementing the new set-state interface.

But this leads to unnecessary noop callbacks for drivers which don't
want to implement them. Over that, it will lead to a full function call
for nothing really useful.

Lets make all set-state callbacks optional.

Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: http://lkml.kernel.org/r/1436256875-15562-1-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# a9d20988 17-Jun-2015 Viresh Kumar <viresh.kumar@linaro.org>

clockevents: Check state instead of mode in suspend/resume path

CLOCK_EVT_MODE_* macros are present for backward compatibility (as most
of the drivers are still using old ->set_mode() interface).

These macro's shouldn't be used anymore in code, that is common to both
driver interfaces, i.e. ->set_mode() and ->set_state_*().

Drivers implementing ->set_state_*() interface, which have their
clkevt->mode set to 0 (clkevt device structures are normally globally
defined), will not participate in suspend/resume as they will always be
marked as UNUSED.

Fix this by checking state of the clockevent device instead of mode,
which is updated for both the interfaces.

Fixes: ac34ad27fc16 ("clockevents: Do not suspend/resume if unused")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: alexandre.belloni@free-electrons.com
Cc: sylvain.rochet@finsecur.com
Link: http://lkml.kernel.org/r/a1964eef6e8a47d02b1ff9083c6c91f73f0ff643.1434537215.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 051ebd10 02-Jun-2015 Thomas Gleixner <tglx@linutronix.de>

clockevents: Use set/get state helper functions

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>


# d7eb231c 02-Jun-2015 Thomas Gleixner <tglx@linutronix.de>

clockevents: Provide functions to set and get the state

We want to rename dev->state, so provide proper get and set
functions. Rename clockevents_set_state() to
clockevents_switch_state() to avoid confusion.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>


# 472c4a94 21-May-2015 Viresh Kumar <viresh.kumar@linaro.org>

clockevents: Use helpers to check the state of a clockevent device

Use accessor functions to check the state of clockevent devices in
core code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/fa2b9869fd17f210eaa156ec2b594efd0230b6c7.1432192527.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# ac34ad27 16-Jan-2015 Alexandre Belloni <alexandre.belloni@bootlin.com>

clockevents: Do not suspend/resume if unused

There is no point in calling suspend/resume for unused clockevents as
they are already stopped and disabled.

This is really important for AT91 as the hardware is a trainwreck and
takes ages to synchronize.

Reported-by: Sylvain Rochet <sylvain.rochet@finsecur.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1421399151-26800-1-git-send-email-alexandre.belloni@free-electrons.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# d2540875 02-Apr-2015 Viresh Kumar <viresh.kumar@linaro.org>

clockevents: Stop unused clockevent devices

To avoid getting spurious interrupts on a tickless CPU, clockevent
device can now be stopped by switching to ONESHOT_STOPPED state.

The natural place for handling this transition is tick_program_event().

On 'expires == KTIME_MAX', we skip programming the event and so we need
to fix such call sites as well, to always call tick_program_event()
irrespective of the expires value.

Once the clockevent device is required again, check if it was earlier
put into ONESHOT_STOPPED state. If yes, switch its state to ONESHOT
before programming its event.

To make sure we haven't missed any corner case, add a WARN() for the
case where we try to reprogram clockevent device while we aren't
configured in ONESHOT_STOPPED state.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5146b07be7f0bc497e0ebae036590ec2fa73e540.1428031396.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 8fff52fd 02-Apr-2015 Viresh Kumar <viresh.kumar@linaro.org>

clockevents: Introduce CLOCK_EVT_STATE_ONESHOT_STOPPED state

When no timers/hrtimers are pending, the expiry time is set to a
special value: 'KTIME_MAX'. This normally happens with
NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.

When 'expiry == KTIME_MAX', we either cancel the 'tick-sched' hrtimer
(NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device
(NOHZ_MODE_LOWRES). But, the clockevent device is already
reprogrammed from the tick-handler for next tick.

As the clock event device is programmed in ONESHOT mode it will at
least fire one more time (unnecessarily). Timers on few
implementations (like arm_arch_timer, etc.) only support PERIODIC mode
and their drivers emulate ONESHOT over that. Which means that on these
platforms we will get spurious interrupts periodically (at last
programmed interval rate, normally tick rate).

In order to avoid spurious interrupts, the clockevent device should be
stopped or its interrupts should be masked.

A simple (yet hacky) solution to get this fixed could be: update
hrtimer_force_reprogram() to always reprogram clockevent device and
update clockevent drivers to STOP generating events (or delay it to
max time) when 'expires' is set to KTIME_MAX. But the drawback here is
that every clockevent driver has to be hacked for this particular case
and its very easy for new ones to miss this.

However, Thomas suggested to add an optional state ONESHOT_STOPPED to
solve this problem: lkml.org/lkml/2014/5/9/508.

This patch adds support for ONESHOT_STOPPED state in clockevents
core. It will only be available to drivers that implement the
state-specific callbacks instead of the legacy ->set_mode() callback.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: linaro-kernel@lists.linaro.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/b8b383a03ac07b13312c16850b5106b82e4245b5.1428031396.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 149aabcc 09-Apr-2015 Viresh Kumar <viresh.kumar@linaro.org>

clockevents: Shutdown detached clockevent device

A clockevent device is marked DETACHED when it is replaced by another
clockevent device.

The device is shutdown properly for drivers that implement legacy
->set_mode() callback, as we call ->set_mode() for CLOCK_EVT_MODE_UNUSED
as well.

But for the new per-state callback interface, we skip shutting down the
device, as we thought its an internal state change. That wasn't correct.

The effect is that the device is left programmed in oneshot or periodic
mode.

Fall-back to 'case CLOCK_EVT_STATE_SHUTDOWN', to shutdown the device.

Fixes: bd624d75db21 "clockevents: Introduce mode specific callbacks"
Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/eef0a91c51b74d4e52c8e5a95eca27b5a0563f07.1428650683.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# a49b116d 02-Apr-2015 Thomas Gleixner <tglx@linutronix.de>

clockevents: Cleanup dead cpu explicitely

clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call. We are way better off to have explicit
calls instead of this monstrosity.

Split out the cleanup function for a dead cpu and invoke it
directly from the cpu down code. Make it conditional on
CPU_HOTPLUG as well.

Temporary change, will be refined in the future.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Rebased, added clockevents_notify() removal ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1735025.raBZdQHM3m@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 52c063d1 02-Apr-2015 Thomas Gleixner <tglx@linutronix.de>

clockevents: Make tick handover explicit

clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call. We are way better off to have explicit
calls instead of this monstrosity.

Split out the tick_handover call and invoke it explicitely from
the hotplug code. Temporary solution will be cleaned up in later
patches.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1658173.RkEEILFiQZ@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# ffa48c0d 02-Apr-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

clockevents: Remove broadcast oneshot control leftovers

Now that all users are converted over to explicit calls into the
clockevents state machine, remove the notification chain leftovers.

Original-from: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/14018863.NQUzkFuafr@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1fe5d5c3 02-Apr-2015 Thomas Gleixner <tglx@linutronix.de>

clockevents: Provide explicit broadcast oneshot control functions

clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call. We are way better off to have explicit
calls instead of this monstrosity.

Split out the broadcast oneshot control into a separate function
and provide inline helpers. Switch clockevents_notify() over.
This will go away once all callers are converted.

This also gets rid of the nested locking of clockevents_lock and
broadcast_lock. The broadcast oneshot control functions do not
require clockevents_lock. Only the managing functions
(setup/shutdown/suspend/resume of the broadcast device require
clockevents_lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Tony Lindgren <tony@atomide.com>
Link: http://lkml.kernel.org/r/13000649.8qZuEDV0OA@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 89feddbf 02-Apr-2015 Thomas Gleixner <tglx@linutronix.de>

clockevents: Remove the broadcast control leftovers

All users converted. Remove the notify leftovers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/2076318.76XJZ8QYP3@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 592a438f 02-Apr-2015 Thomas Gleixner <tglx@linutronix.de>

clockevents: Provide explicit broadcast control functions

clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call. We are way better off to have explicit
calls instead of this monstrosity.

Split out the broadcast control into a separate function and
provide inline helpers. Switch clockevents_notify() over. This
will go away once all callers are converted.

This also gets rid of the nested locking of clockevents_lock and
broadcast_lock. The broadcast control functions do not require
clockevents_lock. Only the managing functions
(setup/shutdown/suspend/resume of the broadcast device require
clockevents_lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Lindgren <tony@atomide.com>
Link: http://lkml.kernel.org/r/8086559.ttsuS0n1Xr@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 4ffee521 25-Mar-2015 Thomas Gleixner <tglx@linutronix.de>

clockevents: Make suspend/resume calls explicit

clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call.

We are way better off to have explicit calls instead of this
monstrosity. Split out the suspend/resume() calls and invoke
them directly from the call sites.

No locking required at this point because these calls happen
with interrupts disabled and a single cpu online.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Rebased on top of 4.0-rc5. ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/713674030.jVm1qaHuPf@vostro.rjw.lan
[ Rebased on top of latest timers/core. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# db6f672e 25-Mar-2015 Thomas Gleixner <tglx@linutronix.de>

clockevents: Remove extra local_irq_save() in clockevents_exchange_device()

Called with 'clockevents_lock' held and interrupts disabled
already.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/51005827.yXt5tjZMBs@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 9f083b74 25-Mar-2015 Thomas Gleixner <tglx@linutronix.de>

clockevents: Remove CONFIG_GENERIC_CLOCKEVENTS_BUILD

This option was for simpler migration to the clock events code.
Most architectures have been converted and the option has been
disfunctional as a standalone option for quite some time. Remove
it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5021859.jl9OC1medj@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# de81e64b 27-Feb-2015 Viresh Kumar <viresh.kumar@linaro.org>

clockevents: Don't validate dev->mode against CLOCK_EVT_MODE_UNUSED for new interface

It was a requirement in the legacy interface that drivers must
initialize ->mode field to 'CLOCK_EVT_MODE_UNUSED'. This field
isn't used anymore by the new interface and so should be only
checked for the legacy interface.

Probably it can be dropped as well as core doesn't rely on it
anymore, but lets keep it to support legacy interface.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: linaro-kernel@lists.linaro.org
Cc: linaro-networking@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/c6604fa1a77fe1fc8dcab87769857228fb1dadd5.1425037853.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 77e32c89 27-Feb-2015 Viresh Kumar <viresh.kumar@linaro.org>

clockevents: Manage device's state separately for the core

'enum clock_event_mode' is used for two purposes today:

- to pass mode to the driver of clockevent device::set_mode().

- for managing state of the device for clockevents core.

For supporting new modes/states we have moved away from the
legacy set_mode() callback to new per-mode/state callbacks. New
modes/states shouldn't be exposed to the legacy (now OBSOLOTE)
callbacks and so we shouldn't add new states to 'enum
clock_event_mode'.

Lets have separate enums for the two use cases mentioned above.
Keep using the earlier enum for legacy set_mode() callback and
mark it OBSOLETE. And add another enum to clearly specify the
possible states of a clockevent device.

This also renames the newly added per-mode callbacks to reflect
state changes.

We haven't got rid of 'mode' member of 'struct
clock_event_device' as it is used by some of the clockevent
drivers and it would automatically die down once we migrate
those drivers to the new interface. It ('mode') is only updated
now for the drivers using the legacy interface.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: linaro-kernel@lists.linaro.org
Cc: linaro-networking@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/b6b0143a8a57bd58352ad35e08c25424c879c0cb.1425037853.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 554ef387 27-Feb-2015 Viresh Kumar <viresh.kumar@linaro.org>

clockevents: Handle tick device's resume separately

Upcoming patch will redefine possible states of a clockevent
device. The RESUME mode is a special case only for tick's
clockevent devices. In future it can be replaced by ->resume()
callback already available for clockevent devices.

Lets handle it separately so that clockevents_set_mode() only
handles states valid across all devices. This also renames
set_mode_resume() to tick_resume() to make it more explicit.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: linaro-kernel@lists.linaro.org
Cc: linaro-networking@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/c1b0112410870f49e7bf06958e1483eac6c15e20.1425037853.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 32a15832 27-Feb-2015 Vitaly Kuznetsov <vkuznets@redhat.com>

clockevents: export clockevents_unbind_device instead of clockevents_unbind

It looks like clockevents_unbind is being exported by mistake as:
- it is static;
- it is not listed in include/linux/clockchips.h;
- EXPORT_SYMBOL_GPL(clockevents_unbind) follows clockevents_unbind_device()
implementation.

I think clockevents_unbind_device should be exported instead. This is going to
be used to teardown Hyper-V clockevent devices on module unload.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# bd624d75 12-Feb-2015 Viresh Kumar <viresh.kumar@linaro.org>

clockevents: Introduce mode specific callbacks

It is not possible for the clockevents core to know which modes (other than
those with a corresponding feature flag) are supported by a particular
implementation. And drivers are expected to handle transition to all modes
elegantly, as ->set_mode() would be issued for them unconditionally.

Now, adding support for a new mode complicates things a bit if we want to use
the legacy ->set_mode() callback. We need to closely review all clockevents
drivers to see if they would break on addition of a new mode. And after such
reviews, it is found that we have to do non-trivial changes to most of the
drivers [1].

Introduce mode-specific set_mode_*() callbacks, some of which the drivers may or
may not implement. A missing callback would clearly convey the message that the
corresponding mode isn't supported.

A driver may still choose to keep supporting the legacy ->set_mode() callback,
but ->set_mode() wouldn't be supporting any new modes beyond RESUME. If a driver
wants to benefit from using a new mode, it would be required to migrate to
the mode specific callbacks.

The legacy ->set_mode() callback and the newly introduced mode-specific
callbacks are mutually exclusive. Only one of them should be supported by the
driver.

Sanity check is done at the time of registration to distinguish between optional
and required callbacks and to make error recovery and handling simpler. If the
legacy ->set_mode() callback is provided, all mode specific ones would be
ignored by the core but a warning is thrown if they are present.

Call sites calling ->set_mode() directly are also updated to use
__clockevents_set_mode() instead, as ->set_mode() may not be available anymore
for few drivers.

[1] https://lkml.org/lkml/2014/12/9/605
[2] https://lkml.org/lkml/2015/1/23/255

Suggested-by: Thomas Gleixner <tglx@linutronix.de> [2]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: linaro-kernel@lists.linaro.org
Cc: linaro-networking@linaro.org
Link: http://lkml.kernel.org/r/792d59a40423f0acffc9bb0bec9de1341a06fa02.1423788565.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 10632008 20-Oct-2014 Thomas Gleixner <tglx@linutronix.de>

clockevents: Prevent shift out of bounds

Andrey reported that on a kernel with UBSan enabled he found:

UBSan: Undefined behaviour in ../kernel/time/clockevents.c:75:34

I guess it should be 1ULL here instead of 1U:
(!ismax || evt->mult <= (1U << evt->shift)))

That's indeed the correct solution because shift might be 32.

Reported-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 504d5874 31-Jul-2014 Jan Kara <jack@suse.cz>

timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks

clockevents_increase_min_delta() calls printk() from under
hrtimer_bases.lock. That causes lock inversion on scheduler locks because
printk() can call into the scheduler. Lockdep puts it as:

======================================================
[ INFO: possible circular locking dependency detected ]
3.15.0-rc8-06195-g939f04b #2 Not tainted
-------------------------------------------------------
trinity-main/74 is trying to acquire lock:
(&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c

but task is already holding lock:
(hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (hrtimer_bases.lock){-.-...}:
[<8104a942>] lock_acquire+0x92/0x101
[<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
[<8103c918>] __hrtimer_start_range_ns+0x1c/0x197
[<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85
[<81080792>] task_clock_event_start+0x3a/0x3f
[<810807a4>] task_clock_event_add+0xd/0x14
[<8108259a>] event_sched_in+0xb6/0x17a
[<810826a2>] group_sched_in+0x44/0x122
[<81082885>] ctx_sched_in.isra.67+0x105/0x11f
[<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b
[<81082bf6>] __perf_install_in_context+0x8b/0xa3
[<8107eb8e>] remote_function+0x12/0x2a
[<8105f5af>] smp_call_function_single+0x2d/0x53
[<8107e17d>] task_function_call+0x30/0x36
[<8107fb82>] perf_install_in_context+0x87/0xbb
[<810852c9>] SYSC_perf_event_open+0x5c6/0x701
[<810856f9>] SyS_perf_event_open+0x17/0x19
[<8142f8ee>] syscall_call+0x7/0xb

-> #4 (&ctx->lock){......}:
[<8104a942>] lock_acquire+0x92/0x101
[<8142f04c>] _raw_spin_lock+0x21/0x30
[<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
[<8142cacc>] __schedule+0x4c6/0x4cb
[<8142cae0>] schedule+0xf/0x11
[<8142f9a6>] work_resched+0x5/0x30

-> #3 (&rq->lock){-.-.-.}:
[<8104a942>] lock_acquire+0x92/0x101
[<8142f04c>] _raw_spin_lock+0x21/0x30
[<81040873>] __task_rq_lock+0x33/0x3a
[<8104184c>] wake_up_new_task+0x25/0xc2
[<8102474b>] do_fork+0x15c/0x2a0
[<810248a9>] kernel_thread+0x1a/0x1f
[<814232a2>] rest_init+0x1a/0x10e
[<817af949>] start_kernel+0x303/0x308
[<817af2ab>] i386_start_kernel+0x79/0x7d

-> #2 (&p->pi_lock){-.-...}:
[<8104a942>] lock_acquire+0x92/0x101
[<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
[<810413dd>] try_to_wake_up+0x1d/0xd6
[<810414cd>] default_wake_function+0xb/0xd
[<810461f3>] __wake_up_common+0x39/0x59
[<81046346>] __wake_up+0x29/0x3b
[<811b8733>] tty_wakeup+0x49/0x51
[<811c3568>] uart_write_wakeup+0x17/0x19
[<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
[<811c5f28>] serial8250_handle_irq+0x54/0x6a
[<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
[<811c56d8>] serial8250_interrupt+0x38/0x9e
[<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
[<81051296>] handle_irq_event+0x2c/0x43
[<81052cee>] handle_level_irq+0x57/0x80
[<81002a72>] handle_irq+0x46/0x5c
[<810027df>] do_IRQ+0x32/0x89
[<8143036e>] common_interrupt+0x2e/0x33
[<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
[<811c25a4>] uart_start+0x2d/0x32
[<811c2c04>] uart_write+0xc7/0xd6
[<811bc6f6>] n_tty_write+0xb8/0x35e
[<811b9beb>] tty_write+0x163/0x1e4
[<811b9cd9>] redirected_tty_write+0x6d/0x75
[<810b6ed6>] vfs_write+0x75/0xb0
[<810b7265>] SyS_write+0x44/0x77
[<8142f8ee>] syscall_call+0x7/0xb

-> #1 (&tty->write_wait){-.....}:
[<8104a942>] lock_acquire+0x92/0x101
[<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
[<81046332>] __wake_up+0x15/0x3b
[<811b8733>] tty_wakeup+0x49/0x51
[<811c3568>] uart_write_wakeup+0x17/0x19
[<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
[<811c5f28>] serial8250_handle_irq+0x54/0x6a
[<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
[<811c56d8>] serial8250_interrupt+0x38/0x9e
[<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
[<81051296>] handle_irq_event+0x2c/0x43
[<81052cee>] handle_level_irq+0x57/0x80
[<81002a72>] handle_irq+0x46/0x5c
[<810027df>] do_IRQ+0x32/0x89
[<8143036e>] common_interrupt+0x2e/0x33
[<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
[<811c25a4>] uart_start+0x2d/0x32
[<811c2c04>] uart_write+0xc7/0xd6
[<811bc6f6>] n_tty_write+0xb8/0x35e
[<811b9beb>] tty_write+0x163/0x1e4
[<811b9cd9>] redirected_tty_write+0x6d/0x75
[<810b6ed6>] vfs_write+0x75/0xb0
[<810b7265>] SyS_write+0x44/0x77
[<8142f8ee>] syscall_call+0x7/0xb

-> #0 (&port_lock_key){-.....}:
[<8104a62d>] __lock_acquire+0x9ea/0xc6d
[<8104a942>] lock_acquire+0x92/0x101
[<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
[<811c60be>] serial8250_console_write+0x8c/0x10c
[<8104e402>] call_console_drivers.constprop.31+0x87/0x118
[<8104f5d5>] console_unlock+0x1d7/0x398
[<8104fb70>] vprintk_emit+0x3da/0x3e4
[<81425f76>] printk+0x17/0x19
[<8105bfa0>] clockevents_program_min_delta+0x104/0x116
[<8105c548>] clockevents_program_event+0xe7/0xf3
[<8105cc1c>] tick_program_event+0x1e/0x23
[<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
[<8103c49e>] __remove_hrtimer+0x5b/0x79
[<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
[<8103cb4b>] hrtimer_cancel+0xd/0x18
[<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
[<81080705>] task_clock_event_stop+0x20/0x64
[<81080756>] task_clock_event_del+0xd/0xf
[<81081350>] event_sched_out+0xab/0x11e
[<810813e0>] group_sched_out+0x1d/0x66
[<81081682>] ctx_sched_out+0xaf/0xbf
[<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
[<8142cacc>] __schedule+0x4c6/0x4cb
[<8142cae0>] schedule+0xf/0x11
[<8142f9a6>] work_resched+0x5/0x30

other info that might help us debug this:

Chain exists of:
&port_lock_key --> &ctx->lock --> hrtimer_bases.lock

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(hrtimer_bases.lock);
lock(&ctx->lock);
lock(hrtimer_bases.lock);
lock(&port_lock_key);

*** DEADLOCK ***

4 locks held by trinity-main/74:
#0: (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb
#1: (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
#2: (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
#3: (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4

stack backtrace:
CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2
00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570
8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0
8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003
Call Trace:
[<81426f69>] dump_stack+0x16/0x18
[<81425a99>] print_circular_bug+0x18f/0x19c
[<8104a62d>] __lock_acquire+0x9ea/0xc6d
[<8104a942>] lock_acquire+0x92/0x101
[<811c60be>] ? serial8250_console_write+0x8c/0x10c
[<811c6032>] ? wait_for_xmitr+0x76/0x76
[<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
[<811c60be>] ? serial8250_console_write+0x8c/0x10c
[<811c60be>] serial8250_console_write+0x8c/0x10c
[<8104af87>] ? lock_release+0x191/0x223
[<811c6032>] ? wait_for_xmitr+0x76/0x76
[<8104e402>] call_console_drivers.constprop.31+0x87/0x118
[<8104f5d5>] console_unlock+0x1d7/0x398
[<8104fb70>] vprintk_emit+0x3da/0x3e4
[<81425f76>] printk+0x17/0x19
[<8105bfa0>] clockevents_program_min_delta+0x104/0x116
[<8105cc1c>] tick_program_event+0x1e/0x23
[<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
[<8103c49e>] __remove_hrtimer+0x5b/0x79
[<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
[<8103cb4b>] hrtimer_cancel+0xd/0x18
[<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
[<81080705>] task_clock_event_stop+0x20/0x64
[<81080756>] task_clock_event_del+0xd/0xf
[<81081350>] event_sched_out+0xab/0x11e
[<810813e0>] group_sched_out+0x1d/0x66
[<81081682>] ctx_sched_out+0xaf/0xbf
[<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
[<8104416d>] ? __dequeue_entity+0x23/0x27
[<81044505>] ? pick_next_task_fair+0xb1/0x120
[<8142cacc>] __schedule+0x4c6/0x4cb
[<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108
[<810475b0>] ? trace_hardirqs_off+0xb/0xd
[<81056346>] ? rcu_irq_exit+0x64/0x77

Fix the problem by using printk_deferred() which does not call into the
scheduler.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# da7e6f45 07-Feb-2014 Preeti U Murthy <preeti@linux.vnet.ibm.com>

time: Change the return type of clockevents_notify() to integer

The broadcast framework can potentially be made use of by archs which do not have an
external clock device as well. Then, it is required that one of the CPUs need
to handle the broadcasting of wakeup IPIs to the CPUs in deep idle. As a
result its local timers should remain functional all the time. For such
a CPU, the BROADCAST_ENTER notification has to fail indicating that its clock
device cannot be shutdown. To make way for this support, change the return
type of tick_broadcast_oneshot_control() and hence clockevents_notify() to
indicate such scenarios.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: deepthi@linux.vnet.ibm.com
Cc: paulmck@linux.vnet.ibm.com
Cc: fweisbec@gmail.com
Cc: paulus@samba.org
Cc: srivatsa.bhat@linux.vnet.ibm.com
Cc: svaidy@linux.vnet.ibm.com
Cc: peterz@infradead.org
Cc: benh@kernel.crashing.org
Cc: rafael.j.wysocki@intel.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20140207080606.17187.78306.stgit@preeti.in.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# fe79a9ba 03-Feb-2014 Soren Brinkmann <soren.brinkmann@xilinx.com>

clockevents: Adjust timer interval when frequency changes

clockevent devices in periodic mode are not updated when the frequency
of the device changes. Issue a dev->set_mode() callback which forces
the device to reevaluate the timer settings.

Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Michal Simek <michal.simek@xilinx.com>
Link: http://lkml.kernel.org/r/1391466877-28908-3-git-send-email-soren.brinkmann@xilinx.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 627ee794 03-Feb-2014 Thomas Gleixner <tglx@linutronix.de>

clockevents: Serialize calls to clockevents_update_freq() in the core

We can identify the broadcast device in the core and serialize all
callers including interrupts on a different CPU against the update.
Also, disabling interrupts is moved into the core allowing callers to
leave interrutps enabled when calling clockevents_update_freq().

Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Soeren Brinkmann <soren.brinkmann@xilinx.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Michal Simek <michal.simek@xilinx.com>
Link: http://lkml.kernel.org/r/1391466877-28908-2-git-send-email-soren.brinkmann@xilinx.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 97b94106 24-Sep-2013 Thomas Gleixner <tglx@linutronix.de>

clockevents: Sanitize ticks to nsec conversion

Marc Kleine-Budde pointed out, that commit 77cc982 "clocksource: use
clockevents_config_and_register() where possible" caused a regression
for some of the converted subarchs.

The reason is, that the clockevents core code converts the minimal
hardware tick delta to a nanosecond value for core internal
usage. This conversion is affected by integer math rounding loss, so
the backwards conversion to hardware ticks will likely result in a
value which is less than the configured hardware limitation. The
affected subarchs used their own workaround (SIGH!) which got lost in
the conversion.

The solution for the issue at hand is simple: adding evt->mult - 1 to
the shifted value before the integer divison in the core conversion
function takes care of it. But this only works for the case where for
the scaled math mult/shift pair "mult <= 1 << shift" is true. For the
case where "mult > 1 << shift" we can apply the rounding add only for
the minimum delta value to make sure that the backward conversion is
not less than the given hardware limit. For the upper bound we need to
omit the rounding add, because the backwards conversion is always
larger than the original latch value. That would violate the upper
bound of the hardware device.

Though looking closer at the details of that function reveals another
bogosity: The upper bounds check is broken as well. Checking for a
resulting "clc" value greater than KTIME_MAX after the conversion is
pointless. The conversion does:

u64 clc = (latch << evt->shift) / evt->mult;

So there is no sanity check for (latch << evt->shift) exceeding the
64bit boundary. The latch argument is "unsigned long", so on a 64bit
arch the handed in argument could easily lead to an unnoticed shift
overflow. With the above rounding fix applied the calculation before
the divison is:

u64 clc = (latch << evt->shift) + evt->mult - 1;

So we need to make sure, that neither the shift nor the rounding add
is overflowing the u64 boundary.

[ukl: move assignment to rnd after eventually changing mult, fix build
issue and correct comment with the right math]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: nicolas.ferre@atmel.com
Cc: Marc Pignat <marc.pignat@hevs.ch>
Cc: john.stultz@linaro.org
Cc: kernel@pengutronix.de
Cc: Ronald Wahl <ronald.wahl@raritan.com>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1380052223-24139-1-git-send-email-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>


# 891292a7 11-Oct-2013 Patrick Palka <patrick@parcs.ath.cx>

time: Fix signedness bug in sysfs_get_uname() and its callers

sysfs_get_uname() is erroneously declared as returning size_t even
though it may return a negative value, specifically -EINVAL. Its
callers then check whether its return value is less than zero and indeed
that is never the case for size_t.

This patch changes sysfs_get_uname() to return ssize_t and makes sure
its callers use ssize_t accordingly.

Signed-off-by: Patrick Palka <patrick@parcs.ath.cx>
[jstultz: Didn't apply cleanly, as a similar partial fix was also applied
so had to resolve the collisions]
Signed-off-by: John Stultz <john.stultz@linaro.org>


# 03e13cf5 25-Apr-2013 Thomas Gleixner <tglx@linutronix.de>

clockevents: Implement unbind functionality

Provide a sysfs interface to allow unbinding of clockevent
devices. The device is unbound if it is unused or if there is a
replacement device available. Unbinding of broadcast devices is not
supported as we don't want to foster that nonsense. If no replacement
device is available the unbind returns -EBUSY. Unbind is available
from the kernel and through sysfs, which is necessary to drop the
module refcount.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130425143436.499216659@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 501f8670 25-Apr-2013 Thomas Gleixner <tglx@linutronix.de>

clockevents: Provide sysfs interface

Provide a simple sysfs interface for the clockevent devices. Show the
current active clockevent device.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130425143436.371634778@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# ccf33d68 25-Apr-2013 Thomas Gleixner <tglx@linutronix.de>

clockevents: Add module refcount

We want to be able to remove clockevent modules as well. Add a
refcount so we don't remove a module with an active clock event
device.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130425143436.307435149@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 8c53daf6 25-Apr-2013 Thomas Gleixner <tglx@linutronix.de>

clockevents: Move the tick_notify() switch case to clockevents_notify()

No need to call another function and have duplicated cases.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130425143436.235746557@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 7172a286 25-Apr-2013 Thomas Gleixner <tglx@linutronix.de>

clockevents: Get rid of the notifier chain

7+ years and still a single user. Kill it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130425143436.098520211@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# c35ef95c 12-Jan-2013 Shawn Guo <shawn.guo@linaro.org>

clockevents: export clockevents_config_and_register for module use

clockevents_config_and_register is a handy helper for clockevent
drivers, some of which might support module build, so export the symbol.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Olof Johansson <olof@lixom.net>


# adc78e6b 05-Aug-2012 Rafael J. Wysocki <rjw@rjwysocki.net>

timekeeping: Add suspend and resume of clock event devices

Some clock event devices, for example such that belong to PM domains,
need to be handled in a spcial way during the timekeeping suspend
and resume (which takes place in the system core, or "syscore",
stages of system power transitions) in analogy with clock sources.

Introduce .suspend() and .resume() callbacks for clock event devices
that will be executed by timekeeping_suspend/_resume(), respectively,
next the the clock sources' .suspend() and .resume() callbacks.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>


# e5400321 09-May-2012 Magnus Damm <magnus.damm@gmail.com>

clockevents: Make clockevents_config() a global symbol

Make clockevents_config() into a global symbol to allow it to be used
by compiled-in clockevent drivers. This is needed by drivers that want
to update the timer frequency after registration time.

Signed-off-by: Magnus Damm <damm@opensource.se>
Tested-by: Simon Horman <horms@verge.net.au>
Cc: arnd@arndb.de
Cc: johnstul@us.ibm.com
Cc: rjw@sisk.pl
Cc: lethal@linux-sh.org
Cc: gregkh@linuxfoundation.org
Cc: olof@lixom.net
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20120509143934.27521.46553.sendpatchset@w520
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 3b87487a 30-Dec-2011 Linus Torvalds <torvalds@linux-foundation.org>

Revert "clockevents: Set noop handler in clockevents_exchange_device()"

This reverts commit de28f25e8244c7353abed8de0c7792f5f883588c.

It results in resume problems for various people. See for example

http://thread.gmane.org/gmane.linux.kernel/1233033
http://thread.gmane.org/gmane.linux.kernel/1233389
http://thread.gmane.org/gmane.linux.kernel/1233159
http://thread.gmane.org/gmane.linux.kernel/1227868/focus=1230877

and the fedora and ubuntu bug reports

https://bugzilla.redhat.com/show_bug.cgi?id=767248
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/904569

which got bisected down to the stable version of this commit.

Reported-by: Jonathan Nieder <jrnieder@gmail.com>
Reported-by: Phil Miller <mille121@illinois.edu>
Reported-by: Philip Langdale <philipl@overt.org>
Reported-by: Tim Gardner <tim.gardner@canonical.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <gregkh@suse.de>
Cc: stable@kernel.org # for stable kernels that applied the original
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7239f65c 21-Dec-2011 Kay Sievers <kay.sievers@vrfy.org>

clockevents: remove sysdev.h

This isn't needed in the clockevents.c file, and the header file is
going away soon, so just remove the #include

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# de28f25e 02-Dec-2011 Thomas Gleixner <tglx@linutronix.de>

clockevents: Set noop handler in clockevents_exchange_device()

If a device is shutdown, then there might be a pending interrupt,
which will be processed after we reenable interrupts, which causes the
original handler to be run. If the old handler is the (broadcast)
periodic handler the shutdown state might hang the kernel completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org


# 65516f8a 23-Aug-2011 Martin Schwidefsky <schwidefsky@de.ibm.com>

clockevents: Add direct ktime programming function

There is at least one architecture (s390) with a sane clockevent device
that can be programmed with the equivalent of a ktime. No need to create
a delta against the current time, the ktime can be used directly.

A new clock device function 'set_next_ktime' is introduced that is called
with the unmodified ktime for the timer if the clock event device has the
CLOCK_EVT_FEAT_KTIME bit set.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: john stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/20110823133142.815350967@de.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# d1748302 23-Aug-2011 Martin Schwidefsky <schwidefsky@de.ibm.com>

clockevents: Make minimum delay adjustments configurable

The automatic increase of the min_delta_ns of a clockevents device
should be done in the clockevents code as the minimum delay is an
attribute of the clockevents device.

In addition not all architectures want the automatic adjustment, on a
massively virtualized system it can happen that the programming of a
clock event fails several times in a row because the virtual cpu has
been rescheduled quickly enough. In that case the minimum delay will
erroneously be increased with no way back. The new config symbol
GENERIC_CLOCKEVENTS_MIN_ADJUST is used to enable the automatic
adjustment. The config option is selected only for x86.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: john stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/20110823133142.494157493@de.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 1b054b67 03-Jun-2011 Thomas Gleixner <tglx@linutronix.de>

clockevents: Handle empty cpumask gracefully

For UP it's stupid to request an initialized cpumask for the clock
event devices. Though we need the mask set even on UP to avoid a
horrible ifdeffery especially in the broadcast code.

For SMP we can at least try to survive with a warning and set the
cpumask of the cpu we're running on. That gives a decent chance to
bring the machine up and retrieve the debug info.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Walleij <linus.walleij@linaro.org
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Stephen Boyd <sboyd@codeaurora.org>


# c0e299b1 20-May-2011 Thomas Gleixner <tglx@linutronix.de>

clockevents/source: Use u64 to make 32bit happy

unsigned long is not 64bit on 32bit machine.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 80b816b7 18-May-2011 Thomas Gleixner <tglx@linutronix.de>

clockevents: Provide interface to reconfigure an active clock event device

Some ARM SoCs have clock event devices which have their frequency
modified due to frequency scaling. Provide an interface which allows
to reconfigure an active device. After reconfiguration reprogram the
current pending event.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/%3C20110518210136.437459958%40linutronix.de%3E


# 57f0fcbe 18-May-2011 Thomas Gleixner <tglx@linutronix.de>

clockevents: Provide combined configure and register function

All clockevent devices have the same open coded initialization
functions. Provide an interface which does all necessary
initialization in the core code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/%3C20110518210136.331975870%40linutronix.de%3E


# e2830b5c 27-Jan-2011 Torben Hohn <torbenh@gmx.de>

time: Make do_timer() and xtime_lock local to kernel/time/

All callers of do_timer() are converted to xtime_update(). The only
users of xtime_lock are in kernel/time/. Make both local to
kernel/time/ and remove them from the global header files.

[ tglx: Reuse tick-internal.h instead of creating another local header
file. Massaged changelog ]

Signed-off-by: Torben Hohn <torbenh@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: johnstul@us.ibm.com
Cc: yong.zhang0@gmail.com
Cc: hch@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# ea9d8e3f 06-Jan-2010 Xiaotian Feng <dfeng@redhat.com>

clockevent: Don't remove broadcast device when cpu is dead

Marc reported that the BUG_ON in clockevents_notify() triggers on his
system. This happens because the kernel tries to remove an active
clock event device (used for broadcasting) from the device list.

The handling of devices which can be used as per cpu device and as a
global broadcast device is suboptimal.

The simplest solution for now (and for stable) is to check whether the
device is used as global broadcast device, but this needs to be
revisited.

[ tglx: restored the cpuweight check and massaged the changelog ]

Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
LKML-Reference: <1262834564-13033-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org


# b5f91da0 07-Dec-2009 Thomas Gleixner <tglx@linutronix.de>

clockevents: Convert to raw_spinlock

Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>


# bb6eddf7 10-Dec-2009 Thomas Gleixner <tglx@linutronix.de>

clockevents: Prevent clockevent_devices list corruption on cpu hotplug

Xiaotian Feng triggered a list corruption in the clock events list on
CPU hotplug and debugged the root cause.

If a CPU registers more than one per cpu clock event device, then only
the active clock event device is removed on CPU_DEAD. The unused
devices are kept in the clock events device list.

On CPU up the clock event devices are registered again, which means
that we list_add an already enqueued list_head. That results in list
corruption.

Resolve this by removing all devices which are associated to the dead
CPU on CPU_DEAD.

Reported-by: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Xiaotian Feng <dfeng@redhat.com>
Cc: stable@kernel.org


# 8e1a928a 16-Oct-2009 H Hartley Sweeten <hartleys@visionengravers.com>

clockevents: Add missing include to pacify sparse

Include "tick-internal.h" in order to pick up the extern function
prototype for clockevents_shutdown(). This quiets the following sparse
build noise:

warning: symbol 'clockevents_shutdown' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
LKML-Reference: <BD79186B4FD85F4B8E60E381CAEE190901E24550@mi8nycmail19.Mi8.com>
Reviewed-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: johnstul@us.ibm.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 97813f2f 17-Aug-2009 Jon Hunter <jon-hunter@ti.com>

nohz: Allow 32-bit machines to sleep for more than 2.15 seconds

In the dynamic tick code, "max_delta_ns" (member of the
"clock_event_device" structure) represents the maximum sleep time
that can occur between timer events in nanoseconds.

The variable, "max_delta_ns", is defined as an unsigned long
which is a 32-bit integer for 32-bit machines and a 64-bit
integer for 64-bit machines (if -m64 option is used for gcc).
The value of max_delta_ns is set by calling the function
"clockevent_delta2ns()" which returns a maximum value of LONG_MAX.
For a 32-bit machine LONG_MAX is equal to 0x7fffffff and in
nanoseconds this equates to ~2.15 seconds. Hence, the maximum
sleep time for a 32-bit machine is ~2.15 seconds, where as for
a 64-bit machine it will be many years.

This patch changes the type of max_delta_ns to be "u64" instead of
"unsigned long" so that this variable is a 64-bit type for both 32-bit
and 64-bit machines. It also changes the maximum value returned by
clockevent_delta2ns() to KTIME_MAX. Hence this allows a 32-bit
machine to sleep for longer than ~2.15 seconds. Please note that this
patch also changes "min_delta_ns" to be "u64" too and although this is
unnecessary, it makes the patch simpler as it avoids to fixup all
callers of clockevent_delta2ns().

[ tglx: changed "unsigned long long" to u64 as we use this data type
through out the time code ]

Signed-off-by: Jon Hunter <jon-hunter@ti.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1250617512-23567-3-git-send-email-jon-hunter@ti.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# f833bab8 17-Aug-2009 Suresh Siddha <suresh.b.siddha@intel.com>

clockevent: Prevent dead lock on clockevents_lock

Currently clockevents_notify() is called with interrupts enabled at
some places and interrupts disabled at some other places.

This results in a deadlock in this scenario.

cpu A holds clockevents_lock in clockevents_notify() with irqs enabled
cpu B waits for clockevents_lock in clockevents_notify() with irqs disabled
cpu C doing set_mtrr() which will try to rendezvous of all the cpus.

This will result in C and A come to the rendezvous point and waiting
for B. B is stuck forever waiting for the spinlock and thus not
reaching the rendezvous point.

Fix the clockevents code so that clockevents_lock is taken with
interrupts disabled and thus avoid the above deadlock.

Also call lapic_timer_propagate_broadcast() on the destination cpu so
that we avoid calling smp_call_function() in the clockevents notifier
chain.

This issue left us wondering if we need to change the MTRR rendezvous
logic to use stop machine logic (instead of smp_call_function) or add
a check in spinlock debug code to see if there are other spinlocks
which gets taken under both interrupts enabled/disabled conditions.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
Cc: "Brown Len" <len.brown@intel.com>
LKML-Reference: <1250544899.2709.210.camel@sbs-t61.sc.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 6ff7041d 10-Jul-2009 Thomas Gleixner <tglx@linutronix.de>

hrtimer: Fix migration expiry check

The timer migration expiry check should prevent the migration of a
timer to another CPU when the timer expires before the next event is
scheduled on the other CPU. Migrating the timer might delay it because
we can not reprogram the clock event device on the other CPU. But the
code implementing that check has two flaws:

- for !HIGHRES the check compares the expiry value with the clock
events device expiry value which is wrong for CLOCK_REALTIME based
timers.

- the check is racy. It holds the hrtimer base lock of the target CPU,
but the clock event device expiry value can be modified
nevertheless, e.g. by an timer interrupt firing.

The !HIGHRES case is easy to fix as we can enqueue the timer on the
cpu which was selected by the load balancer. It runs the idle
balancing code once per jiffy anyway. So the maximum delay for the
timer is the same as when we keep the tick on the current cpu going.

In the HIGHRES case we can get the next expiry value from the hrtimer
cpu_base of the target CPU and serialize the update with the cpu_base
lock. This moves the lock section in hrtimer_interrupt() so we can set
next_event to KTIME_MAX while we are handling the expired timers and
set it to the next expiry value after we handled the timers under the
base lock. While the expired timers are processed timer migration is
blocked because the expiry time of the timer is always <= KTIME_MAX.

Also remove the now useless clockevents_get_next_event() function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# eea08f32 15-Apr-2009 Arun R Bharadwaj <arun@linux.vnet.ibm.com>

timers: Logic to move non pinned timers

* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]:

This patch migrates all non pinned timers and hrtimers to the current
idle load balancer, from all the idle CPUs. Timers firing on busy CPUs
are not migrated.

While migrating hrtimers, care should be taken to check if migrating
a hrtimer would result in a latency or not. So we compare the expiry of the
hrtimer with the next timer interrupt on the target cpu and migrate the
hrtimer only if it expires *after* the next interrupt on the target cpu.
So, added a clockevents_get_next_event() helper function to return the
next_event on the target cpu's clock_event_device.

[ tglx: cleanups and simplifications ]

Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# c81fc2c3 30-Apr-2009 Magnus Damm <magnus.damm@gmail.com>

clockevent: export register_device and delta2ns

Export the following symbols using EXPORT_SYMBOL_GPL:
- clockevent_delta2ns
- clockevents_register_device

This allows us to build SuperH clockevent and clocksource
drivers as modules, see drivers/clocksource/sh_*.c

[ Impact: allow modular build of clockevent drivers ]

Signed-off-by: Magnus Damm <damm@igel.co.jp>
LKML-Reference: <20090501055247.8286.64067.sendpatchset@rx1.opensource.se>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 2d68259d 16-Jan-2009 Magnus Damm <magnus.damm@gmail.com>

clockevents: let set_mode() setup delta information

Allow the set_mode() clockevent callback to decide and fill in delta
details such as shift, mult, max_delta_ns and min_delta_ns.

With this change the clockevent can be registered without delta details
which allows us to keep the parent clock disabled until the clockevent
gets setup using set_mode().

Letting set_mode() fill in or update delta details allows us to save
power by disabling the parent clock while the clockevent is unused.
This may however make the parent clock rate change, so next time the
clockevent gets enabled we need let set_mode() to update the detla
details accordingly. Doing it at registration time is not enough.

Furthermore, the delta details seem unused in the case of periodic-only
clockevent drivers, so this change also allows registration of such
drivers without the delta details filled in.

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 320ab2b0 13-Dec-2008 Rusty Russell <rusty@rustcorp.com.au>

cpumask: convert struct clock_event_device to cpumask pointers.

Impact: change calling convention of existing clock_event APIs

struct clock_event_timer's cpumask field gets changed to take pointer,
as does the ->broadcast function.

Another single-patch change. For safety, we BUG_ON() in
clockevents_register_device() if it's not set.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>


# 2344abbc 16-Sep-2008 Thomas Gleixner <tglx@linutronix.de>

clockevents: make device shutdown robust

The device shut down does not cleanup the next_event variable of the
clock event device. So when the device is reactivated the possible
stale next_event value can prevent the device to be reprogrammed as it
claims to wait on a event already.

This is the root cause of the resurfacing suspend/resume problem,
where systems need key press to come back to life.

Fix this by setting next_event to KTIME_MAX when the device is shut
down. Use a separate function for shutdown which takes care of that
and only keep the direct set mode call in the broadcast code, where we
can not touch the next_event value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 7c1e7689 03-Sep-2008 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

clockevents: prevent clockevent event_handler ending up handler_noop

There is a ordering related problem with clockevents code, due to which
clockevents_register_device() called after tickless/highres switch
will not work. The new clockevent ends up with clockevents_handle_noop as
event handler, resulting in no timer activity.

The problematic path seems to be

* old device already has hrtimer_interrupt as the event_handler
* new clockevent device registers with a higher rating
* tick_check_new_device() is called
* clockevents_exchange_device() gets called
* old->event_handler is set to clockevents_handle_noop
* tick_setup_device() is called for the new device
* which sets new->event_handler using the old->event_handler which is noop.

Change the ordering so that new device inherits the proper handler.

This does not have any issue in normal case as most likely all the clockevent
devices are setup before the highres switch. But, can potentially be affecting
some corner case where HPET force detect happens after the highres switch.
This was a problem with HPET in MSI mode code that we have been experimenting
with.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 3eb05676 08-Feb-2008 Li Zefan <lizf@cn.fujitsu.com>

time: fix typo in comments

Fix typo in comments.

BTW: I have to fix coding style in arch/ia64/kernel/time.c also, otherwise
checkpatch.pl will be complaining.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0b858e6f 08-Feb-2008 Li Zefan <lizf@cn.fujitsu.com>

clockevent: simplify list operations

list_for_each_safe() suffices here.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 45fe4fe1 30-Jan-2008 Ingo Molnar <mingo@elte.hu>

x86: make clockevents more robust

detect zero event-device multiplicators - they then cause
division-by-zero crashes if a clockevent has been initialized
incorrectly.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 167b1de3 07-Dec-2007 Thomas Gleixner <tglx@linutronix.de>

clockevents: warn once when program_event() is called with negative expiry

The hrtimer problem with large relative timeouts resulting in a
negative expiry time went unnoticed as there is no check in the
clockevents_program_event() code. Put a check there with a WARN_ONCE
to avoid such problems in the future.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# de68d9b1 12-Oct-2007 Thomas Gleixner <tglx@linutronix.de>

clockevents: Allow build w/o run-tine usage for migration purposes

Migration aid to allow preparatory patches which introduce not yet
used parts of clock events code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>


# 6ddfca95 10-Aug-2007 Miao Xie <miaox@cn.fujitsu.com>

timer: remove clockevents_unregister_notifier

I find a function(clockevents_unregister_notifier) which is not called by
anything in tree.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 78c1b065 16-Jul-2007 Andi Kleen <ak@linux.intel.com>

Remove clockevents_{release,request}_device

Not called by anything in tree.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 291bc047 26-Mar-2007 Thomas Gleixner <tglx@linutronix.de>

[PATCH] clockevents: remove bad designed sysfs support for now

The current sysfs support of clockevents does not obey the "only one
value per file" rule.

The real fix is not 2.6.21 material. Therefor remove the sysfs support
for now.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d316c57f 16-Feb-2007 Thomas Gleixner <tglx@linutronix.de>

[PATCH] clockevents: add core functionality

Architectures register their clock event devices, in the clock events core.
Users of the clockevents core can get clock event devices for their use. The
clockevents core code provides notification mechanisms for various clock
related management events.

This allows to control the clock event devices without the architectures
having to worry about the details of function assignment. This is also a
preliminary for high resolution timers and dynamic ticks to allow the core
code to control the clock functionality without intrusive changes to the
architecture code.

[Fixes-by: Ingo Molnar <mingo@elte.hu>]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>