History log of /linux-master/include/linux/cpufreq.h
Revision Date Author Comments
# 0f289828 28-Jan-2024 Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>

cpufreq: do not open-code of_phandle_args_equal()

Use newly added of_phandle_args_equal() helper to compare two
of_phandle_args.

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240129115216.96479-3-krzysztof.kozlowski@linaro.org
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>


# 838a4772 18-Jan-2024 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h

Move the declaration of functions defined in the OPP core to pm_opp.h.
These were added to cpufreq.h as it was the only user of the APIs, but
that was a mistake perhaps. Fix it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>


# d394abcb 27-Feb-2024 Shivnandan Kumar <quic_kshivnan@quicinc.com>

cpufreq: Limit resolving a frequency to policy min/max

Resolving a frequency to an efficient one should not transgress
policy->max (which can be set for thermal reason) and policy->min.

Currently, there is possibility where scaling_cur_freq can exceed
scaling_max_freq when scaling_max_freq is an inefficient frequency.

Add a check to ensure that resolving a frequency will respect
policy->min/max.

Cc: All applicable <stable@vger.kernel.org>
Fixes: 1f39fa0dccff ("cpufreq: Introducing CPUFREQ_RELATION_E")
Signed-off-by: Shivnandan Kumar <quic_kshivnan@quicinc.com>
[ rjw: Whitespace adjustment, changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 88debc69 22-Feb-2024 Pierre Gondois <pierre.gondois@arm.com>

cpufreq: Remove references to 10ms min sampling rate

A minimum sampling rate value of 10ms was introduced in:
commit cef9615a853e ("[CPUFREQ] ondemand: Uncouple minimal sampling rate from HZ in NO_HZ case")

The use of this value was removed in:
commit ed4676e25463 ("cpufreq: Replace "max_transition_latency" with "dynamic_switching"")

Remove:
- a comment referencing this value
- an unused macro associated to this value

Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9c4a13a0 19-Jan-2024 Meng Li <li.meng@amd.com>

ACPI: cpufreq: Add highest perf change notification

Platform firmware sends notify 0x85 to inform the OS that the highest
performance of a CPU has changed.

This will be used by the AMD P-state driver to update the ranking of
preferred cores and set the priority of cores accordingly.

Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Meng Li <li.meng@amd.com>
Link: https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#processor-device-notification-values
[ rjw: New subject, changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 599457ba 11-Dec-2023 Vincent Guittot <vincent.guittot@linaro.org>

cpufreq: Use the fixed and coherent frequency for scaling capacity

cpuinfo.max_freq can change at runtime because of boost as an example. This
implies that the value could be different from the frequency that has been
used to compute the capacity of a CPU.

The new arch_scale_freq_ref() returns a fixed and coherent frequency
that can be used to compute the capacity for a given frequency.

[ Also fix a arch_set_freq_scale() newline style wart in <linux/cpufreq.h>. ]

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20231211104855.558096-3-vincent.guittot@linaro.org


# e7a1b32e 05-Oct-2023 Pierre Gondois <pierre.gondois@arm.com>

cpufreq: Rebuild sched-domains when removing cpufreq driver

The Energy Aware Scheduler (EAS) relies on the schedutil governor.
When moving to/from the schedutil governor, sched domains must be
rebuilt to allow re-evaluating the enablement conditions of EAS.
This is done through sched_cpufreq_governor_change().

Having a cpufreq governor assumes a cpufreq driver is running.
Inserting/removing a cpufreq driver should trigger a re-evaluation
of EAS enablement conditions, avoiding to see EAS enabled when
removing a running cpufreq driver.

Rebuild the sched domains in schedutil's sugov_init()/sugov_exit(),
allowing to check EAS's enablement condition whenever schedutil
governor is initialized/exited from.
Move relevant code up in schedutil.c to avoid a split and conditional
function declaration.
Rename sched_cpufreq_governor_change() to sugov_eas_rebuild_sd().

Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 218a06a7 22-Aug-2023 Jie Zhan <zhanjie9@hisilicon.com>

cpufreq: Support per-policy performance boost

The boost control currently applies to the whole system. However, users
may prefer to boost a subset of cores in order to provide prioritized
performance to workloads running on the boosted cores.

Enable per-policy boost by adding a 'boost' sysfs interface under each
policy path. This can be found at:

/sys/devices/system/cpu/cpufreq/policy<*>/boost

Same to the global boost switch, writing 1/0 to the per-policy 'boost'
enables/disables boost on a cpufreq policy respectively.

The user view of global and per-policy boost controls should be:

1. Enabling global boost initially enables boost on all policies, and
per-policy boost can then be enabled or disabled individually, given that
the platform does support so.

2. Disabling global boost makes the per-policy boost interface illegal.

Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Wei Xu <xuwei5@hisilicon.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# a436ae94 15-Aug-2023 Liao Chang <liaochang1@huawei.com>

cpufreq: Use clamp() helper macro to improve the code readability

The valid values of policy.{min, max} should be between 'min' and 'max',
so use clamp() helper macro to makes cpufreq_verify_within_limits() easier
to follow.

Signed-off-by: Liao Chang <liaochang1@huawei.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6a4fec4f 14-Aug-2023 Liao Chang <liaochang1@huawei.com>

cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases.

The cpufreq framework used to use the zero of return value to reflect
the cppc_cpufreq_get_rate() had failed to get current frequecy and treat
all positive integer to be succeed. Since cppc_get_perf_ctrs() returns a
negative integer in error case, so it is better to convert the value to
zero as the return value of cppc_cpufreq_get_rate().

Signed-off-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>


# b4a11fa3 29-May-2023 Wyes Karny <wyes.karny@amd.com>

cpufreq: Fail driver register if it has adjust_perf without fast_switch

If fast_switch_possible flag is set by the scaling driver, the governor
is free to select fast_switch function even if adjust_perf is set. Some
scaling drivers which use adjust_perf don't set fast_switch thinking
that the governor would never fall back to fast_switch. But the governor
can fall back to fast_switch even in runtime if frequency invariance is
disabled due to some reason. This could crash the kernel if the driver
didn't set the fast_switch function pointer.

Therefore, fail driver registration if it has adjust_perf without
fast_switch.

Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 21bb32b1 29-Mar-2023 Rob Herring <robh@kernel.org>

cpufreq: Adjust includes to remove of_device.h

Now that of_cpu_device_node_get() is defined in of.h, of_device.h is just
implicitly including other includes, and is no longer needed. Adjust the
include files with what was implicitly included by of_device.h (cpu.h and
of.h) and drop including of_device.h.

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230329-dt-cpu-header-cleanups-v1-14-581e2605fe47@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>


# a038895e 03-Apr-2023 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: drivers with target_index() must set freq_table

Since the cpufreq core directly uses freq_table, for cpufreq drivers
that set their target_index() callback, make it mandatory for them to
set the same.

Since this is set per policy and normally from policy->init(), do this
from cpufreq_table_validate_and_sort() which gets called right after
->init().

Reported-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# dd329e1e 07-Feb-2023 Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

cpufreq: Make cpufreq_unregister_driver() return void

All but a few drivers ignore the return value of
cpufreq_unregister_driver(). Those few that don't only call it after
cpufreq_register_driver() succeeded, in which case the call doesn't
fail.

Make the function return no value and add a WARN_ON for the case that
the function is called in an invalid situation (i.e. without a previous
successful call to cpufreq_register_driver()).

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com> # brcmstb-avs-cpufreq.c
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d182dc6d 23-Oct-2022 Hector Martin <marcan@marcan.st>

cpufreq: Generalize of_perf_domain_get_sharing_cpumask phandle format

of_perf_domain_get_sharing_cpumask currently assumes a 1-argument
phandle format, and directly returns the argument. Generalize this to
return the full of_phandle_args, so it can be used by drivers which use
other phandle styles (e.g. separate nodes). This also requires changing
the CPU sharing match to compare the full args structure.

Also, make sure to of_node_put(args.np) (the original code was leaking a
reference).

Signed-off-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>


# 7d84c1eb 15-Apr-2022 Thomas Gleixner <tglx@linutronix.de>

x86/aperfmperf: Replace aperfmperf_get_khz()

The frequency invariance infrastructure provides the APERF/MPERF samples
already. Utilize them for the cpu frequency display in /proc/cpuinfo.

The sample is considered valid for 20ms. So for idle or isolated NOHZ full
CPUs the function returns 0, which is matching the previous behaviour.

This gets rid of the mass IPIs and a delay of 20ms for stabilizing observed
by Eric when reading /proc/cpuinfo.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20220415161206.875029458@linutronix.de


# ae265086 23-Jan-2022 Kevin Hao <haokexin@gmail.com>

cpufreq: Move to_gov_attr_set() to cpufreq.h

So it can be reused by other codes.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4f774c4a 27-Jan-2022 Bjorn Andersson <bjorn.andersson@linaro.org>

cpufreq: Reintroduce ready() callback

This effectively revert '4bf8e582119e ("cpufreq: Remove ready()
callback")', in order to reintroduce the ready callback.

This is needed in order to be able to leave the thermal pressure
interrupts in the Qualcomm CPUfreq driver disabled during
initialization, so that it doesn't fire while related_cpus are still 0.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[ Viresh: Added the Chinese translation as well and updated commit msg ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>


# 4a08e327 04-Oct-2021 Hector.Yuan <hector.yuan@mediatek.com>

cpufreq: Fix parameter in parse_perf_domain()

Pass cpu to parse_perf_domain() instead of pcpu.

Fixes: 8486a32dd484 ("cpufreq: Add of_perf_domain_get_sharing_cpumask")
Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com>
[ Viresh: Massaged changelog ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>


# 3598b30b 20-Oct-2021 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Fix typo in cpufreq.h

s/internale/internal/

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# b894d20e 08-Sep-2021 Vincent Donnefort <vincent.donnefort@arm.com>

cpufreq: Use CPUFREQ_RELATION_E in DVFS governors

Let the governors schedutil, conservative and ondemand to work, if possible
on efficient frequencies only.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1f39fa0d 08-Sep-2021 Vincent Donnefort <vincent.donnefort@arm.com>

cpufreq: Introducing CPUFREQ_RELATION_E

This newly introduced flag can be applied by a governor to a CPUFreq
relation, when looking for a frequency within the policy table. The
resolution would then only walk through efficient frequencies.

Even with the flag set, the policy max limit will still be honoured. If no
efficient frequencies can be found within the limits of the policy, an
inefficient one would be returned.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 442d24a5 08-Sep-2021 Vincent Donnefort <vincent.donnefort@arm.com>

cpufreq: Add an interface to mark inefficient frequencies

Some SoCs such as the sd855 have OPPs within the same policy whose cost is
higher than others with a higher frequency. Those OPPs are inefficients
and it might be interesting for a governor to not use them.

cpufreq_table_set_inefficient() allows the caller to identify a specified
frequency as being inefficient. Inefficient frequencies are only supported
on sorted tables.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8486a32d 03-Sep-2021 Hector.Yuan <hector.yuan@mediatek.com>

cpufreq: Add of_perf_domain_get_sharing_cpumask

Add of_perf_domain_get_sharing_cpumask function to group cpu
to specific performance domain.

Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com>
[ Viresh: create separate routine parse_perf_domain() and always set the
cpumask. ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>


# 4bf8e582 01-Sep-2021 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove ready() callback

This isn't used anymore, get rid of it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# c17495b0 09-Aug-2021 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Add callback to register with energy model

Many cpufreq drivers register with the energy model for each policy and
do exactly the same thing. Follow the footsteps of thermal-cooling, to
get it done from the cpufreq core itself.

Provide a new callback, which will be called, if present, by the cpufreq
core at the right moment (more on that in the code's comment). Also
provide a generic implementation that uses dev_pm_opp_of_register_em().

This also allows us to register with the EM at a later point of time,
compared to ->init(), from where the EM core can access cpufreq policy
directly using cpufreq_cpu_get() type of helpers and perform other work,
like marking few frequencies inefficient, this will be done separately.

Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>


# b3beca76 29-Jun-2021 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove ->resolve_freq()

Commit e3c062360870 ("cpufreq: add cpufreq_driver_resolve_freq()")
introduced this callback, back in 2016, for drivers that provide the
->target() callback.

The kernel hasn't seen a single user of it in the past 5 years and
it is not likely to be used any time soon.

Remove it for now.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3e0f897f 22-Jun-2021 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove the ->stop_cpu() driver callback

Now that all users of ->stop_cpu() have been migrated to using other
callbacks, drop it from the core.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Minor edits in the subject and changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2f053186 01-Feb-2021 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove unused flag CPUFREQ_PM_NO_WARN

This flag is set by one of the drivers but it isn't used in the code
otherwise. Remove the unused flag and update the driver.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5ae4a4b4 01-Feb-2021 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove CPUFREQ_STICKY flag

During cpufreq driver's registration, if the ->init() callback for all
the CPUs fail then there is not much point in keeping the driver around
as it will only account for more of unnecessary noise, for example
cpufreq core will try to suspend/resume the driver which never got
registered properly.

The removal of such a driver is avoided if the driver carries the
CPUFREQ_STICKY flag. This was added way back [1] in 2004 and perhaps no
one should ever need it now. A lot of drivers do set this flag, probably
because they just copied it from other drivers.

This was added earlier for some platforms [2] because their cpufreq
drivers were getting registered before the CPUs were registered with
subsys framework. And hence they used to fail.

The same isn't true anymore though. The current code flow in the kernel
is:

start_kernel()
-> kernel_init()
-> kernel_init_freeable()
-> do_basic_setup()
-> driver_init()
-> cpu_dev_init()
-> subsys_system_register() //For CPUs

-> do_initcalls()
-> cpufreq_register_driver()

Clearly, the CPUs will always get registered with subsys framework
before any cpufreq driver can get probed. Remove the flag and update the
relevant drivers.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/include/linux/cpufreq.h?id=7cc9f0d9a1ab04cedc60d64fd8dcf7df224a3b4d # [1]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/arch/arm/mach-sa1100/cpu-sa1100.c?id=f59d3bbe35f6268d729f51be82af8325d62f20f5 # [2]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ee2cc427 14-Dec-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Add special-purpose fast-switching callback for drivers

First off, some cpufreq drivers (eg. intel_pstate) can pass hints
beyond the current target frequency to the hardware and there are no
provisions for doing that in the cpufreq framework. In particular,
today the driver has to assume that it should not allow the frequency
to fall below the one requested by the governor (or the required
capacity may not be provided) which may not be the case and which may
lead to excessive energy usage in some scenarios.

Second, the hints passed by these drivers to the hardware need not be
in terms of the frequency, so representing the utilization numbers
coming from the scheduler as frequency before passing them to those
drivers is not really useful.

Address the two points above by adding a special-purpose replacement
for the ->fast_switch callback, called ->adjust_perf, allowing the
governor to pass abstract performance level (rather than frequency)
values for the minimum (required) and target (desired) performance
along with the CPU capacity to compare them to.

Also update the schedutil governor to use the new callback instead
of ->fast_switch if present and if the utilization mertics are
frequency-invariant (that is requisite for the direct mapping
between the utilization and the CPU performance levels to be a
reasonable approximation).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# ea9364bb 10-Nov-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Add strict_target to struct cpufreq_policy

Add a new field to be set when the CPUFREQ_GOV_STRICT_TARGET flag is
set for the current governor to struct cpufreq_policy, so that the
drivers needing to check CPUFREQ_GOV_STRICT_TARGET do not have to
access the governor object during every frequency transition.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 218f6687 10-Nov-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Introduce CPUFREQ_GOV_STRICT_TARGET

Introduce a new governor flag, CPUFREQ_GOV_STRICT_TARGET, for the
governors that want the target frequency to be set exactly to the
given value without leaving any room for adjustments on the hardware
side and set this flag for the powersave and performance governors.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 9a2a9ebc 10-Nov-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Introduce governor flags

A new cpufreq governor flag will be added subsequently, so replace
the bool dynamic_switching fleid in struct cpufreq_governor with a
flags field and introduce CPUFREQ_GOV_DYNAMIC_SWITCHING to set for
the "dynamic switching" governors instead of it.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 56a7ff75 22-Oct-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Drop restore_freq from struct cpufreq_policy

The restore_freq field in struct cpufreq_policy is only used by
__target_index() in one place and a local variable in that function
may as well be used instead of it, so drop it and modify
__target_index() accordingly.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# a62f68f5 23-Oct-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Introduce cpufreq_driver_test_flags()

Add a helper function to test the flags of the cpufreq driver in use
againt a given flags mask.

In particular, this will be needed to test the
CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag in the schedutil
governor.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1c534352 23-Oct-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS driver flag

Generally, a cpufreq driver may need to update some internal upper
and lower frequency boundaries on policy max and min changes,
respectively, but currently this does not work if the target
frequency does not change along with the policy limit.

Namely, if the target frequency does not change along with the
policy min or max, the "target_freq == policy->cur" check in
__cpufreq_driver_target() prevents driver callbacks from being
invoked and they do not even have a chance to update the
corresponding internal boundary.

This particularly affects the "powersave" and "performance"
governors that always set the target frequency to one of the
policy limits and it never changes when the other limit is updated.

To allow cpufreq the drivers needing to update internal frequency
boundaries on policy limits changes to avoid this issue, introduce
a new driver flag, CPUFREQ_NEED_UPDATE_LIMITS, that (when set) will
neutralize the check mentioned above.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# a20b7053 24-Sep-2020 Ionela Voinescu <ionela.voinescu@arm.com>

cpufreq,arm,arm64: restructure definitions of arch_set_freq_scale()

Compared to other arch_* functions, arch_set_freq_scale() has an atypical
weak definition that can be replaced by a strong architecture specific
implementation.

The more typical support for architectural functions involves defining
an empty stub in a header file if the symbol is not already defined in
architecture code. Some examples involve:
- #define arch_scale_freq_capacity topology_get_freq_scale
- #define arch_scale_freq_invariant topology_scale_freq_invariant
- #define arch_scale_cpu_capacity topology_get_cpu_scale
- #define arch_update_cpu_topology topology_update_cpu_topology
- #define arch_scale_thermal_pressure topology_get_thermal_pressure
- #define arch_set_thermal_pressure topology_set_thermal_pressure

Bring arch_set_freq_scale() in line with these functions by renaming it to
topology_set_freq_scale() in the arch topology driver, and by defining the
arch_set_freq_scale symbol to point to the new function for arm and arm64.

While there are other users of the arch_topology driver, this patch defines
arch_set_freq_scale for arm and arm64 only, due to their existing
definitions of arch_scale_freq_capacity. This is the getter function of the
frequency invariance scale factor and without a getter function, the
setter function - arch_set_freq_scale() has not purpose.

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com> (BL_SWITCHER and topology parts)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ecddc3a0 01-Sep-2020 Valentin Schneider <valentin.schneider@arm.com>

arch_topology, cpufreq: constify arch_* cpumasks

The passed cpumask arguments to arch_set_freq_scale() and
arch_freq_counters_available() are only iterated over, so reflect this
in the prototype. This also allows to pass system cpumasks like
cpu_online_mask without getting a warning.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 874f6353 01-Sep-2020 Ionela Voinescu <ionela.voinescu@arm.com>

cpufreq: report whether cpufreq supports Frequency Invariance (FI)

Now that the update of the FI scale factor is done in cpufreq core for
selected functions - target(), target_index() and fast_switch(),
we can provide feedback to the task scheduler and architecture code
on whether cpufreq supports FI.

For this purpose provide an external function to expose whether the
cpufreq drivers support FI, by using a static key.

The logic behind the enablement of cpufreq-based invariance is as
follows:
- cpufreq-based invariance is disabled by default
- cpufreq-based invariance is enabled if any of the callbacks
above is implemented while the unsupported setpolicy() is not

The cpufreq_supports_freq_invariance() function only returns whether
cpufreq is instrumented with the arch_set_freq_scale() calls that
result in support for frequency invariance. Due to the lack of knowledge
on whether the implementation of arch_set_freq_scale() actually results
in the setting of a scale factor based on cpufreq information, it is up
to the architecture code to ensure the setting and provision of the
scale factor to the scheduler.

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 30b8e6b2 26-Aug-2020 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Use WARN_ON_ONCE() for invalid relation

The relation can't be invalid here, so if it turns out to be invalid,
just WARN_ON_ONCE() and return 0.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f6ebbcf0 06-Aug-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Implement passive mode with HWP enabled

Allow intel_pstate to work in the passive mode with HWP enabled and
make it set the HWP minimum performance limit (HWP floor) to the
P-state value given by the target frequency supplied by the cpufreq
governor, so as to prevent the HWP algorithm and the CPU scheduler
from working against each other, at least when the schedutil governor
is in use, and update the intel_pstate documentation accordingly.

Among other things, this allows utilization clamps to be taken
into account, at least to a certain extent, when intel_pstate is
in use and makes it more likely that sufficient capacity for
deadline tasks will be provided.

After this change, the resulting behavior of an HWP system with
intel_pstate in the passive mode should be close to the behavior
of the analogous non-HWP system with intel_pstate in the passive
mode, except that the HWP algorithm is generally allowed to make the
CPU run at a frequency above the floor P-state set by intel_pstate in
the entire available range of P-states, while without HWP a CPU can
run in a P-state above the requested one if the latter falls into the
range of turbo P-states (referred to as the turbo range) or if the
P-states of all CPUs in one package are coordinated with each other
at the hardware level.

[Note that in principle the HWP floor may not be taken into account
by the processor if it falls into the turbo range, in which case the
processor has a license to choose any P-state, either below or above
the HWP floor, just like a non-HWP processor in the case when the
target P-state falls into the turbo range.]

With this change applied, intel_pstate in the passive mode assumes
complete control over the HWP request MSR and concurrent changes of
that MSR (eg. via the direct MSR access interface) are overridden by
it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>


# 292072c3 29-Jul-2020 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: cached_resolved_idx can not be negative

It is not possible for cached_resolved_idx to be invalid here as the
cpufreq core always sets index to a positive value.

Change its type to unsigned int and fix qcom usage a bit.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>


# 10dd8573 29-Jun-2020 Quentin Perret <qperret@google.com>

cpufreq: Register governors at core_initcall

Currently, most CPUFreq governors are registered at the core_initcall
time when the given governor is the default one, and the module_init
time otherwise.

In preparation for letting users specify the default governor on the
kernel command line, change all of them to be registered at the
core_initcall unconditionally, as it is already the case for the
schedutil and performance governors. This will allow us to assume
that builtin governors have been registered before the built-in
CPUFreq drivers probe.

And since all governors have similar init/exit patterns now, introduce
two new macros, cpufreq_governor_{init,exit}(), to factorize the code.

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# cf6fada7 29-May-2020 Xiongfeng Wang <wangxiongfeng2@huawei.com>

cpufreq: change '.set_boost' to act on one policy

Macro 'for_each_active_policy()' is defined internally. To avoid some
cpufreq driver needing this macro to iterate over all the policies in
'.set_boost' callback, we redefine '.set_boost' to act on only one
policy and pass the policy as an argument.

'cpufreq_boost_trigger_state()' iterates over all the policies to set
boost for the system.

This is preparation for adding SW BOOST support for CPPC.

To protect Boost enable/disable by sysfs from CPU online/offline,
add 'cpu_hotplug_lock' before calling '.set_boost' for each CPU.

Also move the lock from 'set_boost()' to 'store_cpb()' in
acpi_cpufreq.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2909438d 13-May-2020 Wang Wenhu <wenhu.wang@vivo.com>

cpufreq: fix minor typo in struct cpufreq_driver doc comment

Delete the duplicate "to", possibly double-typed.

Signed-off-by: Wang Wenhu <wenhu.wang@vivo.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# bbce8eaa 05-Mar-2020 Ionela Voinescu <ionela.voinescu@arm.com>

cpufreq: add function to get the hardware max frequency

Add weak function to return the hardware maximum frequency of a CPU,
with the default implementation returning cpuinfo.max_freq, which is
the best information we can generically get from the cpufreq framework.

The default can be overwritten by a strong function in platforms
that want to provide an alternative implementation, with more accurate
information, obtained either from hardware or firmware.

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 183edb20 03-Feb-2020 Yangtao Li <tiny.windzz@gmail.com>

cpufreq: Make cpufreq_global_kobject static

The cpufreq_global_kobject is only used internally by cpufreq.c
after commit 2361be236662 ("cpufreq: Don't create empty
/sys/devices/system/cpu/cpufreq directory").

Make it static.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
[ rjw: Add empty line after cpufreq_global_kobject definition ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1e4f63ae 26-Jan-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Avoid creating excessively large stack frames

In the process of modifying a cpufreq policy, the cpufreq core makes
a copy of it including all of the internals which is stored on the
CPU stack. Because struct cpufreq_policy is relatively large, this
may cause the size of the stack frame to exceed the 2 KB limit and
so the GCC complains when -Wframe-larger-than= is used.

In fact, it is not necessary to copy the entire policy structure
in order to modify it, however.

First, because cpufreq_set_policy() obtains the min and max policy
limits from frequency QoS now, it is not necessary to pass the limits
to it from the callers. The only things that need to be passed to it
from there are the new governor pointer or (if there is a built-in
governor in the driver) the "policy" value representing the governor
choice. They both can be passed as individual arguments, though, so
make cpufreq_set_policy() take them this way and rework its callers
accordingly. This avoids making copies of cpufreq policies in the
callers of cpufreq_set_policy().

Second, cpufreq_set_policy() still needs to pass the new policy
data to the ->verify() callback of the cpufreq driver whose task
is to sanitize the min and max policy limits. It still does not
need to make a full copy of struct cpufreq_policy for this purpose,
but it needs to pass a few items from it to the driver in case they
are needed (different drivers have different needs in that respect
and all of them have to be covered). For this reason, introduce
struct cpufreq_policy_data to hold copies of the members of
struct cpufreq_policy used by the existing ->verify() driver
callbacks and pass a pointer to a temporary structure of that
type to ->verify() (instead of passing a pointer to full struct
cpufreq_policy to it).

While at it, notice that intel_pstate and longrun don't really need
to verify the "policy" value in struct cpufreq_policy, so drop those
check from them to avoid copying "policy" into struct
cpufreq_policy_data (which allows it to be slightly smaller).

Also while at it fix up white space in a couple of places and make
cpufreq_set_policy() static (as it can be so).

Fixes: 3000ce3c52f8 ("cpufreq: Use per-policy frequency QoS")
Link: https://lore.kernel.org/linux-pm/CAMuHMdX6-jb1W8uC2_237m8ctCpsnGp=JCxqt8pCWVqNXHmkVg@mail.gmail.com
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 85572c2c 11-Dec-2019 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Avoid leaving stale IRQ work items during CPU offline

The scheduler code calling cpufreq_update_util() may run during CPU
offline on the target CPU after the IRQ work lists have been flushed
for it, so the target CPU should be prevented from running code that
may queue up an IRQ work item on it at that point.

Unfortunately, that may not be the case if dvfs_possible_from_any_cpu
is set for at least one cpufreq policy in the system, because that
allows the CPU going offline to run the utilization update callback
of the cpufreq governor on behalf of another (online) CPU in some
cases.

If that happens, the cpufreq governor callback may queue up an IRQ
work on the CPU running it, which is going offline, and the IRQ work
may not be flushed after that point. Moreover, that IRQ work cannot
be flushed until the "offlining" CPU goes back online, so if any
other CPU calls irq_work_sync() to wait for the completion of that
IRQ work, it will have to wait until the "offlining" CPU is back
online and that may not happen forever. In particular, a system-wide
deadlock may occur during CPU online as a result of that.

The failing scenario is as follows. CPU0 is the boot CPU, so it
creates a cpufreq policy and becomes the "leader" of it
(policy->cpu). It cannot go offline, because it is the boot CPU.
Next, other CPUs join the cpufreq policy as they go online and they
leave it when they go offline. The last CPU to go offline, say CPU3,
may queue up an IRQ work while running the governor callback on
behalf of CPU0 after leaving the cpufreq policy because of the
dvfs_possible_from_any_cpu effect described above. Then, CPU0 is
the only online CPU in the system and the stale IRQ work is still
queued on CPU3. When, say, CPU1 goes back online, it will run
irq_work_sync() to wait for that IRQ work to complete and so it
will wait for CPU3 to go back online (which may never happen even
in principle), but (worse yet) CPU0 is waiting for CPU1 at that
point too and a system-wide deadlock occurs.

To address this problem notice that CPUs which cannot run cpufreq
utilization update code for themselves (for example, because they
have left the cpufreq policies that they belonged to), should also
be prevented from running that code on behalf of the other CPUs that
belong to a cpufreq policy with dvfs_possible_from_any_cpu set and so
in that case the cpufreq_update_util_data pointer of the CPU running
the code must not be NULL as well as for the CPU which is the target
of the cpufreq utilization update in progress.

Accordingly, change cpufreq_this_cpu_can_update() into a regular
function in kernel/sched/cpufreq.c (instead of a static inline in a
header file) and make it check the cpufreq_update_util_data pointer
of the local CPU if dvfs_possible_from_any_cpu is set for the target
cpufreq policy.

Also update the schedutil governor to do the
cpufreq_this_cpu_can_update() check in the non-fast-switch
case too to avoid the stale IRQ work issues.

Fixes: 99d14d0e16fa ("cpufreq: Process remote callbacks from any CPU if the platform permits")
Link: https://lore.kernel.org/linux-pm/20191121093557.bycvdo4xyinbc5cb@vireshk-i7/
Reported-by: Anson Huang <anson.huang@nxp.com>
Tested-by: Anson Huang <anson.huang@nxp.com>
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Peng Fan <peng.fan@nxp.com> (i.MX8QXP-MEK)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3000ce3c 15-Oct-2019 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Use per-policy frequency QoS

Replace the CPU device PM QoS used for the management of min and max
frequency constraints in cpufreq (and its users) with per-policy
frequency QoS to avoid problems with cpufreq policies covering
more then one CPU.

Namely, a cpufreq driver is registered with the subsys interface
which calls cpufreq_add_dev() for each CPU, starting from CPU0, so
currently the PM QoS notifiers are added to the first CPU in the
policy (i.e. CPU0 in the majority of cases).

In turn, when the cpufreq driver is unregistered, the subsys interface
doing that calls cpufreq_remove_dev() for each CPU, starting from CPU0,
and the PM QoS notifiers are only removed when cpufreq_remove_dev() is
called for the last CPU in the policy, say CPUx, which as a rule is
not CPU0 if the policy covers more than one CPU. Then, the PM QoS
notifiers cannot be removed, because CPUx does not have them, and
they are still there in the device PM QoS notifiers list of CPU0,
which prevents new PM QoS notifiers from being registered for CPU0
on the next attempt to register the cpufreq driver.

The same issue occurs when the first CPU in the policy goes offline
before unregistering the driver.

After this change it does not matter which CPU is the policy CPU at
the driver registration time and whether or not it is online all the
time, because the frequency QoS is per policy and not per CPU.

Fixes: 67d874c3b2c6 ("cpufreq: Register notifiers with the PM QoS framework")
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reported-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Diagnosed-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/linux-pm/5ad2624194baa2f53acc1f1e627eb7684c577a19.1562210705.git.viresh.kumar@linaro.org/T/#md2d89e95906b8c91c15f582146173dce2e86e99f
Link: https://lore.kernel.org/linux-pm/20191017094612.6tbkwoq4harsjcqv@vireshk-i7/T/#m30d48cc23b9a80467fbaa16e30f90b3828a5a29b
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# df0eea44 23-Jul-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove CPUFREQ_ADJUST and CPUFREQ_NOTIFY policy notifier events

No driver makes reference to these events now, remove them and the code
related to them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6a149036 23-Jul-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Add policy create/remove notifiers back

This effectively reverts some changes made by commit f9f41e3ef99
("cpufreq: Remove policy create/remove notifiers").

We have a new use case for policy create/remove notifiers (for
allocating/freeing QoS requests per policy), so add them back.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# c4dcc8a1 15-Jul-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Make cpufreq_generic_init() return void

It always returns 0 (success) and its return type should really be void.

Over that, many drivers have added error handling code based on its
return value, which is not required at all.

Change its return type to void and update all the callers.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 18c49926 05-Jul-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Add QoS requests for userspace constraints

This implements QoS requests to manage userspace configuration of min
and max frequency.

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: syzbot <syzbot+de771ae9390dffed7266@syzkaller.appspotmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# c57b25bd 04-Jul-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: intel_pstate: Reuse refresh_frequency_limits()

The implementation of intel_pstate_update_max_freq() is quite similar to
refresh_frequency_limits(), lets reuse it.

Finding minimum of policy->user_policy.max and policy->cpuinfo.max_freq
in intel_pstate_update_max_freq() is redundant as cpufreq_set_policy()
will call the ->verify() callback of intel-pstate driver, which will do
this comparison anyway and so dropping it from
intel_pstate_update_max_freq() doesn't harm.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 67d874c3 08-Jul-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Register notifiers with the PM QoS framework

Register notifiers for min/max frequency constraints with the PM QoS
framework. The constraints are also taken into consideration in
cpufreq_set_policy().

This also relocates cpufreq_policy_put_kobj() as it is required to be
called from cpufreq_policy_alloc() now.

refresh_frequency_limits() is updated to avoid calling
cpufreq_set_policy() for inactive policies and handle_update() is
updated to have proper locking in place.

No constraints are added until now though.

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# bcc61569 25-Jun-2019 Daniel Lezcano <daniel.lezcano@linaro.org>

cpufreq: Move the IS_ENABLED(CPU_THERMAL) macro into a stub

cpufreq_online() and cpufreq_offline() [un]register the driver as
a cooling device. This is done if the driver is flagged as a cooling
device in addition with an IS_ENABLED() check to compile out the branching
code.

Group this test in a stub function added in the cpufreq header instead
of having the IS_ENABLED() in the code.

Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d2912cb1 04-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500

Based on 2 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# df24014a 29-Apr-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Call transition notifier only once for each policy

Currently, the notifiers are called once for each CPU of the policy->cpus
cpumask. It would be more optimal if the notifier can be called only
once and all the relevant information be provided to it. Out of the 23
drivers that register for the transition notifiers today, only 4 of them
do per-cpu updates and the callback for the rest can be called only once
for the policy without any impact.

This would also avoid multiple function calls to the notifier callbacks
and reduce multiple iterations of notifier core's code (which does
locking as well).

This patch adds pointer to the cpufreq policy to the struct
cpufreq_freqs, so the notifier callback has all the information
available to it with a single call. The five drivers which perform
per-cpu updates are updated to use the cpufreq policy. The freqs->cpu
field is redundant now and is removed.

Acked-by: David S. Miller <davem@davemloft.net> (sparc)
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9083e498 25-Mar-2019 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Update max frequency on global turbo changes

While the cpuinfo.max_freq value doesn't really matter for
intel_pstate in the active mode, in the passive mode it is used by
governors as the maximum physical frequency of the CPU and the
results of governor computations generally depend on it. Also it
is made available to user space via sysfs and it should match the
current HW configuration.

For this reason, make intel_pstate update cpuinfo.max_freq for all
CPUs if it detects a global change of turbo frequency settings from
"disable" to "enable" or the other way associated with a _PPC change
notification from the platform firmware.

Note that policy_is_inactive(), cpufreq_cpu_acquire(),
cpufreq_cpu_release(), and cpufreq_set_policy() need to be made
available to it for this purpose.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200759
Reported-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 5a25e3f7 25-Mar-2019 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Driver-specific handling of _PPC updates

In some cases, the platform firmware disables or enables turbo
frequencies for all CPUs globally before triggering a _PPC change
notification for one of them. Obviously, that global change affects
all CPUs, not just the notified one, and it needs to be acted upon by
cpufreq.

The intel_pstate driver is able to detect such global changes of
the settings, but it also needs to update policy limits for all
CPUs if that happens, in particular if turbo frequencies are
enabled globally - to allow them to be used.

For this reason, introduce a new cpufreq driver callback to be
invoked on _PPC notifications, if present, instead of simply
calling cpufreq_update_policy() for the notified CPU and make
intel_pstate use it to trigger policy updates for all CPUs
in the system if global settings change.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200759
Reported-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 91a12e91 12-Feb-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Allow light-weight tear down and bring up of CPUs

The cpufreq core doesn't remove the cpufreq policy anymore on CPU
offline operation, rather that happens when the CPU device gets
unregistered from the kernel. This allows faster recovery when the CPU
comes back online. This is also very useful during system wide
suspend/resume where we offline all non-boot CPUs during suspend and
then bring them back on resume.

This commit takes the same idea a step ahead to allow drivers to do
light weight tear-down and bring-up during CPU offline and online
operations.

A new set of callbacks is introduced, online/offline(). online() gets
called when the first CPU of an inactive policy is brought up and
offline() gets called when all the CPUs of a policy are offlined.

The existing init/exit() callback get called on policy
creation/destruction. They also get called instead of online/offline()
callbacks if the online/offline() callbacks aren't provided.

This also moves around some code to get executed only for the new-policy
case going forward.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5c238a8b 29-Jan-2019 Amit Kucheria <amit.kucheria@linaro.org>

cpufreq: Auto-register the driver as a thermal cooling device if asked

All cpufreq drivers do similar things to register as a cooling device.
Provide a cpufreq driver flag so drivers can just ask the cpufreq core
to register the cooling device on their behalf. This allows us to get
rid of duplicated code in the drivers.

In order to allow this, we add a struct thermal_cooling_device pointer
to struct cpufreq_policy so that drivers don't need to store it in a
private data structure.

Suggested-by: Stephen Boyd <swboyd@chromium.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 625c85a6 24-Jan-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Use struct kobj_attribute instead of struct global_attr

The cpufreq_global_kobject is created using kobject_create_and_add()
helper, which assigns the kobj_type as dynamic_kobj_ktype and show/store
routines are set to kobj_attr_show() and kobj_attr_store().

These routines pass struct kobj_attribute as an argument to the
show/store callbacks. But all the cpufreq files created using the
cpufreq_global_kobject expect the argument to be of type struct
attribute. Things work fine currently as no one accesses the "attr"
argument. We may not see issues even if the argument is used, as struct
kobj_attribute has struct attribute as its first element and so they
will both get same address.

But this is logically incorrect and we should rather use struct
kobj_attribute instead of struct global_attr in the cpufreq core and
drivers and the show/store callbacks should take struct kobj_attribute
as argument instead.

This bug is caught using CFI CLANG builds in android kernel which
catches mismatch in function prototypes for such callbacks.

Reported-by: Donghee Han <dh.han@samsung.com>
Reported-by: Sangkyu Kim <skwith.kim@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8321be6a 21-Jan-2019 Amit Kucheria <amit.kucheria@linaro.org>

cpufreq: Replace open-coded << with BIT()

Minor clean-up to use BIT() and keep checkpatch happy. Clean up the
comment formatting while we're at it to make it easier to read.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 531b5c9f 03-Dec-2018 Quentin Perret <qperret@qperret.net>

sched/topology: Make Energy Aware Scheduling depend on schedutil

Energy Aware Scheduling (EAS) is designed with the assumption that
frequencies of CPUs follow their utilization value. When using a CPUFreq
governor other than schedutil, the chances of this assumption being true
are small, if any. When schedutil is being used, EAS' predictions are at
least consistent with the frequency requests. Although those requests
have no guarantees to be honored by the hardware, they should at least
guide DVFS in the right direction and provide some hope in regards to the
EAS model being accurate.

To make sure EAS is only used in a sane configuration, create a strong
dependency on schedutil being used. Since having sugov compiled-in does
not provide that guarantee, make CPUFreq call a scheduler function on
governor changes hence letting it rebuild the scheduling domains, check
the governors of the online CPUs, and enable/disable EAS accordingly.

Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-9-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 03639978 22-May-2018 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Rename cpufreq_can_do_remote_dvfs()

This routine checks if the CPU running this code belongs to the policy
of the target CPU or if not, can it do remote DVFS for it remotely. But
the current name of it implies as if it is only about doing remote
updates.

Rename it to make it more relevant.

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2dd0df84 03-Apr-2018 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Drop cpufreq_table_validate_and_show()

This isn't used anymore. Remove the helper and update documentation
accordingly.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d417e069 21-Feb-2018 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Validate frequency table in the core

By design, cpufreq drivers are responsible for calling
cpufreq_frequency_table_cpuinfo() from their ->init()
callbacks to validate the frequency table.

However, if a cpufreq driver is buggy and fails to do so properly, it
lead to unexpected behavior of the driver or the cpufreq core at a
later point in time. It would be better if the core could
validate the frequency table during driver initialization.

To that end, introduce cpufreq_table_validate_and_sort() and make
the cpufreq core call it right after invoking the ->init() callback
of the driver and destroy the cpufreq policy if the table is invalid.

For the time being the validation of the table happens twice, once
from the driver and then from the core. The individual drivers will
be updated separately to drop table validation if they don't need it
for other reasons.

The frequency table is marked "sorted" or "unsorted" by the new helper
now instead of in cpufreq_table_validate_and_show(), as it should only
be done after validating the table (which the drivers won't do going
forward).

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject/changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ffd81dcf 29-Jan-2018 Dominik Brodowski <linux@dominikbrodowski.net>

cpufreq: Add and use cpufreq_for_each_{valid_,}entry_idx()

Pointer subtraction is slow and tedious. Therefore, replace all instances
where cpufreq_for_each_{valid_,}entry loops contained such substractions
with an iteration macro providing an index to the frequency_table entry.

Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Link: http://lkml.kernel.org/r/20180120020237.GM13338@ZenIV.linux.org.uk
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7d5905dc 14-Nov-2017 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

x86 / CPU: Always show current CPU frequency in /proc/cpuinfo

After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get()
for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo
on x86 can be either the nominal CPU frequency (which is constant)
or the frequency most recently requested by a scaling governor in
cpufreq, depending on the cpufreq configuration. That is somewhat
inconsistent and is different from what it was before 4.13, so in
order to restore the previous behavior, make it report the current
CPU frequency like the scaling_cur_freq sysfs file in cpufreq.

To that end, modify the /proc/cpuinfo implementation on x86 to use
aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback
registers, if available, and use their values to compute the CPU
frequency to be reported as "cpu MHz".

However, do that carefully enough to avoid accumulating delays that
lead to unacceptable access times for /proc/cpuinfo on systems with
many CPUs. Run aperfmperf_snapshot_khz() once on all CPUs
asynchronously at the /proc/cpuinfo open time, add a single delay
upfront (if necessary) at that point and simply compute the current
frequency while running show_cpuinfo() for each individual CPU.

Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce
the default delay between consecutive APERF and MPERF reads to 10 ms,
which should be sufficient to get large enough numbers for the
frequency computation in all cases.

Fixes: 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>


# e7d5459d 26-Sep-2017 Dietmar Eggemann <dietmar.eggemann@arm.com>

cpufreq: provide default frequency-invariance setter function

Frequency-invariant accounting support based on the ratio of current
frequency and maximum supported frequency is an optional feature an arch
can implement.

Since there are cpufreq drivers (e.g. cpufreq-dt) which can be build for
different arch's a default implementation of the frequency-invariance
setter function arch_set_freq_scale() is needed.

This default implementation is an empty weak function which will be
overwritten by a strong function in case the arch provides one.

The setter function passes the cpumask of related (to the frequency
change) cpus (online and offline cpus), the (new) current frequency and
the maximum supported frequency.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d6344d4b 04-Aug-2017 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Simplify cpufreq_can_do_remote_dvfs()

The if () in cpufreq_can_do_remote_dvfs() is superfluous, so drop
it and simply return the value of the expression under it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 99d14d0e 27-Jul-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Process remote callbacks from any CPU if the platform permits

On many platforms, CPUs can do DVFS across cpufreq policies. i.e CPU
from policy-A can change frequency of CPUs belonging to policy-B.

This is quite common in case of ARM platforms where we don't
configure any per-cpu register.

Add a flag to identify such platforms and update
cpufreq_can_do_remote_dvfs() to allow remote callbacks if this flag is
set.

Also enable the flag for cpufreq-dt driver which is used only on ARM
platforms currently.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 674e7541 27-Jul-2017 Viresh Kumar <viresh.kumar@linaro.org>

sched: cpufreq: Allow remote cpufreq callbacks

With Android UI and benchmarks the latency of cpufreq response to
certain scheduling events can become very critical. Currently, callbacks
into cpufreq governors are only made from the scheduler if the target
CPU of the event is the same as the current CPU. This means there are
certain situations where a target CPU may not run the cpufreq governor
for some time.

One testcase to show this behavior is where a task starts running on
CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the
system is configured such that the new tasks should receive maximum
demand initially, this should result in CPU0 increasing frequency
immediately. But because of the above mentioned limitation though, this
does not occur.

This patch updates the scheduler core to call the cpufreq callbacks for
remote CPUs as well.

The schedutil, ondemand and conservative governors are updated to
process cpufreq utilization update hooks called for remote CPUs where
the remote CPU is managed by the cpufreq policy of the local CPU.

The intel_pstate driver is updated to always reject remote callbacks.

This is tested with couple of usecases (Android: hackbench, recentfling,
galleryfling, vellamo, Ubuntu: hackbench) on ARM hikey board (64 bit
octa-core, single policy). Only galleryfling showed minor improvements,
while others didn't had much deviation.

The reason being that this patch only targets a corner case, where
following are required to be true to improve performance and that
doesn't happen too often with these tests:

- Task is migrated to another CPU.
- The task has high demand, and should take the target CPU to higher
OPPs.
- And the target CPU doesn't call into the cpufreq governor until the
next tick.

Based on initial work from Steve Muckle.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# fe829ed8 19-Jul-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Add CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING cpufreq driver flag

The policy->transition_latency field is used for multiple purposes
today and its not straight forward at all. This is how it is used:

A. Set the correct transition_latency value.

B. Set it to CPUFREQ_ETERNAL because:
1. We don't want automatic dynamic switching (with
ondemand/conservative) to happen at all.
2. We don't know the transition latency.

This patch handles the B.1. case in a more readable way. A new flag for
the cpufreq drivers is added to disallow use of cpufreq governors which
have dynamic_switching flag set.

All the current cpufreq drivers which are setting transition_latency
unconditionally to CPUFREQ_ETERNAL are updated to use it. They don't
need to set transition_latency anymore.

There shouldn't be any functional change after this patch.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ed4676e2 19-Jul-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Replace "max_transition_latency" with "dynamic_switching"

There is no limitation in the ondemand or conservative governors which
disallow the transition_latency to be greater than 10 ms.

The max_transition_latency field is rather used to disallow automatic
dynamic frequency switching for platforms which didn't wanted these
governors to run.

Replace max_transition_latency with a boolean (dynamic_switching) and
check for transition_latency == CPUFREQ_ETERNAL along with that. This
makes it pretty straight forward to read/understand now.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# aa7519af 19-Jul-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Use transition_delay_us for legacy governors as well

The policy->transition_delay_us field is used only by the schedutil
governor currently, and this field describes how fast the driver wants
the cpufreq governor to change CPUs frequency. It should rather be a
common thing across all governors, as it doesn't have any schedutil
dependency here.

Create a new helper cpufreq_policy_transition_delay_us() to get the
transition delay across all governors.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2d045036 19-Jul-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: governor: Drop min_sampling_rate

The cpufreq core and governors aren't supposed to set a limit on how
fast we want to try changing the frequency. This is currently done for
the legacy governors with help of min_sampling_rate.

At worst, we may end up setting the sampling rate to a value lower than
the rate at which frequency can be changed and then one of the CPUs in
the policy will be only changing frequency for ever.

But that is something for the user to decide and there is no need to
have special handling for such cases in the core. Leave it for the user
to figure out.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f8475cef 23-Jun-2017 Len Brown <len.brown@intel.com>

x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF

The goal of this change is to give users a uniform and meaningful
result when they read /sys/...cpufreq/scaling_cur_freq
on modern x86 hardware, as compared to what they get today.

Modern x86 processors include the hardware needed
to accurately calculate frequency over an interval --
APERF, MPERF, and the TSC.

Here we provide an x86 routine to make this calculation
on supported hardware, and use it in preference to any
driver driver-specific cpufreq_driver.get() routine.

MHz is computed like so:

MHz = base_MHz * delta_APERF / delta_MPERF

MHz is the average frequency of the busy processor
over a measurement interval. The interval is
defined to be the time between successive invocations
of aperfmperf_khz_on_cpu(), which are expected to to
happen on-demand when users read sysfs attribute
cpufreq/scaling_cur_freq.

As with previous methods of calculating MHz,
idle time is excluded.

base_MHz above is from TSC calibration global "cpu_khz".

This x86 native method to calculate MHz returns a meaningful result
no matter if P-states are controlled by hardware or firmware
and/or if the Linux cpufreq sub-system is or is-not installed.

When this routine is invoked more frequently, the measurement
interval becomes shorter. However, the code limits re-computation
to 10ms intervals so that average frequency remains meaningful.

Discerning users are encouraged to take advantage of
the turbostat(8) utility, which can gracefully handle
concurrent measurement intervals of arbitrary length.

Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 55d85293 25-Apr-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: create cpufreq_table_count_valid_entries()

We need such a routine at two places already, lets create one.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>


# 1b72e7fd 10-Apr-2017 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: schedutil: Use policy-dependent transition delays

Make the schedutil governor take the initial (default) value of the
rate_limit_us sysfs attribute from the (new) transition_delay_us
policy parameter (to be set by the scaling driver).

That will allow scaling drivers to make schedutil use smaller default
values of rate_limit_us and reduce the default average time interval
between consecutive frequency changes.

Make intel_pstate set transition_delay_us to 500.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 565ebe80 03-Feb-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Fix typos in comments

- s/freqnency/frequency/
- s/accomodating/accommodating/

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 052f573f 04-Jan-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove CPUFREQ_START notifier event

Its not used anymore, remove it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f9f41e3e 04-Jan-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove policy create/remove notifiers

Those were added by:

commit fcd7af917abb ("cpufreq: stats: handle cpufreq_unregister_driver()
and suspend/resume properly")

but aren't used anymore since:

commit 1aefc75b2449 ("cpufreq: stats: Make the stats code non-modular").

Remove them. Also remove the redundant parameter to the respective
routines.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 30248fef 18-Nov-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Make cpufreq_update_policy() void

The return value of cpufreq_update_policy() is never used, so make
it void.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# ee7930ee 07-Nov-2016 Markus Mayer <mmayer@broadcom.com>

cpufreq: stats: New sysfs attribute for clearing statistics

Allow CPUfreq statistics to be cleared by writing anything to
/sys/.../cpufreq/stats/reset.

Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# c6fe46a7 17-Oct-2016 Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

cpufreq: fix overflow in cpufreq_table_find_index_dl()

'best' is always less or equals to 'pos', so `best - pos' returns
a negative value which is then getting casted to `unsigned int'
and passed to __cpufreq_driver_target()->acpi_cpufreq_target()
for policy->freq_table selection. This results in

BUG: unable to handle kernel paging request at ffff881019b469f8
IP: [<ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
PGD 267f067
PUD 0

Oops: 0000 [#1] PREEMPT SMP
CPU: 6 PID: 70 Comm: kworker/6:1 Not tainted 4.9.0-rc1-next-20161017-dbg-dirty
Workqueue: events dbs_work_handler
task: ffff88041b808000 task.stack: ffff88041b810000
RIP: 0010:[<ffffffffa00356c1>] [<ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
RSP: 0018:ffff88041b813c60 EFLAGS: 00010282
RAX: ffff880419b46a00 RBX: ffff88041b848400 RCX: ffff880419b20f80
RDX: 00000000001dff38 RSI: 00000000ffffffff RDI: ffff88041b848400
RBP: ffff88041b813cb0 R08: 0000000000000006 R09: 0000000000000040
R10: ffffffff8207f9e0 R11: ffffffff8173595b R12: 0000000000000000
R13: ffff88041f1dff38 R14: 0000000000262900 R15: 0000000bfffffff4
FS: 0000000000000000(0000) GS:ffff88041f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff881019b469f8 CR3: 000000041a2d3000 CR4: 00000000001406e0
Stack:
ffff88041b813cb0 ffffffff813347f9 ffff88041b813ca0 ffffffff81334663
ffff88041f1d4bc0 ffff88041b848400 0000000000000000 0000000000000000
0000000000262900 0000000000000000 ffff88041b813d00 ffffffff813355dc
Call Trace:
[<ffffffff813347f9>] ? cpufreq_freq_transition_begin+0xf1/0xfc
[<ffffffff81334663>] ? get_cpu_idle_time+0x97/0xa6
[<ffffffff813355dc>] __cpufreq_driver_target+0x3b6/0x44e
[<ffffffff81336ca3>] cs_dbs_timer+0x11a/0x135
[<ffffffff81336fda>] dbs_work_handler+0x39/0x62
[<ffffffff81057823>] process_one_work+0x280/0x4a5
[<ffffffff81058719>] worker_thread+0x24f/0x397
[<ffffffff810584ca>] ? rescuer_thread+0x30b/0x30b
[<ffffffff81418380>] ? nl80211_get_key+0x29/0x36a
[<ffffffff8105d2b7>] kthread+0xfc/0x104
[<ffffffff8107ceea>] ? put_lock_stats.isra.9+0xe/0x20
[<ffffffff8105d1bb>] ? kthread_create_on_node+0x3f/0x3f
[<ffffffff814b2092>] ret_from_fork+0x22/0x30
Code: 56 4d 6b ff 0c 41 55 41 54 53 48 83 ec 28 48 8b 15 ad 1e 00 00 44 8b 41
08 48 8b 87 c8 00 00 00 49 89 d5 4e 03 2c c5 80 b2 78 81 <46> 8b 74 38 04 45
3b 75 00 75 11 31 c0 83 39 00 0f 84 1c 01 00
RIP [<ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
RSP <ffff88041b813c60>
CR2: ffff881019b469f8
---[ end trace 16d9fc7a17897d37 ]---

[ rjw: In some cases this bug may also cause incorrect frequencies to
be selected by cpufreq governors. ]

Fixes: 899bb6642f2a (cpufreq: skip invalid entries when searching the frequency)
Link: http://marc.info/?l=linux-kernel&m=147672030714331&w=2
Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.8+ <stable@vger.kernel.org> # 4.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 899bb664 11-Oct-2016 Aaro Koskinen <aaro.koskinen@iki.fi>

cpufreq: skip invalid entries when searching the frequency

Skip invalid entries when searching the frequency. This fixes cpufreq
at least on loongson2 MIPS board.

Fixes: da0c6dc00c69 (cpufreq: Handle sorted frequency tables more efficiently)
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.8+ <stable@vger.kernel.org> # 4.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e3c06236 13-Jul-2016 Steve Muckle <steve.muckle@linaro.org>

cpufreq: add cpufreq_driver_resolve_freq()

Cpufreq governors may need to know what a particular target frequency
maps to in the driver without necessarily wanting to set the frequency.
Support this operation via a new cpufreq API,
cpufreq_driver_resolve_freq(). This API returns the lowest driver
frequency equal or greater than the target frequency
(CPUFREQ_RELATION_L), subject to any policy (min/max) or driver
limitations. The mapping is also cached in the policy so that a
subsequent fast_switch operation can avoid repeating the same lookup.

The API will call a new cpufreq driver callback, resolve_freq(), if it
has been registered by the driver. Otherwise the frequency is resolved
via cpufreq_frequency_table_target(). Rather than require ->target()
style drivers to provide a resolve_freq() callback it is left to the
caller to ensure that the driver implements this callback if necessary
to use cpufreq_driver_resolve_freq().

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# da0c6dc0 27-Jun-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Handle sorted frequency tables more efficiently

cpufreq drivers aren't required to provide a sorted frequency table
today, and even the ones which provide a sorted table aren't handled
efficiently by cpufreq core.

This patch adds infrastructure to verify if the freq-table provided by
the drivers is sorted or not, and use efficient helpers if they are
sorted.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d218ed77 02-Jun-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Return index from cpufreq_frequency_table_target()

This routine can't fail unless the frequency table is invalid and
doesn't contain any valid entries.

Make it return the index and WARN() in case it is used for an invalid
table.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7ab4aabb 02-Jun-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Drop freq-table param to cpufreq_frequency_table_target()

The policy already has this pointer set, use it instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f8bfc116 02-Jun-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove cpufreq_frequency_get_table()

Most of the callers of cpufreq_frequency_get_table() already have the
pointer to a valid 'policy' structure and they don't really need to go
through the per-cpu variable first and then a check to validate the
frequency, in order to find the freq-table for the policy.

Directly use the policy->freq_table field instead for them.

Only one user of that API is left after above changes, cpu_cooling.c and
it accesses the freq_table in a racy way as the policy can get freed in
between.

Fix it by using cpufreq_cpu_get() properly.

Since there are no more users of cpufreq_frequency_get_table() left, get
rid of it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Javi Merino <javi.merino@arm.com> (cpu_cooling.c)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1aefc75b 31-May-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: stats: Make the stats code non-modular

The modularity of cpufreq_stats is quite problematic.

First off, the usage of policy notifiers for the initialization
and cleanup in the cpufreq_stats module is inherently racy with
respect to CPU offline/online and the initialization and cleanup
of the cpufreq driver.

Second, fast frequency switching (used by the schedutil governor)
cannot be enabled if any transition notifiers are registered, so
if the cpufreq_stats module (that registers a transition notifier
for updating transition statistics) is loaded, the schedutil governor
cannot use fast frequency switching.

On the other hand, allowing cpufreq_stats to be built as a module
doesn't really add much value. Arguably, there's not much reason
for that code to be modular at all.

For the above reasons, make the cpufreq stats code non-modular,
modify the core to invoke functions provided by that code directly
and drop the notifiers from it.

Make the stats sysfs attributes appear empty if fast frequency
switching is enabled as the statistics will not be updated in that
case anyway (and returning -EBUSY from those attributes breaks
powertop).

While at it, clean up Kconfig help for the CPU_FREQ_STAT and
CPU_FREQ_STAT_DETAILS options.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 9a15fb2c 18-May-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Drop the 'initialized' field from struct cpufreq_governor

The 'initialized' field in struct cpufreq_governor is only used by
the conservative governor (as a usage counter) and the way that
happens is far from straightforward and arguably incorrect.

Namely, the value of 'initialized' is checked by
cpufreq_dbs_governor_init() and cpufreq_dbs_governor_exit() and
the results of those checks are passed (as the second argument) to
the ->init() and ->exit() callbacks in struct dbs_governor. Those
callbacks are only implemented by the ondemand and conservative
governors and ondemand doesn't use their second argument at all.
In turn, the conservative governor uses it to decide whether or not
to either register or unregister a transition notifier.

That whole mechanism is not only unnecessarily convoluted, but also
racy, because the 'initialized' field of struct cpufreq_governor is
updated in cpufreq_init_governor() and cpufreq_exit_governor() under
policy->rwsem which doesn't help if one of these functions is run
twice in parallel for different policies (which isn't impossible in
principle), for example.

Instead of it, add a proper usage counter to the conservative
governor and update it from cs_init() and cs_exit() which is
guaranteed to be non-racy, as those functions are only called
under gov_dbs_data_mutex which is global.

With that in place, drop the 'initialized' field from struct
cpufreq_governor as it is not used any more.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# bf2be2de 18-May-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: governor: Create cpufreq_policy_apply_limits()

Create a new helper to avoid code duplication across governors.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e788892b 02-Jun-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: governor: Get rid of governor events

The design of the cpufreq governor API is not very straightforward,
as struct cpufreq_governor provides only one callback to be invoked
from different code paths for different purposes. The purpose it is
invoked for is determined by its second "event" argument, causing it
to act as a "callback multiplexer" of sorts.

Unfortunately, that leads to extra complexity in governors, some of
which implement the ->governor() callback as a switch statement
that simply checks the event argument and invokes a separate function
to handle that specific event.

That extra complexity can be eliminated by replacing the all-purpose
->governor() callback with a family of callbacks to carry out specific
governor operations: initialization and exit, start and stop and policy
limits updates. That also turns out to reduce the code size too, so
do it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 6c9d9c81 07-Apr-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Call cpufreq_disable_fast_switch() in sugov_exit()

Due to differences in the cpufreq core's handling of runtime CPU
offline and nonboot CPUs disabling during system suspend-to-RAM,
fast frequency switching gets disabled after a suspend-to-RAM and
resume cycle on all of the nonboot CPUs.

To prevent that from happening, move the invocation of
cpufreq_disable_fast_switch() from cpufreq_exit_governor() to
sugov_exit(), as the schedutil governor is the only user of fast
frequency switching today anyway.

That simply prevents cpufreq_disable_fast_switch() from being called
without invoking the ->governor callback for the CPUFREQ_GOV_POLICY_EXIT
event (which happens during system suspend now).

Fixes: b7898fda5bc7 (cpufreq: Support for fast frequency switching)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# b7898fda 29-Mar-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Support for fast frequency switching

Modify the ACPI cpufreq driver to provide a method for switching
CPU frequencies from interrupt context and update the cpufreq core
to support that method if available.

Introduce a new cpufreq driver callback, ->fast_switch, to be
invoked for frequency switching from interrupt context by (future)
governors supporting that feature via (new) helper function
cpufreq_driver_fast_switch().

Add two new policy flags, fast_switch_possible, to be set by the
cpufreq driver if fast frequency switching can be used for the
given policy and fast_switch_enabled, to be set by the governor
if it is going to use fast frequency switching for the given
policy. Also add a helper for setting the latter.

Since fast frequency switching is inherently incompatible with
cpufreq transition notifiers, make it possible to set the
fast_switch_enabled only if there are no transition notifiers
already registered and make the registration of new transition
notifiers fail if fast_switch_enabled is set for at least one
policy.

Implement the ->fast_switch callback in the ACPI cpufreq driver
and make it set fast_switch_possible during policy initialization
as appropriate.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 379480d8 21-Mar-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Move governor symbols to cpufreq.h

Move definitions of symbols related to transition latency and
sampling rate to include/linux/cpufreq.h so they can be used by
(future) goverernors located outside of drivers/cpufreq/.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 66893b6a 21-Mar-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Move governor attribute set headers to cpufreq.h

Move definitions and function headers related to struct gov_attr_set
to include/linux/cpufreq.h so they can be used by (future) goverernors
located outside of drivers/cpufreq/.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# adaf9fcd 10-Mar-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Move scheduler-related code to the sched directory

Create cpufreq.c under kernel/sched/ and move the cpufreq code
related to the scheduler to that file and to sched.h.

Redefine cpufreq_update_util() as a static inline function to avoid
function calls at its call sites in the scheduler code (as suggested
by Peter Zijlstra).

Also move the definition of struct update_util_data and declaration
of cpufreq_set_update_util_data() from include/linux/cpufreq.h to
include/linux/sched.h.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>


# 242aa883 22-Feb-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove 'policy->governor_enabled'

The entire sequence of events (like INIT/START or STOP/EXIT) for which
cpufreq_governor() is called, is guaranteed to be protected by
policy->rwsem now.

The additional checks that were added earlier (as we were forced to drop
policy->rwsem before calling cpufreq_governor() for EXIT event), aren't
required anymore.

Over that, they weren't sufficient really. They just take care of
START/STOP events, but not INIT/EXIT and the state machine was never
maintained properly by them.

Kill the unnecessary checks and policy->governor_enabled field.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 68e80dae 08-Feb-2016 Viresh Kumar <viresh.kumar@linaro.org>

Revert "cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT"

Earlier, when the struct freq-attr was used to represent governor
attributes, the standard cpufreq show/store sysfs attribute callbacks
were applied to the governor tunable attributes and they always acquire
the policy->rwsem lock before carrying out the operation. That could
have resulted in an ABBA deadlock if governor tunable attributes are
removed under policy->rwsem while one of them is being accessed
concurrently (if sysfs attributes removal wins the race, it will wait
for the access to complete with policy->rwsem held while the attribute
callback will block on policy->rwsem indefinitely).

We attempted to address this issue by dropping policy->rwsem around
governor tunable attributes removal (that is, around invocations of the
->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
in cpufreq_set_policy(), but that opened up race conditions that had not
been possible with policy->rwsem held all the time.

The previous commit, "cpufreq: governor: New sysfs show/store callbacks
for governor tunables", fixed the original ABBA deadlock by adding new
governor specific show/store callbacks.

We don't have to drop rwsem around invocations of governor event
CPUFREQ_GOV_POLICY_EXIT anymore, and original fix can be reverted now.

Fixes: 955ef4833574 (cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT)
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reported-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 34e2c555 15-Feb-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Add mechanism for registering utilization update callbacks

Introduce a mechanism by which parts of the cpufreq subsystem
("setpolicy" drivers or the core) can register callbacks to be
executed from cpufreq_update_util() which is invoked by the
scheduler's update_load_avg() on CPU utilization changes.

This allows the "setpolicy" drivers to dispense with their timers
and do all of the computations they need and frequency/voltage
adjustments in the update_load_avg() code path, among other things.

The update_load_avg() changes were suggested by Peter Zijlstra.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@kernel.org>


# 34b08705 25-Feb-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Simplify the cpufreq_for_each_valid_entry()

That macro uses an internal static inline function that is first
totally unnecessary and second hard to read, so simplify it and
get rid of that monster.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# de1df26b 04-Feb-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Clean up default and fallback governor setup

The preprocessor magic used for setting the default cpufreq governor
(and for using the performance governor as a fallback one for that
matter) is really nasty, so replace it with __weak functions and
overrides.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 7a6c79f2 26-Dec-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Simplify core code related to boost support

Notice that the boost_supported field in struct cpufreq_driver is
redundant, because the driver's ->set_boost callback may be left
unset if "boost" is not supported. Moreover, the only driver
populating the ->set_boost callback is acpi_cpufreq, so make it
avoid populating that callback if "boost" is not supported, rework
the core to check ->set_boost instead of boost_supported to
verify "boost" support and drop boost_supported which isn't
used any more.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 41669da0 26-Dec-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Make cpufreq_boost_supported() static

cpufreq_boost_supported() is not used outside of cpufreq.c, so make
it static.

While at it, refactor it as a one-liner (which it really is).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 69030dd1 01-Dec-2015 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

cpufreq: use last policy after online for drivers with ->setpolicy

For cpufreq drivers which use setpolicy interface, after offline->online
the policy is set to default. This can be reproduced by setting the
default policy of intel_pstate or longrun to ondemand and then change to
"performance". After offline and online, the setpolicy will be called with
the policy=ondemand.

For drivers using governors this condition is handled by storing
last_governor, during offline and restoring during online. The same should
be done for drivers using setpolicy interface. Storing last_policy during
offline and restoring during online.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 96bdda61 15-Oct-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: create cpu/cpufreq/policyX directories

The cpufreq sysfs interface had been a bit inconsistent as one of the
CPUs for a policy had a real directory within its sysfs 'cpuX' directory
and all other CPUs had links to it. That also made the code a bit
complex as we need to take care of moving the sysfs directory if the CPU
containing the real directory is getting physically hot-unplugged.

Solve this by creating 'policyX' directories (per-policy) in
/sys/devices/system/cpu/cpufreq/ directory, where X is the CPU for which
the policy was first created.

This also removes the need of keeping kobj_cpu and we can remove it now.

Suggested-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: is more of a general agreement from the person that he is
Reviewed-by: is a more strict tag and implies that the reviewer has
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# c82bd444 15-Oct-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove cpufreq_sysfs_{create|remove}_file()

They don't do anything special now, remove the unnecessary wrapper.

Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8eec1020 15-Oct-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: create cpu/cpufreq at boot time

Later patches will need to create policy specific directories in
/sys/devices/system/cpu/cpufreq/ directory and so the cpufreq directory
wouldn't be ever empty.

And so no fun creating/destroying it on need basis anymore. Create it
once on system boot.

Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1f0bd44e 15-Sep-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: acpi-cpufreq: Use cpufreq_cpu_get_raw() in ->get()

cpufreq_cpu_get() called by get_cur_freq_on_cpu() is overkill,
because the ->get() callback is always invoked in a context in
which all of the conditions checked by cpufreq_cpu_get() are
guaranteed to be satisfied.

Use cpufreq_cpu_get_raw() instead of it and drop the
corresponding cpufreq_cpu_put() from get_cur_freq_on_cpu().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# c7a7b418 02-Aug-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: rename cpufreq_real_policy as cpufreq_user_policy

Its all about caching min/max freq requested by userspace, and
the name 'cpufreq_real_policy' doesn't fit that well. Rename it to
cpufreq_user_policy.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 88dc4384 02-Aug-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove redundant 'policy' field from user_policy

Its always same as policy->policy, and there is no need to keep another
copy of it. Remove it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e27f8bd2 02-Aug-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove redundant 'governor' field from user_policy

Its always same as policy->governor, and there is no need to keep
another copy of it. Remove it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6bfb7c74 02-Aug-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event

What's being done from CPUFREQ_INCOMPATIBLE, can also be done with
CPUFREQ_ADJUST. There is nothing special with CPUFREQ_INCOMPATIBLE
notifier.

Kill CPUFREQ_INCOMPATIBLE and fix its usage sites.

This also updates the numbering of notifier events to remove holes.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 21c36d35 07-Aug-2015 Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>

cpufreq-dt: make scaling_boost_freqs sysfs attr available when boost is enabled

Make scaling_boost_freqs sysfs attribute is available when
cpufreq-dt driver is used and boost support is enabled.

Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 44139ed4 29-Jul-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Allow drivers to enable boost support after registering driver

In some cases it wouldn't be known at time of driver registration, if
the driver needs to support boost frequencies.

For example, while getting boost information from DT with opp-v2
bindings, we need to parse the bindings for all the CPUs to know if
turbo/boost OPPs are supported or not.

One way out to do that efficiently is to delay supporting boost mode
(i.e. creating /sys/devices/system/cpu/cpufreq/boost file), until the
time OPP bindings are parsed.

At that point, the driver can enable boost support. This can be done at
->init(), where the frequency table is created.

To do that, the driver requires few APIs from cpufreq core that let him
do this. This patch provides these APIs.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 559ed407 25-Jul-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Avoid attempts to create duplicate symbolic links

After commit 87549141d516 (cpufreq: Stop migrating sysfs files on
hotplug) there is a problem with CPUs that share cpufreq policy
objects with other CPUs and are initially offline.

Say CPU1 shares a policy with CPU0 which is online and is registered
first. As part of the registration process, cpufreq_add_dev() is
called for it. It creates the policy object and a symbolic link
to it from the CPU1's sysfs directory. If CPU1 is registered
subsequently and it is offline at that time, cpufreq_add_dev() will
attempt to create a symbolic link to the policy object for it, but
that link is present already, so a warning about that will be
triggered.

To avoid that warning, make cpufreq use an additional CPU mask
containing related CPUs that are actually present for each policy
object. That mask is initialized when the policy object is populated
after its creation (for the first online CPU using it) and it includes
CPUs from the "policy CPUs" mask returned by the cpufreq driver's
->init() callback that are physically present at that time. Symbolic
links to the policy are created only for the CPUs in that mask.

If cpufreq_add_dev() is invoked for an offline CPU, it checks the
new mask and only creates the symlink if the CPU was not in it (the
CPU is added to the mask at the same time).

In turn, cpufreq_remove_dev() drops the given CPU from the new mask,
removes its symlink to the policy object and returns, unless it is
the CPU owning the policy object. In that case, the policy object
is moved to a new CPU's sysfs directory or deleted if the CPU being
removed was the last user of the policy.

While at it, notice that cpufreq_remove_dev() can't fail, because
its return value is ignored, so make it ignore return values from
__cpufreq_remove_dev_prepare() and __cpufreq_remove_dev_finish()
and prevent these functions from aborting on errors returned by
__cpufreq_governor(). Also drop the now unused sif argument from
them.

Fixes: 87549141d516 (cpufreq: Stop migrating sysfs files on hotplug)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-and-tested-by: Russell King <linux@arm.linux.org.uk>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 9d16f207 17-May-2015 Saravana Kannan <skannan@codeaurora.org>

cpufreq: Track cpu managing sysfs kobjects separately

In order to prepare for the next few commits, that will stop migrating
sysfs files on cpu hotplug, this patch starts managing sysfs-cpu
separately.

The behavior is still the same as we are still migrating sysfs files on
hotplug, later commits would change that.

Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4573237b 11-May-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Manage governor usage history with 'policy->last_governor'

History of which governor was used last is common to all CPUs within a
policy and maintaining it per-cpu isn't the best approach for sure.

Apart from wasting memory, this also increases the complexity of
managing this data structure as it has to be updated for all CPUs.

To make that somewhat simpler, lets store this information in a new
field 'last_governor' in struct cpufreq_policy and update it on removal
of last cpu of a policy.

As a side-effect it also solves an old problem, consider a system with
two clusters 0 & 1. And there is one policy per cluster.

Cluster 0: CPU0 and 1.
Cluster 1: CPU2 and 3.

- CPU2 is first brought online, and governor is set to performance
(default as cpufreq_cpu_governor wasn't set).
- Governor is changed to ondemand.
- CPU2 is taken offline and cpufreq_cpu_governor is updated for CPU2.
- CPU3 is brought online.
- Because cpufreq_cpu_governor wasn't set for CPU3, the default governor
performance is picked for CPU3.

This patch fixes the bug as we now have a single variable to update for
policy.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d9f35446 06-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove CPUFREQ_UPDATE_POLICY_CPU notifications

CPUFREQ_UPDATE_POLICY_CPU notifications were used only from cpufreq-stats which
doesn't use it anymore. Remove them.

This also decrements values of other notification macros defined after
CPUFREQ_UPDATE_POLICY_CPU by 1 to remove gaps. Hopefully all users are using
macro's instead of direct numbers and so they wouldn't break as macro values are
changed now.

Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7c418ff0 06-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove (now) unused 'last_cpu' from struct cpufreq_policy

'last_cpu' was used only from cpufreq-stats and isn't used anymore. Get rid of
it.

Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# a9aaf291 12-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: stats: get rid of per-cpu cpufreq_stats_table

All CPUs sharing a cpufreq policy share stats too. For this reason,
add a stats pointer to struct cpufreq_policy and drop per-CPU variable
cpufreq_stats_table used for accessing cpufreq stats so as to reduce
code complexity.

Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7c45cf31 26-Nov-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Introduce ->ready() callback for cpufreq drivers

Currently there is no callback for cpufreq drivers which is called once the
policy is ready to be used. There are some requirements where such a callback is
required.

One of them is registering a cooling device with the help of
of_cpufreq_cooling_register(). This routine tries to get 'struct cpufreq_policy'
for CPUs which isn't yet initialed at the time ->init() is called and so we face
issues while registering the cooling device.

Because we can't register cooling device from ->init(), we need a callback that
is called after the policy is ready to be used and hence we introduce ->ready()
callback.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Eduardo Valentin <edubezval@gmail.com>
Tested-by: Eduardo Valentin <edubezval@gmail.com>
Reviewed-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 90452e61 26-Nov-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Fix formatting issues in 'struct cpufreq_driver'

Adding any new callback to 'struct cpufreq_driver' gives following checkpatch
warning:

WARNING: Unnecessary space before function pointer arguments
+ void (*ready) (struct cpufreq_policy *policy);

This is because we have been using a tab spacing between function pointer name
and its arguments and the new one tried to follow that.

Though we normally don't try to fix every checkpatch warning, specially around
formatting issues as that creates unnecessary noise over lists. But I thought we
better fix this so that new additions don't generate these warnings plus it
looks far better/symmetric now.

So, remove these tab spacing issues in 'struct cpufreq_driver' only + fix
alignment of all members.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Eduardo Valentin <edubezval@gmail.com>
Tested-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 51315cdf 19-Oct-2014 Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

cpufreq: allow driver-specific data

This commit extends the cpufreq_driver structure with an additional
'void *driver_data' field that can be filled by the ->probe() function
of a cpufreq driver to pass additional custom information to the
driver itself.

A new function called cpufreq_get_driver_data() is added to allow a
cpufreq driver to retrieve those driver data, since they are typically
needed from a cpufreq_policy->init() callback, which does not have
access to the cpufreq_driver structure. This function call is similar
to the existing cpufreq_get_current_driver() function call.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 413fffc3 27-Aug-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Add support for per-policy driver data

Drivers supporting multiple clusters or multiple 'struct cpufreq_policy'
instances may need to keep per-policy data. If the core doesn't provide support
for that, they might do it in the most unoptimized way: 'per-cpu' data.

This patch adds another field in struct cpufreq_policy: 'driver_data'. It isn't
accessed by core and is for driver's internal use only.

Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5b0c0b16 30-Jun-2014 Stratos Karafotis <stratosk@semaphore.gr>

cpufreq: Introduce new relation for freq selection

Introduce CPUFREQ_RELATION_C for frequency selection.
It selects the frequency with the minimum euclidean distance to target.
In case of equal distance between 2 frequencies, it will select the
greater frequency.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2b1987a9 27-Jun-2014 Brian W Hart <hartb@linux.vnet.ibm.com>

cpufreq: make table sentinel macros unsigned to match use

Commit 5eeaf1f18973 (cpufreq: Fix build error on some platforms that
use cpufreq_for_each_*) moved function cpufreq_next_valid() to a public
header. Warnings are now generated when objects including that header
are built with -Wsign-compare (as an out-of-tree module might be):

.../include/linux/cpufreq.h: In function ‘cpufreq_next_valid’:
.../include/linux/cpufreq.h:519:27: warning: comparison between signed
and unsigned integer expressions [-Wsign-compare]
while ((*pos)->frequency != CPUFREQ_TABLE_END)
^
.../include/linux/cpufreq.h:520:25: warning: comparison between signed
and unsigned integer expressions [-Wsign-compare]
if ((*pos)->frequency != CPUFREQ_ENTRY_INVALID)
^

Constants CPUFREQ_ENTRY_INVALID and CPUFREQ_TABLE_END are signed, but
are used with unsigned member 'frequency' of cpufreq_frequency_table.
Update the macro definitions to be explicitly unsigned to match their
use.

This also corrects potentially wrong behavior of clk_rate_table_iter()
if unsigned long is wider than usigned int.

Fixes: 5eeaf1f18973 (cpufreq: Fix build error on some platforms that use cpufreq_for_each_*)
Signed-off-by: Brian W Hart <hartb@linux.vnet.ibm.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1c03a2d0 02-Jun-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: add support for intermediate (stable) frequencies

Douglas Anderson, recently pointed out an interesting problem due to which
udelay() was expiring earlier than it should.

While transitioning between frequencies few platforms may temporarily switch to
a stable frequency, waiting for the main PLL to stabilize.

For example: When we transition between very low frequencies on exynos, like
between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz.
No CPUFREQ notification is sent for that. That means there's a period of time
when we're running at 800MHz but loops_per_jiffy is calibrated at between 200MHz
and 300MHz. And so udelay behaves badly.

To get this fixed in a generic way, introduce another set of callbacks
get_intermediate() and target_intermediate(), only for drivers with
target_index() and CPUFREQ_ASYNC_NOTIFICATION unset.

get_intermediate() should return a stable intermediate frequency platform wants
to switch to, and target_intermediate() should set CPU to that frequency,
before jumping to the frequency corresponding to 'index'. Core will take care of
sending notifications and driver doesn't have to handle them in
target_intermediate() or target_index().

NOTE: ->target_index() should restore to policy->restore_freq in case of
failures as core would send notifications for that.

Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5eeaf1f1 07-May-2014 Stratos Karafotis <stratosk@semaphore.gr>

cpufreq: Fix build error on some platforms that use cpufreq_for_each_*

On platforms that use cpufreq_for_each_* macros, build fails if
CONFIG_CPU_FREQ=n, e.g. ARM/shmobile/koelsch/non-multiplatform:

drivers/built-in.o: In function `clk_round_parent':
clkdev.c:(.text+0xcf168): undefined reference to `cpufreq_next_valid'
drivers/built-in.o: In function `clk_rate_table_find':
clkdev.c:(.text+0xcf820): undefined reference to `cpufreq_next_valid'
make[3]: *** [vmlinux] Error 1

Fix this making cpufreq_next_valid function inline and move it to
cpufreq.h.

Fixes: 27e289dce297 (cpufreq: Introduce macros for cpufreq_frequency_table iteration)
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# a0dd7b79 05-May-2014 Nishanth Menon <nm@ti.com>

PM / OPP: Move cpufreq specific OPP functions out of generic OPP library

CPUFreq specific helper functions for OPP (Operating Performance Points)
now use generic OPP functions that allow CPUFreq to be be moved back
into CPUFreq framework. This allows for independent modifications
or future enhancements as needed isolated to just CPUFreq framework
alone.

Here, we just move relevant code and documentation to make this part of
CPUFreq infrastructure.

Cc: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ca654dc3 04-May-2014 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end

Some cpufreq drivers were redundantly invoking the _begin() and _end()
APIs around frequency transitions, and this double invocation (one from
the cpufreq core and the other from the cpufreq driver) used to result
in a self-deadlock, leading to system hangs during boot. (The _begin()
API makes contending callers wait until the previous invocation is
complete. Hence, the cpufreq driver would end up waiting on itself!).

Now all such drivers have been fixed, but debugging this issue was not
very straight-forward (even lockdep didn't catch this). So let us add a
debug infrastructure to the cpufreq core to catch such issues more easily
in the future.

We add a new field called 'transition_task' to the policy structure, to keep
track of the task which is performing the frequency transition. Using this
field, we make note of this task during _begin() and print a warning if we
find a case where the same task is calling _begin() again, before completing
the previous frequency transition using the corresponding _end().

We have left out ASYNC_NOTIFICATION drivers from this debug infrastructure
for 2 reasons:

1. At the moment, we have no way to avoid a particular scenario where this
debug infrastructure can emit false-positive warnings for such drivers.
The scenario is depicted below:

Task A Task B

/* 1st freq transition */
Invoke _begin() {
...
...
}

Change the frequency

/* 2nd freq transition */
Invoke _begin() {
... //waiting for B to
... //finish _end() for
... //the 1st transition
... | Got interrupt for successful
... | change of frequency (1st one).
... |
... | /* 1st freq transition */
... | Invoke _end() {
... | ...
... V }
...
...
}

This scenario is actually deadlock-free because, once Task A changes the
frequency, it is Task B's responsibility to invoke the corresponding
_end() for the 1st frequency transition. Hence it is perfectly legal for
Task A to go ahead and attempt another frequency transition in the meantime.
(Of course it won't be able to proceed until Task B finishes the 1st _end(),
but this doesn't cause a deadlock or a hang).

The debug infrastructure cannot handle this scenario and will treat it as
a deadlock and print a warning. To avoid this, we exclude such drivers
from the purview of this code.

2. Luckily, we don't _need_ this infrastructure for ASYNC_NOTIFICATION drivers
at all! The cpufreq core does not automatically invoke the _begin() and
_end() APIs during frequency transitions in such drivers. Thus, the driver
alone is responsible for invoking _begin()/_end() and hence there shouldn't
be any conflicts which lead to double invocations. So, we can skip these
drivers, since the probability that such drivers will hit this problem is
extremely low, as outlined above.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 27e289dc 25-Apr-2014 Stratos Karafotis <stratosk@semaphore.gr>

cpufreq: Introduce macros for cpufreq_frequency_table iteration

Many cpufreq drivers need to iterate over the cpufreq_frequency_table
for various tasks.

This patch introduces two macros which can be used for iteration over
cpufreq_frequency_table keeping a common coding style across drivers:

- cpufreq_for_each_entry: iterate over each entry of the table
- cpufreq_for_each_valid_entry: iterate over each entry that contains
a valid frequency.

It should have no functional changes.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7f4b0461 28-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: create another field .flags in cpufreq_frequency_table

Currently cpufreq frequency table has two fields: frequency and driver_data.
driver_data is only for drivers' internal use and cpufreq core shouldn't use
it at all. But with the introduction of BOOST frequencies, this assumption
was broken and we started using it as a flag instead.

There are two problems due to this:
- It is against the description of this field, as driver's data is used by
the core now.
- if drivers fill it with -3 for any frequency, then those frequencies are
never considered by cpufreq core as it is exactly same as value of
CPUFREQ_BOOST_FREQ, i.e. ~2.

The best way to get this fixed is by creating another field flags which
will be used for such flags. This patch does that. Along with that various
drivers need modifications due to the change of struct cpufreq_frequency_table.

Reviewed-by: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 236a9800 24-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Make cpufreq_notify_transition & cpufreq_notify_post_transition static

cpufreq_notify_transition() and cpufreq_notify_post_transition() shouldn't be
called directly by cpufreq drivers anymore and so these should be marked static.

Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 12478cf0 24-Mar-2014 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Make sure frequency transitions are serialized

Whenever we change the frequency of a CPU, we call the PRECHANGE and POSTCHANGE
notifiers. They must be serialized, i.e. PRECHANGE and POSTCHANGE notifiers
should strictly alternate, thereby preventing two different sets of PRECHANGE or
POSTCHANGE notifiers from interleaving arbitrarily.

The following examples illustrate why this is important:

Scenario 1:
-----------
A thread reading the value of cpuinfo_cur_freq, will call
__cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition()

The ondemand governor can decide to change the frequency of the CPU at the same
time and hence it can end up sending the notifications via ->target().

If the notifiers are not serialized, the following sequence can occur:
- PRECHANGE Notification for freq A (from cpuinfo_cur_freq)
- PRECHANGE Notification for freq B (from target())
- Freq changed by target() to B
- POSTCHANGE Notification for freq B
- POSTCHANGE Notification for freq A

We can see from the above that the last POSTCHANGE Notification happens for freq
A but the hardware is set to run at freq B.

Where would we break then?: adjust_jiffies() in cpufreq.c & cpufreq_callback()
in arch/arm/kernel/smp.c (which also adjusts the jiffies). All the
loops_per_jiffy calculations will get messed up.

Scenario 2:
-----------
The governor calls __cpufreq_driver_target() to change the frequency. At the
same time, if we change scaling_{min|max}_freq from sysfs, it will end up
calling the governor's CPUFREQ_GOV_LIMITS notification, which will also call
__cpufreq_driver_target(). And hence we end up issuing concurrent calls to
->target().

Typically, platforms have the following logic in their ->target() routines:
(Eg: cpufreq-cpu0, omap, exynos, etc)

A. If new freq is more than old: Increase voltage
B. Change freq
C. If new freq is less than old: decrease voltage

Now, if the two concurrent calls to ->target() are X and Y, where X is trying to
increase the freq and Y is trying to decrease it, we get the following race
condition:

X.A: voltage gets increased for larger freq
Y.A: nothing happens
Y.B: freq gets decreased
Y.C: voltage gets decreased
X.B: freq gets increased
X.C: nothing happens

Thus we can end up setting a freq which is not supported by the voltage we have
set. That will probably make the clock to the CPU unstable and the system might
not work properly anymore.

This patch introduces a set of synchronization primitives to serialize frequency
transitions, which are to be used as shown below:

cpufreq_freq_transition_begin();

//Perform the frequency change

cpufreq_freq_transition_end();

The _begin() call sends the PRECHANGE notification whereas the _end() call sends
the POSTCHANGE notification. Also, all the necessary synchronization is handled
within these calls. In particular, even drivers which set the ASYNC_NOTIFICATION
flag can also use these APIs for performing frequency transitions (ie., you can
call _begin() from one task, and call the corresponding _end() from a different
task).

The actual synchronization underneath is not that complicated:

The key challenge is to allow drivers to begin the transition from one thread
and end it in a completely different thread (this is to enable drivers that do
asynchronous POSTCHANGE notification from bottom-halves, to also use the same
interface).

To achieve this, a 'transition_ongoing' flag, a 'transition_lock' spinlock and a
wait-queue are added per-policy. The flag and the wait-queue are used in
conjunction to create an "uninterrupted flow" from _begin() to _end(). The
spinlock is used to ensure that only one such "flow" is in flight at any given
time. Put together, this provides us all the necessary synchronization.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 367dc4aa 19-Mar-2014 Dirk Brandewie <dirk.j.brandewie@intel.com>

cpufreq: Add stop CPU callback to cpufreq_driver interface

This callback allows the driver to do clean up before the CPU is
completely down and its state cannot be modified. This is used
by the intel_pstate driver to reduce the requested P state prior to
the core going away. This is required because the requested P state
of the offline core is used to select the package P state. This
effectively sets the floor package P state to the requested P state on
the offline core.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
[rjw: Minor modifications]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 0b443ead 18-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove unused notifier: CPUFREQ_{SUSPENDCHANGE|RESUMECHANGE}

Two cpufreq notifiers CPUFREQ_RESUMECHANGE and CPUFREQ_SUSPENDCHANGE have
not been used for some time, so remove them to clean up code a bit.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 979d86fa 10-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove cpufreq_generic_exit()

cpufreq_generic_exit() is empty now and can be deleted.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e0b3165b 10-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: add 'freq_table' in struct cpufreq_policy

freq table is not per CPU but per policy, so it makes more sense to
keep it within struct cpufreq_policy instead of a per-cpu variable.

This patch does it. Over that, there is no need to set policy->freq_table
to NULL in ->exit(), as policy structure is going to be freed soon.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e28867ea 03-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Implement cpufreq_generic_suspend()

Multiple platforms need to set CPUs to a particular frequency before
suspending the system, so provide a common infrastructure for them.

Those platforms only need to point their ->suspend callback pointers
to the generic routine.

Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2f0aea93 03-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: suspend governors on system suspend/hibernate

This patch adds cpufreq suspend/resume calls to dpm_{suspend|resume}()
for handling suspend/resume of cpufreq governors.

Lan Tianyu (Intel) & Jinhyuk Choi (Broadcom) found an issue where the
tunables configuration for clusters/sockets with non-boot CPUs was
lost after system suspend/resume, as we were notifying governors with
CPUFREQ_GOV_POLICY_EXIT on removal of the last CPU for that policy
which caused the tunables memory to be freed.

This is fixed by preventing any governor operations from being
carried out between the device suspend and device resume stages of
system suspend and resume, respectively.

We could have added these callbacks at dpm_{suspend|resume}_noirq()
level, but there is an additional problem that the majority of I/O
devices is already suspended at that point and if cpufreq drivers
want to change the frequency before suspending, then that not be
possible on some platforms (which depend on peripherals like i2c,
regulators, etc).

Reported-and-tested-by: Lan Tianyu <tianyu.lan@intel.com>
Reported-by: Jinhyuk Choi <jinchoi@broadcom.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6f19efc0 20-Dec-2013 Lukasz Majewski <l.majewski@samsung.com>

cpufreq: Add boost frequency support in core

This commit adds boost frequency support in cpufreq core (Hardware &
Software). Some SoCs (like Exynos4 - e.g. 4x12) allow setting frequency
above its normal operation limits. Such mode shall be only used for a
short time.

Overclocking (boost) support is essentially provided by platform
dependent cpufreq driver.

This commit unifies support for SW and HW (Intel) overclocking solutions
in the core cpufreq driver. Previously the "boost" sysfs attribute was
defined in the ACPI processor driver code. By default boost is disabled.
One global attribute is available at: /sys/devices/system/cpu/cpufreq/boost.

It only shows up when cpufreq driver supports overclocking.
Under the hood frequencies dedicated for boosting are marked with a
special flag (CPUFREQ_BOOST_FREQ) at driver's frequency table.
It is the user's concern to enable/disable overclocking with a proper call
to sysfs.

The cpufreq_boost_trigger_state() function is defined non static on purpose.
It is used later with thermal subsystem to provide automatic enable/disable
of the BOOST feature.

Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 652ed95d 09-Jan-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: introduce cpufreq_generic_get() routine

CPUFreq drivers that use clock frameworks interface,i.e. clk_get_rate(),
to get CPUs clk rate, have similar sort of code used in most of them.

This patch adds a generic ->get() which will do the same thing for them.
All those drivers are required to now is to set .get to cpufreq_generic_get()
and set their clk pointer in policy->clk during ->init().

Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# fcd7af91 06-Jan-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: stats: handle cpufreq_unregister_driver() and suspend/resume properly

There are several problems with cpufreq stats in the way it handles
cpufreq_unregister_driver() and suspend/resume..

- We must not lose data collected so far when suspend/resume happens
and so stats directories must not be removed/allocated during these
operations, which is done currently.

- cpufreq_stat has registered notifiers with both cpufreq and hotplug.
It adds sysfs stats directory with a cpufreq notifier: CPUFREQ_NOTIFY
and removes this directory with a notifier from hotplug core.

In case cpufreq_unregister_driver() is called (on rmmod cpufreq driver),
stats directories per cpu aren't removed as CPUs are still online. The
only call cpufreq_stats gets is cpufreq_stats_update_policy_cpu() for
all CPUs except the last of each policy. And pointer to stat information
is stored in the entry for last CPU in the per-cpu cpufreq_stats_table.
But policy structure would be freed inside cpufreq core and so that will
result in memory leak inside cpufreq stats (as we are never freeing
memory for stats).

Now if we again insert the module cpufreq_register_driver() will be
called and we will again allocate stats data and put it on for first
CPU of every policy. In case we only have a single CPU per policy, we
will return with a error from cpufreq_stats_create_table() due to this
code:

if (per_cpu(cpufreq_stats_table, cpu))
return -EBUSY;

And so probably cpufreq stats directory would not show up anymore (as
it was added inside last policies->kobj which doesn't exist anymore).
I haven't tested it, though. Also the values in stats files wouldn't
be refreshed as we are using the earlier stats structure.

- CPUFREQ_NOTIFY is called from cpufreq_set_policy() which is called for
scenarios where we don't really want cpufreq_stat_notifier_policy() to get
called. For example whenever we are changing anything related to a policy:
min/max/current freq, etc. cpufreq_set_policy() is called and so cpufreq
stats is notified. Where we don't do any useful stuff other than simply
returning with -EBUSY from cpufreq_stats_create_table(). And so this
isn't the right notifier that cpufreq stats..

Due to all above reasons this patch does following changes:
- Add new notifiers CPUFREQ_CREATE_POLICY and CPUFREQ_REMOVE_POLICY,
which are only called when policy is created/destroyed. They aren't
called for suspend/resume paths..
- Use these notifiers in cpufreq_stat_notifier_policy() to create/destory
stats sysfs entries. And so cpufreq_unregister_driver() or suspend/resume
shouldn't be a problem for cpufreq_stats.
- Return early from cpufreq_stat_cpu_callback() for suspend/resume sequence,
so that we don't free stats structure.

Acked-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d3916691 02-Dec-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Make sure CPU is running on a freq from freq-table

Sometimes boot loaders set CPU frequency to a value outside of frequency table
present with cpufreq core. In such cases CPU might be unstable if it has to run
on that frequency for long duration of time and so its better to set it to a
frequency which is specified in freq-table. This also makes cpufreq stats
inconsistent as cpufreq-stats would fail to register because current frequency
of CPU isn't found in freq-table.

Because we don't want this change to affect boot process badly, we go for the
next freq which is >= policy->cur ('cur' must be set by now, otherwise we will
end up setting freq to lowest of the table as 'cur' is initialized to zero).

In case current frequency doesn't match any frequency from freq-table, we throw
warnings to user, so that user can get this fixed in their bootloaders or
freq-tables.

Reported-by: Carlos Hernandez <ceh@ti.com>
Reported-and-tested-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ae6b4271 02-Dec-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Mark ARM drivers with CPUFREQ_NEED_INITIAL_FREQ_CHECK flag

Sometimes boot loaders set CPU frequency to a value outside of frequency table
present with cpufreq core. In such cases CPU might be unstable if it has to run
on that frequency for long duration of time and so its better to set it to a
frequency which is specified in frequency table.

On some systems we can't really say what frequency we're running at the moment
and so for these we shouldn't check if we are running at a frequency present in
frequency table. And so we really can't force this for all the cpufreq drivers.

Hence we are created another flag here: CPUFREQ_NEED_INITIAL_FREQ_CHECK that
will be marked by platforms which want to go for this check at boot time.

Initially this is done for all ARM platforms but others may follow if required.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f7ba3b41 01-Dec-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Introduce cpufreq_notify_post_transition()

This introduces a new routine cpufreq_notify_post_transition() which
can be used to send POSTCHANGE notification for new freq with or
without both {PRE|POST}CHANGE notifications for last freq. This is
useful at multiple places, especially for sending transition failure
notifications.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 12205a4b 07-Dec-2013 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Revert "cpufreq: suspend governors on system suspend/hibernate"

Commit 5a87182aa21d (cpufreq: suspend governors on system
suspend/hibernate) causes hibernation problems to happen on
Bjørn Mork's and Paul Bolle's systems, so revert it.

Fixes: 5a87182aa21d (cpufreq: suspend governors on system suspend/hibernate)
Reported-by: Bjørn Mork <bjorn@mork.no>
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5a87182a 26-Nov-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: suspend governors on system suspend/hibernate

This patch adds cpufreq suspend/resume calls to dpm_{suspend|resume}_noirq()
for handling suspend/resume of cpufreq governors.

Lan Tianyu (Intel) & Jinhyuk Choi (Broadcom) found anr issue where
tunables configuration for clusters/sockets with non-boot CPUs was
getting lost after suspend/resume, as we were notifying governors
with CPUFREQ_GOV_POLICY_EXIT on removal of the last cpu for that
policy and so deallocating memory for tunables. This is fixed by
this patch as we don't allow any operation on governors after
device suspend and before device resume now.

Reported-and-tested-by: Lan Tianyu <tianyu.lan@intel.com>
Reported-by: Jinhyuk Choi <jinchoi@broadcom.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog, minor cleanups]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7dbf694d 29-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: distinguish drivers that do asynchronous notifications

There are few special cases like exynos5440 which doesn't send POSTCHANGE
notification from their ->target() routine and call some kind of bottom halves
for doing this work, work/tasklet/etc.. From which they finally send POSTCHANGE
notification.

Its better if we distinguish them from other cpufreq drivers in some way so that
core can handle them specially. So this patch introduces another flag:
CPUFREQ_ASYNC_NOTIFICATION, which will be set by such drivers.

This also changes exynos5440-cpufreq.c and powernow-k8 in order to set this
flag.

Acked-by: Amit Daniel Kachhap <amit.daniel@samsung.com>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ad7722da 18-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: create per policy rwsem instead of per CPU cpu_policy_rwsem

We have per-CPU cpu_policy_rwsem for cpufreq core, but we never use
all of them. We always use rwsem of policy->cpu and so we can
actually make this rwsem per policy instead.

This patch does this change. With this change other tricky situations
are also avoided now, like which lock to take while we are changing
policy->cpu, etc.

Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9c0ebcf7 25-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Implement light weight ->target_index() routine

Currently, the prototype of cpufreq_drivers target routines is:

int target(struct cpufreq_policy *policy, unsigned int target_freq,
unsigned int relation);

And most of the drivers call cpufreq_frequency_table_target() to get a valid
index of their frequency table which is closest to the target_freq. And they
don't use target_freq and relation after that.

So, it makes sense to just do this work in cpufreq core before calling
cpufreq_frequency_table_target() and simply pass index instead. But this can be
done only with drivers which expose their frequency table with cpufreq core. For
others we need to stick with the old prototype of target() until those drivers
are converted to expose frequency tables.

This patch implements the new light weight prototype for target_index() routine.
It looks like this:

int target_index(struct cpufreq_policy *policy, unsigned int index);

CPUFreq core will call cpufreq_frequency_table_target() before calling this
routine and pass index to it. Because CPUFreq core now requires to call routines
present in freq_table.c CONFIG_CPU_FREQ_TABLE must be enabled all the time.

This also marks target() interface as deprecated. So, that new drivers avoid
using it. And Documentation is updated accordingly.

It also converts existing .target() to newly defined light weight
.target_index() routine for many driver.

Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Russell King <linux@arm.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rjw@rjwysocki.net>


# c75b505d 08-Oct-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

cpufreq: Add dummy cpufreq_cpu_get/put for CONFIG_CPU_FREQ=n

The drm/i915 driver wants to adjust it's own power policies using the
cpu policies as a guideline (we can implicitly boost the cpus through
the gpus on some platforms). To avoid a dreaded select (since a
depends will leave users wondering where where their driver has gone
too) add dummy functions.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: kbuild test robot <fengguang.wu@intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: cpufreq@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 70e9e778 03-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: create cpufreq_generic_init() routine

Many CPUFreq drivers for SMP system (where all cores share same clock lines), do
similar stuff in their ->init() part.

This patch creates a generic routine in cpufreq core which can be used by these
so that we can remove some redundant code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 18434512 03-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: define generic .attr, .exit() and .verify() routines

Most of the CPUFreq drivers do similar things in .exit() and .verify() routines
and .attr. So its better if we have generic routines for them which can be used
by cpufreq drivers then.

This patch introduces generic .attr, .exit() and .verify() cpufreq drivers.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# be49e346 02-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: add new routine cpufreq_verify_within_cpu_limits()

Most of the users of cpufreq_verify_within_limits() calls it for
limiting with min/max from policy->cpuinfo. We can make that code
simple by introducing another routine which will do this for them
automatically.

This patch adds another routine cpufreq_verify_within_cpu_limits()
and updates others to use it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 0b981e70 02-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: use cpufreq_driver->flags to mark CPUFREQ_HAVE_GOVERNOR_PER_POLICY

Use cpufreq_driver->flags to mark CPUFREQ_HAVE_GOVERNOR_PER_POLICY instead
of a separate field within cpufreq_driver. This will save some bytes of
memory.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6461f018 02-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: rewrite cpufreq_driver->flags using shift operator

Currently cpufreq_driver's flags are defined directly using 0x1, 0x2, 0x4, 0x8,
etc.. As the list grows it becomes less readable..

Use bitwise shift operator << to generate these numbers for respective
positions.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 27047a60 16-Sep-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Add new helper cpufreq_table_validate_and_show()

Almost every cpufreq driver is required to validate its frequency table with:
cpufreq_frequency_table_cpuinfo() and then expose it to cpufreq core with:
cpufreq_frequency_table_get_attr().

This patch creates another helper routine cpufreq_table_validate_and_show() that
will do both these steps in a single call and will return 0 for success, error
otherwise.

This also fixes potential bugs in cpufreq drivers where people have called
cpufreq_frequency_table_get_attr() before calling
cpufreq_frequency_table_cpuinfo(), as the later may fail.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 798282a8 09-Sep-2013 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Revert "cpufreq: make sure frequency transitions are serialized"

Commit 7c30ed5 (cpufreq: make sure frequency transitions are
serialized) attempted to serialize frequency transitions by
adding checks to the CPUFREQ_PRECHANGE and CPUFREQ_POSTCHANGE
notifications. However, it assumed that the notifications will
always originate from the driver's .target() callback, but they
also can be triggered by cpufreq_out_of_sync() and that leads to
warnings like this on some systems:

WARNING: CPU: 0 PID: 14543 at drivers/cpufreq/cpufreq.c:317
__cpufreq_notify_transition+0x238/0x260()
In middle of another frequency transition

accompanied by a call trace similar to this one:

[<ffffffff81720daa>] dump_stack+0x46/0x58
[<ffffffff8106534c>] warn_slowpath_common+0x8c/0xc0
[<ffffffff815b8560>] ? acpi_cpufreq_target+0x320/0x320
[<ffffffff81065436>] warn_slowpath_fmt+0x46/0x50
[<ffffffff815b1ec8>] __cpufreq_notify_transition+0x238/0x260
[<ffffffff815b33be>] cpufreq_notify_transition+0x3e/0x70
[<ffffffff815b345d>] cpufreq_out_of_sync+0x6d/0xb0
[<ffffffff815b370c>] cpufreq_update_policy+0x10c/0x160
[<ffffffff815b3760>] ? cpufreq_update_policy+0x160/0x160
[<ffffffff81413813>] cpufreq_set_cur_state+0x8c/0xb5
[<ffffffff814138df>] processor_set_cur_state+0xa3/0xcf
[<ffffffff8158e13c>] thermal_cdev_update+0x9c/0xb0
[<ffffffff8159046a>] step_wise_throttle+0x5a/0x90
[<ffffffff8158e21f>] handle_thermal_trip+0x4f/0x140
[<ffffffff8158e377>] thermal_zone_device_update+0x57/0xa0
[<ffffffff81415b36>] acpi_thermal_check+0x2e/0x30
[<ffffffff81415ca0>] acpi_thermal_notify+0x40/0xdc
[<ffffffff813e7dbd>] acpi_device_notify+0x19/0x1b
[<ffffffff813f8241>] acpi_ev_notify_dispatch+0x41/0x5c
[<ffffffff813e3fbe>] acpi_os_execute_deferred+0x25/0x32
[<ffffffff81081060>] process_one_work+0x170/0x4a0
[<ffffffff81082121>] worker_thread+0x121/0x390
[<ffffffff81082000>] ? manage_workers.isra.20+0x170/0x170
[<ffffffff81088fe0>] kthread+0xc0/0xd0
[<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
[<ffffffff8173582c>] ret_from_fork+0x7c/0xb0
[<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0

For this reason, revert commit 7c30ed5 along with the fix 266c13d
(cpufreq: Fix serialization of frequency transitions) on top of it
and we will revisit the serialization problem later.

Reported-by: Alessandro Bono <alessandro.bono@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 56d07db2 06-Sep-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes

Commit "cpufreq: serialize calls to __cpufreq_governor()" had been a temporary
and partial solution to the race condition between writing to a cpufreq sysfs
file and taking a CPU offline. Now that we have a proper and complete solution
to that problem, remove the temporary fix.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 19c76303 31-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: serialize calls to __cpufreq_governor()

We can't take a big lock around __cpufreq_governor() as this causes
recursive locking for some cases. But calls to this routine must be
serialized for every policy. Otherwise we can see some unpredictable
events.

For example, consider following scenario:

__cpufreq_remove_dev()
__cpufreq_governor(policy, CPUFREQ_GOV_STOP);
policy->governor->governor(policy, CPUFREQ_GOV_STOP);
cpufreq_governor_dbs()
case CPUFREQ_GOV_STOP:
mutex_destroy(&cpu_cdbs->timer_mutex)
cpu_cdbs->cur_policy = NULL;
<PREEMPT>
store()
__cpufreq_set_policy()
__cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
case CPUFREQ_GOV_LIMITS:
mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL

And so store() will eventually result in a crash if cur_policy is
NULL at this point.

Introduce an additional variable which would guarantee serialization
here.

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# adc97d6a 06-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Drop the owner field from struct cpufreq_driver

We don't need to set .owner = THIS_MODULE any more in cpufreq drivers
as this field isn't used any more by the cpufreq core.

This patch removes it and updates all dependent drivers accordingly.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# c88a1f8b 06-Aug-2013 Lukasz Majewski <l.majewski@samsung.com>

cpufreq: Store cpufreq policies in a list

Policies available in the cpufreq framework are now linked together.
They are accessible via cpufreq_policy_list defined in the cpufreq
core.

[rjw: Fix from Yinghai Lu folded in]
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3a3e9e06 06-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Give consistent names to cpufreq_policy objects

They are called policy, cur_policy, new_policy, data, etc. Just call
them policy wherever possible.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 74aca95d 06-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Re-arrange declarations in cpufreq.h

They are pretty much mixed up. Although generic headers are present,
definitions/declarations are present outside of them too ...

This patch just moves stuff up and down to make it look better and
consistent.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5ff0a268 06-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Clean up header files included in the core

This patch addresses the following issues in the header files in the
cpufreq core:
- Include headers in ascending order, so that we don't add same
many times by mistake.
- <asm/> must be included after <linux/>, so that they override
whatever they need to.
- Remove unnecessary includes.
- Don't include files already included by cpufreq.h or
cpufreq_governor.h.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# cffe4e0e 05-Jun-2013 Stratos Karafotis <stratosk@semaphore.gr>

cpufreq: Remove unused function __cpufreq_driver_getavg()

The target frequency calculation method in the ondemand governor has
changed and it is now independent of the measured average frequency.
Consequently, the __cpufreq_driver_getavg() function and getavg
member of struct cpufreq_driver are not used any more, so drop them.

[rjw: Changelog]
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 266c13d7 02-Jul-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Fix serialization of frequency transitions

Commit 7c30ed ("cpufreq: make sure frequency transitions are serialized")
interacts poorly with systems that have a single core freqency for all
cores. On such systems we have a single policy for all cores with
several CPUs. When we do a frequency transition the governor calls the
pre and post change notifiers which causes cpufreq_notify_transition()
per CPU. Since the policy is the same for all of them all CPUs after
the first and the warnings added are generated by checking a per-policy
flag the warnings will be triggered for all cores after the first.

Fix this by allowing notifier to be called for n times. Where n is the number of
cpus in policy->cpus.

Reported-and-tested-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f4fd3797 27-Jun-2013 Lan Tianyu <tianyu.lan@intel.com>

acpi-cpufreq: Add new sysfs attribute freqdomain_cpus

Commits fcf8058 (cpufreq: Simplify cpufreq_add_dev()) and aa77a52
(cpufreq: acpi-cpufreq: Don't set policy->related_cpus from .init())
changed the contents of the "related_cpus" sysfs attribute on systems
where acpi-cpufreq is used and user space can't get the list of CPUs
which are in the same hardware coordination CPU domain (provided by
the ACPI AML method _PSD) via "related_cpus" any more.

To make up for that loss add a new sysfs attribute "freqdomian_cpus"
for the acpi-cpufreq driver which exposes the list of CPUs in the
same domain regardless of whether it is coordinated by hardware or
software.

[rjw: Changelog, documentation]
References: https://bugzilla.kernel.org/show_bug.cgi?id=58761
Reported-by: Jean-Philippe Halimi <jean-philippe.halimi@exascale-computing.eu>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7c30ed53 18-Jun-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: make sure frequency transitions are serialized

Whenever we are changing frequency of a cpu, we are calling PRECHANGE and
POSTCHANGE notifiers. They must be serialized. i.e. PRECHANGE or POSTCHANGE
shouldn't be called twice contiguously.

This can happen due to bugs in users of __cpufreq_driver_target() or actual
cpufreq drivers who are sending these notifiers.

This patch adds some protection against this. Now, we keep track of the last
transaction and see if something went wrong.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# bb176f7d 19-Jun-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Fix minor formatting issues

There were a few noticeable formatting issues in core cpufreq code.
This cleans them up to make code look better. The changes include:
- Whitespace cleanup.
- Rearrangements of code.
- Multiline comments fixes.
- Formatting changes to fit 80 columns.

Copyright information in cpufreq.c is also updated to include my name
for 2013.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 95731ebb 19-Jun-2013 Xiaoguang Chen <chenxg@marvell.com>

cpufreq: Fix governor start/stop race condition

Cpufreq governors' stop and start operations should be carried out
in sequence. Otherwise, there will be unexpected behavior, like in
the example below.

Suppose there are 4 CPUs and policy->cpu=CPU0, CPU1/2/3 are linked
to CPU0. The normal sequence is:

1) Current governor is userspace. An application tries to set the
governor to ondemand. It will call __cpufreq_set_policy() in
which it will stop the userspace governor and then start the
ondemand governor.

2) Current governor is userspace. The online of CPU3 runs on CPU0.
It will call cpufreq_add_policy_cpu() in which it will first
stop the userspace governor, and then start it again.

If the sequence of the above two cases interleaves, it becomes:

1) Application stops userspace governor
2) Hotplug stops userspace governor

which is a problem, because the governor shouldn't be stopped twice
in a row. What happens next is:

3) Application starts ondemand governor
4) Hotplug starts a governor

In step 4, the hotplug is supposed to start the userspace governor,
but now the governor has been changed by the application to ondemand,
so the ondemand governor is started once again, which is incorrect.

The solution is to prevent policy governors from being stopped
multiple times in a row. A governor should only be stopped once for
one policy. After it has been stopped, no more governor stop
operations should be executed.

Also add a mutex to serialize governor operations.

[rjw: Changelog. And you owe me a beverage of my choice.]
Signed-off-by: Xiaoguang Chen <chenxg@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 50701588 30-Mar-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: rename index as driver_data in cpufreq_frequency_table

The "index" field of struct cpufreq_frequency_table was never an
index and isn't used at all by the cpufreq core. It only is useful
for cpufreq drivers for their internal purposes.

Many people nowadays blindly set it in ascending order with the
assumption that the core will use it, which is a mistake.

Rename it to "driver_data" as that's what its purpose is. All of its
users are updated accordingly.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2361be23 17-May-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Don't create empty /sys/devices/system/cpu/cpufreq directory

When we don't have any file in cpu/cpufreq directory we shouldn't
create it. Specially with the introduction of per-policy governor
instance patchset, even governors are moved to
cpu/cpu*/cpufreq/governor-name directory and so this directory is
just not required.

Lets have it only when required.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 72a4ce34 17-May-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Move get_cpu_idle_time() to cpufreq.c

Governors other than ondemand and conservative can also use
get_cpu_idle_time() and they aren't required to compile
cpufreq_governor.c. So, move these independent routines to
cpufreq.c instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 944e9a03 15-May-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: governors: Move get_governor_parent_kobj() to cpufreq.c

get_governor_parent_kobj() can be used by any governor, generic
cpufreq governors or platform specific ones and so must be present in
cpufreq.c instead of cpufreq_governor.c.

This patch moves it to cpufreq.c. This also adds
EXPORT_SYMBOL_GPL(get_governor_parent_kobj) so that modules can use
this function too.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b43a7ffb 24-Mar-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Notify all policy->cpus in cpufreq_notify_transition()

policy->cpus contains all online cpus that have single shared clock line. And
their frequencies are always updated together.

Many SMP system's cpufreq drivers take care of this in individual drivers but
the best place for this code is in cpufreq core.

This patch modifies cpufreq_notify_transition() to notify frequency change for
all cpus in policy->cpus and hence updates all users of this API.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4d5dcc42 27-Mar-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: governor: Implement per policy instances of governors

Currently, there can't be multiple instances of single governor_type.
If we have a multi-package system, where we have multiple instances
of struct policy (per package), we can't have multiple instances of
same governor. i.e. We can't have multiple instances of ondemand
governor for multiple packages.

Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/. Which again reflects that there can be only one
instance of a governor_type in the system.

This is a bottleneck for multicluster system, where we want different
packages to use same governor type, but with different tunables.

This patch uses the infrastructure provided by earlier patch and
implements init/exit routines for ondemand and conservative
governors.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7bd353a9 27-Mar-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Add per policy governor-init/exit infrastructure

Currently, there can't be multiple instances of single governor_type.
If we have a multi-package system, where we have multiple instances
of struct policy (per package), we can't have multiple instances of
same governor. i.e. We can't have multiple instances of ondemand
governor for multiple packages.

Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/. Which again reflects that there can be only one
instance of a governor_type in the system.

This is a bottleneck for multicluster system, where we want different
packages to use same governor type, but with different tunables.

This patch is inclined towards providing this infrastructure. Because
we are required to allocate governor's resources dynamically now, we
must do it at policy creation and end. And so got
CPUFREQ_GOV_POLICY_INIT/EXIT.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 62b36cc1 31-Jan-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove unnecessary use of policy->shared_type

policy->shared_type field was added only for SoCs with ACPI support:

commit 3b2d99429e3386b6e2ac949fc72486509c8bbe36
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date: Wed Dec 14 15:05:00 2005 -0500

P-state software coordination for ACPI core

http://bugzilla.kernel.org/show_bug.cgi?id=5737

Many non-ACPI systems are filling this field by mistake, which makes its usage
confusing. Lets clean it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b394058f 31-Jan-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: governors: Reset tunables only for cpufreq_unregister_governor()

Currently, whenever governor->governor() is called for CPUFRREQ_GOV_START event
we reset few tunables of governor. Which isn't correct, as this routine is
called for every cpu hot-[un]plugging event. We should actually be resetting
these only when the governor module is removed and re-installed.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2624f90c 31-Jan-2013 Fabio Baltieri <fabio.baltieri@linaro.org>

cpufreq: governors: implement generic policy_is_shared

Implement a generic helper function policy_is_shared() to replace the
current dbs_sw_coordinated_cpus() at cpufreq level, so that it can be
used by code other than cpufreq governors.

Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 951fc5f4 30-Jan-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Update Documentation for cpus and related_cpus

Documentation related to cpus and related_cpus is confusing and not very clear.
Over that CPUFreq core has seen much changes recently. Lets update documentation
and comments for cpus and related_cpus.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ab45bd9b 28-Jan-2013 Borislav Petkov <bp@suse.de>

cpufreq: Sort function prototypes properly

Move function prototypes to a place where they logically fit better.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9d95046e 20-Jan-2013 Borislav Petkov <bp@suse.de>

cpufreq: Add a get_current_driver helper

Add a helper function to return cpufreq_driver->name.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b8eed8af 14-Jan-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Simplify __cpufreq_remove_dev()

__cpufreq_remove_dev() is called on multiple occasions: cpufreq_driver
unregister and cpu removals.

Current implementation of this routine is overly complex without much need. If
the cpu to be removed is the policy->cpu, we remove the policy first and add all
other cpus again from policy->cpus and then finally call __cpufreq_remove_dev()
again to remove the cpu to be deleted. Haahhhh..

There exist a simple solution to removal of a cpu:
- Simply use the old policy structure
- update its fields like: policy->cpu, etc.
- notify any users of cpufreq, which depend on changing policy->cpu

Hence this patch, which tries to implement the above theory. It is tested well
by myself on ARM big.LITTLE TC2 SoC, which has 5 cores (2 A15 and 3 A7). Both
A15's share same struct policy and all A7's share same policy structure.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4471a34f 25-Oct-2012 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: governors: remove redundant code

Initially ondemand governor was written and then using its code conservative
governor is written. It used a lot of code from ondemand governor, but copy of
code was created instead of using the same routines from both governors. Which
increased code redundancy, which is difficult to manage.

This patch is an attempt to move common part of both the governors to
cpufreq_governor.c file to come over above mentioned issues.

This shouldn't change anything from functionality point of view.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2aacdfff 22-Oct-2012 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Move common part from governors to separate file, v2

Multiple cpufreq governers have defined similar get_cpu_idle_time_***()
routines. These routines must be moved to some common place, so that all
governors can use them.

So moving them to cpufreq_governor.c, which seems to be a better place for
keeping these routines.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4b972f0b 22-Oct-2012 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq / core: Fix printing of governor and driver name

Arrays for governer and driver name are of size CPUFREQ_NAME_LEN or 16.
i.e. 15 bytes for name and 1 for trailing '\0'.

When cpufreq driver print these names (for sysfs), it includes '\n' or ' ' in
the fmt string and still passes length as CPUFREQ_NAME_LEN. If the driver or
governor names are using all 15 fields allocated to them, then the trailing '\n'
or ' ' will never be printed. And so commands like:

root@linaro-developer# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver

will print something like:

cpufreq_foodrvroot@linaro-developer#

Fix this by increasing print length by one character.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 448c8b1d 13-Mar-2012 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

provide disable_cpufreq() function to disable the API.

useful for disabling cpufreq altogether. The cpu frequency
scaling drivers and cpu frequency governors will fail to register.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 313162d0 30-Jan-2012 Paul Gortmaker <paul.gortmaker@windriver.com>

device.h: audit and cleanup users in main include dir

The <linux/device.h> header includes a lot of stuff, and
it in turn gets a lot of use just for the basic "struct device"
which appears so often.

Clean up the users as follows:

1) For those headers only needing "struct device" as a pointer
in fcn args, replace the include with exactly that.

2) For headers not really using anything from device.h, simply
delete the include altogether.

3) For headers relying on getting device.h implicitly before
being included themselves, now explicitly include device.h

4) For files in which doing #1 or #2 uncovers an implicit
dependency on some other header, fix by explicitly adding
the required header(s).

Any C files that were implicitly relying on device.h to be
present have already been dealt with in advance.

Total removals from #1 and #2: 51. Total additions coming
from #3: 9. Total other implicit dependencies from #4: 7.

As of 3.3-rc1, there were 110, so a net removal of 42 gives
about a 38% reduction in device.h presence in include/*

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# a7b422cd 13-Mar-2012 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

provide disable_cpufreq() function to disable the API.

useful for disabling cpufreq altogether. The cpu frequency
scaling drivers and cpu frequency governors will fail to register.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 3d737108 28-Jun-2011 Jesse Barnes <jbarnes@virtuousgeek.org>

cpufreq: expose a cpufreq_quick_get_max routine

This allows drivers and other code to get the max reported CPU frequency.
Initial use is for scaling ring frequency with GPU frequency in the i915
driver.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>


# 335dc333 28-Apr-2011 Thiago Farina <tfransosi@gmail.com>

[CPUFREQ] cpufreq.h: Fix some checkpatch.pl coding style issues.

Before:
$ scripts/checkpatch.pl --file --terse include/linux/cpufreq.h
total: 14 errors, 11 warnings, 419 lines checked

After:
$ scripts/checkpatch.pl --file --terse include/linux/cpufreq.h
total: 2 errors, 4 warnings, 422 lines checked

Signed-off-by: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 2d06d8c4 27-Mar-2011 Dominik Brodowski <linux@dominikbrodowski.net>

[CPUFREQ] use dynamic debug instead of custom infrastructure

With dynamic debug having gained the capability to report debug messages
also during the boot process, it offers a far superior interface for
debug messages than the custom cpufreq infrastructure. As a first step,
remove the old cpufreq_debug_printk() function and replace it with a call
to the generic pr_debug() function.

How can dynamic debug be used on cpufreq? You need a kernel which has
CONFIG_DYNAMIC_DEBUG enabled.

To enabled debugging during runtime, mount debugfs and

$ echo -n 'module cpufreq +p' > /sys/kernel/debug/dynamic_debug/control

for debugging the complete "cpufreq" module. To achieve the same goal during
boot, append

ddebug_query="module cpufreq +p"

as a boot parameter to the kernel of your choice.

For more detailled instructions, please see
Documentation/dynamic-debug-howto.txt

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>


# 7ca64e2d 10-Mar-2011 Rafael J. Wysocki <rjw@rjwysocki.net>

[CPUFREQ] Remove the pm_message_t argument from driver suspend

None of the existing cpufreq drivers uses the second argument of
its .suspend() callback (which isn't useful anyway), so remove it.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Dave Jones <davej@redhat.com>


# e8951251 03-Mar-2011 Thomas Renninger <trenn@suse.de>

[CPUFREQ] Remove old, deprecated per cpu ondemand/conservative sysfs files

Marked deprecated for quite a whilte now...

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
CC: cpufreq@vger.kernel.org


# 226528c6 04-Mar-2010 Amerigo Wang <amwang@redhat.com>

[CPUFREQ] unexport (un)lock_policy_rwsem* functions

lock_policy_rwsem_* and unlock_policy_rwsem_* functions are scheduled
to be unexported when 2.6.33. Now there are no other callers of them
out of cpufreq.c, unexport them and make them static.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 6dad2a29 31-Mar-2010 Borislav Petkov <borislav.petkov@amd.com>

cpufreq: Unify sysfs attribute definition macros

Multiple modules used to define those which are with identical
functionality and were needlessly replicated among the different cpufreq
drivers. Push them into the header and remove duplication.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-7-git-send-email-bp@amd64.org>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# e2f74f35 18-Nov-2009 Thomas Renninger <trenn@suse.de>

[ACPI/CPUFREQ] Introduce bios_limit per cpu cpufreq sysfs interface

This interface is mainly intended (and implemented) for ACPI _PPC BIOS
frequency limitations, but other cpufreq drivers can also use it for
similar use-cases.

Why is this needed:

Currently it's not obvious why cpufreq got limited.
People see cpufreq/scaling_max_freq reduced, but this could have
happened by:
- any userspace prog writing to scaling_max_freq
- thermal limitations
- hardware (_PPC in ACPI case) limitiations

Therefore export bios_limit (in kHz) to:
- Point the user that it's the BIOS (broken or intended) which limits
frequency
- Export it as a sysfs interface for userspace progs.
While this was a rarely used feature on laptops, there will appear
more and more server implemenations providing "Green IT" features like
allowing the service processor to limit the frequency. People want
to know about HW/BIOS frequency limitations.

All ACPI P-state driven cpufreq drivers are covered with this patch:
- powernow-k8
- powernow-k7
- acpi-cpufreq

Tested with a patched DSDT which limits the first two cores (_PPC returns 1)
via _PPC, exposed by bios_limit:
# echo 2200000 >cpu2/cpufreq/scaling_max_freq
# cat cpu*/cpufreq/scaling_max_freq
2600000
2600000
2200000
2200000
# #scaling_max_freq shows general user/thermal/BIOS limitations

# cat cpu*/cpufreq/bios_limit
2600000
2600000
2800000
2800000
# #bios_limit only shows the HW/BIOS limitation

CC: Pallipadi Venkatesh <venkatesh.pallipadi@intel.com>
CC: Len Brown <lenb@kernel.org>
CC: davej@codemonkey.org.uk
CC: linux@dominikbrodowski.net

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>


# 2eca40a8 26-Oct-2009 Randy Dunlap <randy.dunlap@oracle.com>

cpufreq: add cpufreq_get() stub for CONFIG_CPU_FREQ=n

When CONFIG_CPU_FREQ is disabled, cpufreq_get() needs a stub. Used by kvm
(although it looks like a bit of the kvm code could be omitted when
CONFIG_CPU_FREQ is disabled).

arch/x86/built-in.o: In function `kvm_arch_init':
(.text+0x10de7): undefined reference to `cpufreq_get'

(Needed in linux-next's KVM tree, but it's correct in 2.6.32).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Tested-by: Eric Paris <eparis@redhat.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8aa84ad8 24-Jul-2009 Thomas Renninger <trenn@suse.de>

[CPUFREQ] Introduce global, not per core: /sys/devices/system/cpu/cpufreq

Currently everything in the cpufreq layer is per core based.
This does not reflect reality, for example ondemand on conservative
governors have global sysfs variables.

Introduce a global cpufreq directory and add the kobject to the governor
struct, so that governors can easily access it.
The directory is initialized in the cpufreq_core_init initcall and thus will
always be created if cpufreq is compiled in, even if no cpufreq driver is
active later.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>


# 129f8ae9 09-Mar-2009 Dave Jones <davej@redhat.com>

Revert "[CPUFREQ] Disable sysfs ui for p4-clockmod."

This reverts commit e088e4c9cdb618675874becb91b2fd581ee707e6.

Removing the sysfs interface for p4-clockmod was flagged as a
regression in bug 12826.

Course of action:
- Find out the remaining causes of overheating, and fix them
if possible. ACPI should be doing the right thing automatically.
If it isn't, we need to fix that.
- mark p4-clockmod ui as deprecated
- try again with the removal in six months.

It's not really feasible to printk about the deprecation, because
it needs to happen at all the sysfs entry points, which means adding
a lot of strcmp("p4-clockmod".. calls to the core, which.. bleuch.

Signed-off-by: Dave Jones <davej@redhat.com>


# 835481d9 04-Jan-2009 Rusty Russell <rusty@rustcorp.com.au>

cpumask: convert struct cpufreq_policy to cpumask_var_t

Impact: use new cpumask API to reduce memory usage

This is part of an effort to reduce structure sizes for machines
configured with large NR_CPUS. cpumask_t gets replaced by
cpumask_var_t, which is either struct cpumask[1] (small NR_CPUS) or
struct cpumask * (large NR_CPUS).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e088e4c9 25-Nov-2008 Matthew Garrett <mjg@redhat.com>

[CPUFREQ] Disable sysfs ui for p4-clockmod.

p4-clockmod has a long history of abuse. It pretends to be a CPU
frequency scaling driver, even though it doesn't actually change
the CPU frequency, but instead just modulates the frequency with
wait-states.
The biggest misconception is that when running at the lower 'frequency'
p4-clockmod is saving power. This isn't the case, as workloads running
slower take longer to complete, preventing the CPU from entering deep C states.

However p4-clockmod does have a purpose. It can prevent overheating.
Having it hooked up to the cpufreq interfaces is the wrong way to achieve
cooling however. It should instead be hooked up to ACPI.

This diff introduces a means for a cpufreq driver to register with the
cpufreq core, but not present a sysfs interface.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# bf0b90e3 04-Aug-2008 venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>

[CPUFREQ][1/6] cpufreq: Add cpu number parameter to __cpufreq_driver_getavg()

Add a cpu parameter to __cpufreq_driver_getavg(). This is needed for software
cpufreq coordination where policy->cpu may not be same as the CPU on which we
want to getavg frequency.

A follow-on patch will use this parameter to getavg freq from all cpus
in policy->cpus.

Change since last patch. Fix the offline/online and suspend/resume
oops reported by Youquan Song <youquan.song@intel.com>

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# a1531acd 29-Jul-2008 Thomas Renninger <trenn@suse.de>

cpufreq acpi: only call _PPC after cpufreq ACPI init funcs got called already

Ingo Molnar provided a fix to not call _PPC at processor driver
initialization time in "[PATCH] ACPI: fix cpufreq regression" (git
commit e4233dec749a3519069d9390561b5636a75c7579)

But it can still happen that _PPC is called at processor driver
initialization time.

This patch should make sure that this is not possible anymore.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 48adcf14 19-May-2008 Adrian Bunk <bunk@kernel.org>

[CPUFREQ] cpufreq: remove CVS keywords

This patch removes CVS keywords that weren't updated for a long time
from comments.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 30d221db 18-Apr-2008 Alessandro Guido <alessandro.guido@gmail.com>

[CPUFREQ] allow use of the powersave governor as the default one

Allow use of the powersave cpufreq governor as the default one for EMBEDDED
configs.

Signed-off-by: Alessandro Guido <alessandro.guido@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# e8628dd0 18-Apr-2008 Darrick J. Wong <djwong@us.ibm.com>

[CPUFREQ] expose cpufreq coordination requirements regardless of coordination mechanism

Currently, affected_cpus shows which CPUs need to have their frequency
coordinated in software. When hardware coordination is in use, the contents
of this file appear the same as when no coordination is required. This can
lead to some confusion among user-space programs, for example, that do not
know that extra coordination is required to force a CPU core to a particular
speed to control power consumption.

To fix this, create a "related_cpus" attribute that always displays the
coordination map regardless of whatever coordination strategy the cpufreq
driver uses (sw or hw). If the cpufreq driver does not provide a value, fall
back to policy->cpus.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 9e76988e 26-Oct-2007 Venki Pallipadi <venkatesh.pallipadi@intel.com>

[CPUFREQ] Eliminate cpufreq_userspace scaling_setspeed deadlock

Eliminate cpufreq_userspace scaling_setspeed deadlock.

Luming Yu recently uncovered yet another cpufreq related deadlock.
One thread that continuously switches the governors and the other thread that
repeatedly cats the contents of cpufreq directory causes both these threads to
go into a deadlock.

Detailed examination of the deadlock showed the exact flow before the deadlock
as:

Thread 1 Thread 2
________ ________
cats files under /sys/devices/.../cpufreq/
Set governor to userspace
Adds a new sysfs entry for
scaling_setspeed
cats files under /sys/devices/.../cpufreq/

Set governor to performance
Holds cpufreq_rw_sem in write
mode
Sends a STOP notify to
userspace governor
cat /sys/devices/.../cpufreq/scaling_setspeed
Gets a handle on the above sysfs entry with
sysfs_get_active
Blocks while trying to get cpufreq_rw_sem
in read mode
Remove a sysfs entry for
scaling_setspeed
Blocks on sysfs_deactivate
while waiting for earlier
get_active (on other thread)
to drain

At this point both threads go into deadlock and any other thread that tries to
do anything with sysfs cpufreq will also block.

There seems to be no easy way to avoid this deadlock as long as
cpufreq_userspace adds/removes the sysfs entry under same kobject as cpufreq.
Below patch moves scaling_setspeed to cpufreq.c, keeping it always and calling
back the governor on read/write. This is the cleanest fix I could think of,
even though adding two callbacks in governor structure just for this seems
unnecessary.

Note that the change makes scaling_setspeed under /sys/.../cpufreq permanent
and returns <unsupported> when governor is not userspace.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 6070b5de 02-Oct-2007 Satyam Sharma <satyam@infradead.org>

[CPUFREQ] implement !CONFIG_CPU_FREQ stub for cpufreq_unregister_notifier()

Callsites such as arch/powerpc/oprofile/op_model_cell.c are having to
open-code #ifdef CONFIG_CPU_FREQ only to be able to get at the full definition
of cpufreq_unregister_notifier(), because no empty stub is available for the
!CONFIG_CPU_FREQ case. Let's provide one, to be able to remove such #ifdef's
from the rest of the kernel tree -- those will come in a subsequent patch.

Signed-off-by: Satyam Sharma <satyam@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 6afde10c 02-Oct-2007 Thomas Renninger <trenn@suse.de>

[CPUFREQ] Only check for transition latency on problematic governors (kconfig fix)

Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 1c256245 02-Oct-2007 Thomas Renninger <trenn@suse.de>

[CPUFREQ] allow ondemand and conservative cpufreq governors to be used as default

Depending on the transition latency of the HW for cpufreq switches, the
ondemand or conservative governor cannot be used with certain cpufreq
drivers. Still the ondemand should be the default governor on a wide range
of systems. This patch allows this and lets the governor fallback to the
performance governor at cpufreq driver load time, if the driver does not
support fast enough frequency switching.

Main benefit is that on e.g. installation or other systems without
userspace support a working dynamic cpufreq support can be achieved on most
systems by simply loading the cpufreq driver. This is especially essential
for recent x86(_64) laptop hardware which may rely on working dynamic
cpufreq OS support.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# ff0ce684 26-Sep-2007 Linus Torvalds <torvalds@woody.linux-foundation.org>

Revert "[PATCH] x86-64: fix x86_64-mm-sched-clock-share"

This reverts commit 184c44d2049c4db7ef6ec65794546954da2c6a0e.

As noted by Dave Jones:
"Linus, please revert the above cset. It doesn't seem to be
necessary (it was added to fix a miscompile in 'make allnoconfig'
which doesn't seem to be repeatable with it reverted) and actively
breaks the ARM SA1100 framebuffer driver."

Requested-by: Dave Jones <davej@redhat.com>
Cc: Russell King <rmk+lkml@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 184c44d2 02-May-2007 Andrew Morton <akpm@linux-foundation.org>

[PATCH] x86-64: fix x86_64-mm-sched-clock-share

Fix for the following patch. Provide dummy cpufreq functions when
CPUFREQ is not compiled in.

Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>


# 632786ce 19-Apr-2007 Thomas Renninger <trenn@suse.de>

[CPUFREQ] Remove deprecated /proc/acpi/processor/performance write support

Remove deprecated /proc/acpi/processor/performance write support

Writing to /proc/acpi/processor/xy/performance interferes with sysfs
cpufreq interface. Also removes buggy cpufreq_set_policy exported symbol.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>


# 221dee28 26-Feb-2007 Linus Torvalds <torvalds@woody.linux-foundation.org>

Revert "[CPUFREQ] constify cpufreq_driver where possible."

This reverts commit aeeddc1435c37fa3fc844f31d39c185b08de4158, which was
half-baked and broken. It just resulted in compile errors, since
cpufreq_register_driver() still changes the 'driver_data' by setting
bits in the flags field. So claiming it is 'const' _really_ doesn't
work.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# aeeddc14 22-Feb-2007 Dave Jones <davej@redhat.com>

[CPUFREQ] constify cpufreq_driver where possible.

Not all cases are possible due to ->flags being set at runtime
on some drivers.

Signed-off-by: Dave Jones <davej@redhat.com>


# 5a01f2e8 05-Feb-2007 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

[CPUFREQ] Rewrite lock in cpufreq to eliminate cpufreq/hotplug related issues

Yet another attempt to resolve cpufreq and hotplug locking issues.

Patchset has 3 patches:
* Rewrite the lock infrastructure of cpufreq using a per cpu rwsem.
* Minor restructuring of work callback in ondemand driver.
* Use the new cpufreq rwsem infrastructure in ondemand work.

This patch:

Convert policy->lock to rwsem and move it to per_cpu area.
This rwsem will protect against both changing/accessing policy
related parameters and CPU hot plug/unplug.

[malattia@linux.it: fix oops in kref_put()]
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Mattia Dongili <malattia@linux.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# dfde5d62 03-Oct-2006 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

[CPUFREQ][8/8] acpi-cpufreq: Add support for freq feedback from hardware

Enable ondemand governor and acpi-cpufreq to use IA32_APERF and IA32_MPERF MSR
to get active frequency feedback for the last sampling interval. This will
make ondemand take right frequency decisions when hardware coordination of
frequency is going on.

Without APERF/MPERF, ondemand can take wrong decision at times due
to underlying hardware coordination or TM2.
Example:
* CPU 0 and CPU 1 are hardware cooridnated.
* CPU 1 running at highest frequency.
* CPU 0 was running at highest freq. Now ondemand reduces it to
some intermediate frequency based on utilization.
* Due to underlying hardware coordination with other CPU 1, CPU 0 continues to
run at highest frequency (as long as other CPU is at highest).
* When ondemand samples CPU 0 again next time, without actual frequency
feedback from APERF/MPERF, it will think that previous frequency change
was successful and can go to wrong target frequency. This is because it
thinks that utilization it has got this sampling interval is when running at
intermediate frequency, rather than actual highest frequency.

More information about IA32_APERF IA32_MPERF MSR:
Refer to IA-32 Intel® Architecture Software Developer's Manual at
http://developer.intel.com

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 153d7f3f 26-Jul-2006 Arjan van de Ven <arjan@linux.intel.com>

[PATCH] Reorganize the cpufreq cpu hotplug locking to not be totally bizare

The patch below moves the cpu hotplugging higher up in the cpufreq
layering; this is needed to avoid recursive taking of the cpu hotplug
lock and to otherwise detangle the mess.

The new rules are:
1. you must do lock_cpu_hotplug() around the following functions:
__cpufreq_driver_target
__cpufreq_governor (for CPUFREQ_GOV_LIMITS operation only)
__cpufreq_set_policy
2. governer methods (.governer) must NOT take the lock_cpu_hotplug()
lock in any way; they are called with the lock taken already
3. if your governer spawns a thread that does things, like calling
__cpufreq_driver_target, your thread must honor rule #1.
4. the policy lock and other cpufreq internal locks nest within
the lock_cpu_hotplug() lock.

I'm not entirely happy about how the __cpufreq_governor rule ended up
(conditional locking rule depending on the argument) but basically all
callers pass this as a constant so it's not too horrible.

The patch also removes the cpufreq_governor() function since during the
locking audit it turned out to be entirely unused (so no need to fix it)

The patch works on my testbox, but it could use more testing
(otoh... it can't be much worse than the current code)

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 46f18e3a 25-Jun-2006 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

ACPI: HW P-state coordination support

Treat HW coordination as independent CPUs.
This enables per-cpu monintoring of P-states

http://bugzilla.kernel.org/show_bug.cgi?id=5737

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 62c4f0a2 25-Apr-2006 David Woodhouse <dwmw2@infradead.org>

Don't include linux/config.h from anywhere else in include/

Signed-off-by: David Woodhouse <dwmw2@infradead.org>


# 3b2d9942 14-Dec-2005 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

P-state software coordination for ACPI core

http://bugzilla.kernel.org/show_bug.cgi?id=5737

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 83933af4 14-Jan-2006 Arjan van de Ven <arjan@infradead.org>

[CPUFREQ] convert remaining cpufreq semaphore to a mutex

This one fell through the automation at first because it initializes the
semaphore to locked, but that's easily remedied

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Dave Jones <davej@redhat.com>

drivers/cpufreq/cpufreq.c | 37 +++++++++++++++++++------------------
include/linux/cpufreq.h | 3 ++-
2 files changed, 21 insertions(+), 19 deletions(-)


# 95235ca2 02-Dec-2005 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

[CPUFREQ] CPU frequency display in /proc/cpuinfo

What is the value shown in "cpu MHz" of /proc/cpuinfo when CPUs are capable of
changing frequency?

Today the answer is: It depends.
On i386:
SMP kernel - It is always the boot frequency
UP kernel - Scales with the frequency change and shows that was last set.

On x86_64:
There is one single variable cpu_khz that gets written by all the CPUs. So,
the frequency set by last CPU will be seen on /proc/cpuinfo of all the
CPUs in the system. What you see also depends on whether you have constant_tsc
capable CPU or not.

On ia64:
It is always boot time frequency of a particular CPU that gets displayed.

The patch below changes this to:
Show the last known frequency of the particular CPU, when cpufreq is present. If
cpu doesnot support changing of frequency through cpufreq, then boot frequency
will be shown. The patch affects i386, x86_64 and ia64 architectures.

Signed-off-by: Venkatesh Pallipadi<venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 4e57b681 30-Oct-2005 Tim Schmielau <tim@physik3.uni-rostock.de>

[PATCH] fix missing includes

I recently picked up my older work to remove unnecessary #includes of
sched.h, starting from a patch by Dave Jones to not include sched.h
from module.h. This reduces the number of indirect includes of sched.h
by ~300. Another ~400 pointless direct includes can be removed after
this disentangling (patch to follow later).
However, quite a few indirect includes need to be fixed up for this.

In order to feed the patches through -mm with as little disturbance as
possible, I've split out the fixes I accumulated up to now (complete for
i386 and x86_64, more archs to follow later) and post them before the real
patch. This way this large part of the patch is kept simple with only
adding #includes, and all hunks are independent of each other. So if any
hunk rejects or gets in the way of other patches, just drop it. My scripts
will pick it up again in the next round.

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# e00d9967 07-Jul-2005 Bernard Blackham <bernard@blackham.com.au>

[PATCH] pm: fix u32 vs. pm_message_t confusion in cpufreq

Fix u32 vs pm_message_t confusion in cpufreq.

Signed-off-by: Bernard Blackham <bernard@blackham.com.au>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# b53cc6ea 31-May-2005 Dave Jones <davej@redhat.com>

[CPUFREQ] fix up comment in cpufreq.h

Fix up comment in cpufreq.h stating transition latency should be passed
in microseconds -- it was decided long ago to switch to nanoseconds.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>


# 42d4dc3f 29-Apr-2005 Benjamin Herrenschmidt <benh@kernel.crashing.org>

[PATCH] Add suspend method to cpufreq core

In order to properly fix some issues with cpufreq vs. sleep on
PowerBooks, I had to add a suspend callback to the pmac_cpufreq driver.
I must force a switch to full speed before sleep and I switch back to
previous speed on resume.

I also added a driver flag to disable the warnings in suspend/resume
since it is expected in this case to have different speed (and I want it
to fixup the jiffies properly).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!