History log of /linux-master/drivers/cpufreq/cpufreq.c
Revision Date Author Comments
# f37a4d6b 12-Mar-2024 Sibi Sankar <quic_sibis@quicinc.com>

cpufreq: Fix per-policy boost behavior on SoCs using cpufreq_boost_set_sw()

In the existing code, per-policy flags don't have any impact i.e.
if cpufreq_driver boost is enabled and boost is disabled for one or
more of the policies, the cpufreq driver will behave as if boost is
enabled.

Fix this by incorporating per-policy boost flag in the policy->max
computation used in cpufreq_frequency_table_cpuinfo and setting the
default per-policy boost to mirror the cpufreq_driver boost flag.

Fixes: 218a06a79d9a ("cpufreq: Support per-policy performance boost")
Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
Tested-by:Yipeng Zou <zouyipeng@huawei.com> <mailto:zouyipeng@huawei.com>
Reviewed-by: Yipeng Zou <zouyipeng@huawei.com> <mailto:zouyipeng@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# c4d61a52 29-Feb-2024 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Don't unregister cpufreq cooling on CPU hotplug

Offlining a CPU and bringing it back online is a common operation and it
happens frequently during system suspend/resume, where the non-boot CPUs
are hotplugged out during suspend and brought back at resume.

The cpufreq core already tries to make this path as fast as possible as
the changes are only temporary in nature and full cleanup of resources
isn't required in this case. For example the drivers can implement
online()/offline() callbacks to avoid a lot of tear down of resources.

On similar lines, there is no need to unregister the cpufreq cooling
device during suspend / resume, but only while the policy is getting
removed.

Moreover, unregistering the cpufreq cooling device is resulting in an
unwanted outcome, where the system suspend is eventually aborted in the
process. Currently, during system suspend the cpufreq core unregisters
the cooling device, which in turn removes a kobject using device_del()
and that generates a notification to the userspace via uevent broadcast.
This causes system suspend to abort in some setups.

This was also earlier reported (indirectly) by Roman [1]. Maybe there is
another way around to fixing that problem properly, but this change
makes sense anyways.

Move the registering and unregistering of the cooling device to policy
creation and removal times onlyy.

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218521
Reported-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
Reported-by: Roman Stratiienko <r.stratiienko@gmail.com>
Link: https://patchwork.kernel.org/project/linux-pm/patch/20220710164026.541466-1-r.stratiienko@gmail.com/ [1]
Tested-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# a755d0e2 27-Feb-2024 Qais Yousef <qyousef@layalina.io>

cpufreq: Honour transition_latency over transition_delay_us

Some platforms like Arm's Juno can have a high transition latency that
can be larger than the 2ms cap introduced. If a driver reports
a transition_latency that is higher than the cap, then use it as-is.

Update comment s/10/2/ to reflect the new cap of 2ms.

Reported-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Qais Yousef <qyousef@layalina.io>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e13aa799 04-Feb-2024 Qais Yousef <qyousef@layalina.io>

cpufreq: Change default transition delay to 2ms

10ms is too high for today's hardware, even low end ones. This default
end up being used a lot on Arm machines at least. Pine64, mac mini and
pixel 6 all end up with 10ms rate_limit_us when using schedutil, and
it's too high for all of them.

Change the default to 2ms which should be 'pessimistic' enough for worst
case scenario, but not too high for platforms with fast DVFS hardware.

Signed-off-by: Qais Yousef <qyousef@layalina.io>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 599457ba 11-Dec-2023 Vincent Guittot <vincent.guittot@linaro.org>

cpufreq: Use the fixed and coherent frequency for scaling capacity

cpuinfo.max_freq can change at runtime because of boost as an example. This
implies that the value could be different from the frequency that has been
used to compute the capacity of a CPU.

The new arch_scale_freq_ref() returns a fixed and coherent frequency
that can be used to compute the capacity for a given frequency.

[ Also fix a arch_set_freq_scale() newline style wart in <linux/cpufreq.h>. ]

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20231211104855.558096-3-vincent.guittot@linaro.org


# e7a1b32e 05-Oct-2023 Pierre Gondois <pierre.gondois@arm.com>

cpufreq: Rebuild sched-domains when removing cpufreq driver

The Energy Aware Scheduler (EAS) relies on the schedutil governor.
When moving to/from the schedutil governor, sched domains must be
rebuilt to allow re-evaluating the enablement conditions of EAS.
This is done through sched_cpufreq_governor_change().

Having a cpufreq governor assumes a cpufreq driver is running.
Inserting/removing a cpufreq driver should trigger a re-evaluation
of EAS enablement conditions, avoiding to see EAS enabled when
removing a running cpufreq driver.

Rebuild the sched domains in schedutil's sugov_init()/sugov_exit(),
allowing to check EAS's enablement condition whenever schedutil
governor is initialized/exited from.
Move relevant code up in schedutil.c to avoid a split and conditional
function declaration.
Rename sched_cpufreq_governor_change() to sugov_eas_rebuild_sd().

Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 0faf84ca 12-Sep-2023 Justin Stitt <justinstitt@google.com>

cpufreq: Replace deprecated strncpy() with strscpy()

`strncpy` is deprecated for use on NUL-terminated destination strings [1].

We should prefer more robust and less ambiguous string interfaces.

Both `policy->last_governor` and `default_governor` are expected to be
NUL-terminated which is shown by their heavy usage with other string
apis like `strcmp`.

A suitable replacement is `strscpy` [2] due to the fact that it guarantees
NUL-termination on the destination buffer.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/20230913-strncpy-drivers-cpufreq-cpufreq-c-v1-1-f1608bfeff63@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>


# 218a06a7 22-Aug-2023 Jie Zhan <zhanjie9@hisilicon.com>

cpufreq: Support per-policy performance boost

The boost control currently applies to the whole system. However, users
may prefer to boost a subset of cores in order to provide prioritized
performance to workloads running on the boosted cores.

Enable per-policy boost by adding a 'boost' sysfs interface under each
policy path. This can be found at:

/sys/devices/system/cpu/cpufreq/policy<*>/boost

Same to the global boost switch, writing 1/0 to the per-policy 'boost'
enables/disables boost on a cpufreq policy respectively.

The user view of global and per-policy boost controls should be:

1. Enabling global boost initially enables boost on all policies, and
per-policy boost can then be enabled or disabled individually, given that
the platform does support so.

2. Disabling global boost makes the per-policy boost interface illegal.

Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Wei Xu <xuwei5@hisilicon.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 61bfbf79 29-Aug-2023 Liao Chang <liaochang1@huawei.com>

cpufreq: Fix the race condition while updating the transition_task of policy

The field 'transition_task' of policy structure is used to track the
task which is performing the frequency transition. Using this field to
print a warning once detect a case where the same task is calling
_begin() again before completing the preivous frequency transition via
the _end().

However, there is a potential race condition in _end() and _begin() APIs
while updating the field 'transition_task' of policy, the scenario is
depicted below:

Task A Task B

/* 1st freq transition */
Invoke _begin() {
...
...
}
/* 2nd freq transition */
Invoke _begin() {
... //waiting for A to
... //clear
... //transition_ongoing
... //in _end() for
... //the 1st transition
|
Change the frequency |
|
Invoke _end() { |
... |
... |
transition_ongoing = false; V
transition_ongoing = true;
transition_task = current;
transition_task = NULL;
... //A overwrites the task
... //performing the transition
... //result in error warning.
}

To fix this race condition, the transition_lock of policy structure is
now acquired before updating policy structure in _end() API. Which ensure
that only one task can update the 'transition_task' field at a time.

Link: https://lore.kernel.org/all/b3c61d8a-d52d-3136-fbf0-d1de9f1ba411@huawei.com/
Fixes: ca654dc3a93d ("cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end")
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1f464cb4 29-Aug-2023 Liao Chang <liaochang1@huawei.com>

cpufreq: Avoid printing kernel addresses in cpufreq_resume()

The pointer value of policy and driver structure are currently printed
in the error messages of cpufreq_resume(), this is not recommended and
helpful.

In order to be consistent with the error message in cpufreq_suspend()
and easier to understand, print the name of driver strcture and the
manage CPU of policy structure individually in the error messages of
cpufreq_resume().

Link: https://lore.kernel.org/all/b7be717c-41d8-bbbf-3e97-3799948ab757@huawei.com
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ba6ea77d 14-Aug-2023 Liao Chang <liaochang1@huawei.com>

cpufreq: Prefer to print cpuid in MIN/MAX QoS register error message

When a cpufreq_policy is allocated, the cpus, related_cpus and real_cpus
of policy are still unset. Therefore, it is preferable to print the
passed 'cpu' parameter instead of a empty 'cpus' cpumask in error
message when registering MIN/MAX QoS notifier fails.

Signed-off-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>


# b4a11fa3 29-May-2023 Wyes Karny <wyes.karny@amd.com>

cpufreq: Fail driver register if it has adjust_perf without fast_switch

If fast_switch_possible flag is set by the scaling driver, the governor
is free to select fast_switch function even if adjust_perf is set. Some
scaling drivers which use adjust_perf don't set fast_switch thinking
that the governor would never fall back to fast_switch. But the governor
can fall back to fast_switch even in runtime if frequency invariance is
disabled due to some reason. This could crash the kernel if the driver
didn't set the fast_switch function pointer.

Therefore, fail driver registration if it has adjust_perf without
fast_switch.

Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2744a63c 13-Mar-2023 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cpufreq: move to use bus_get_dev_root()

Direct access to the struct bus_type dev_root pointer is going away soon
so replace that with a call to bus_get_dev_root() instead, which is what
it is there for.

Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-pm@vger.kernel.org
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230313182918.1312597-3-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 44295af5 18-Apr-2023 Sanjay Chandrashekara <sanjayc@nvidia.com>

cpufreq: use correct unit when verify cur freq

cpufreq_verify_current_freq checks() if the frequency returned by
the hardware has a slight delta with the valid frequency value
last set and returns "policy->cur" if the delta is within "1 MHz".
In the comparison, "policy->cur" is in "kHz" but it's compared
against HZ_PER_MHZ. So, the comparison range becomes "1 GHz".

Fix this by comparing against KHZ_PER_MHZ instead of HZ_PER_MHZ.

Fixes: f55ae08c8987 ("cpufreq: Avoid unnecessary frequency updates due to mismatch")
Signed-off-by: Sanjay Chandrashekara <sanjayc@nvidia.com>
[ sumit gupta: Commit message update ]
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# a038895e 03-Apr-2023 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: drivers with target_index() must set freq_table

Since the cpufreq core directly uses freq_table, for cpufreq drivers
that set their target_index() callback, make it mandatory for them to
set the same.

Since this is set per policy and normally from policy->init(), do this
from cpufreq_table_validate_and_sort() which gets called right after
->init().

Reported-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 877d5cd2 20-Mar-2023 qinyu <qinyu32@huawei.com>

cpufreq: warn about invalid vals to scaling_max/min_freq interfaces

When echo an invalid val to scaling_min_freq:
> echo 123abc123 > scaling_min_freq
It looks weird to have a return val of 0:
> echo $?
> 0

Sane people won't echo strings like that into these interfaces but fuzz
tests may do. Also, maybe it's better to inform people if input is
invalid.

After this:
> echo 123abc123 > scaling_min_freq
> -bash: echo: write error: Invalid argument

Signed-off-by: qinyu <qinyu32@huawei.com>
Tested-by: zhangxiaofeng <zhangxiaofeng46@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 108fcad9 07-Feb-2023 Thomas Weißschuh <linux@weissschuh.net>

cpufreq: Make kobj_type structure constant

Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definition to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# dd329e1e 07-Feb-2023 Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

cpufreq: Make cpufreq_unregister_driver() return void

All but a few drivers ignore the return value of
cpufreq_unregister_driver(). Those few that don't only call it after
cpufreq_register_driver() succeeded, in which case the call doesn't
fail.

Make the function return no value and add a WARN_ON for the case that
the function is called in an invalid situation (i.e. without a previous
successful call to cpufreq_register_driver()).

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com> # brcmstb-avs-cpufreq.c
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5c510548 10-Nov-2022 Yongqiang Liu <liuyongqiang13@huawei.com>

cpufreq: Init completion before kobject_init_and_add()

In cpufreq_policy_alloc(), it will call uninitialed completion in
cpufreq_sysfs_release() when kobject_init_and_add() fails. And
that will cause a crash such as the following page fault in complete:

BUG: unable to handle page fault for address: fffffffffffffff8
[..]
RIP: 0010:complete+0x98/0x1f0
[..]
Call Trace:
kobject_put+0x1be/0x4c0
cpufreq_online.cold+0xee/0x1fd
cpufreq_add_dev+0x183/0x1e0
subsys_interface_register+0x3f5/0x4e0
cpufreq_register_driver+0x3b7/0x670
acpi_cpufreq_init+0x56c/0x1000 [acpi_cpufreq]
do_one_initcall+0x13d/0x780
do_init_module+0x1c3/0x630
load_module+0x6e67/0x73b0
__do_sys_finit_module+0x181/0x240
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: 4ebe36c94aed ("cpufreq: Fix kobject memleak")
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 5.2+ <stable@vger.kernel.org> # 5.2+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6ca7076f 16-Aug-2022 Lukasz Luba <lukasz.luba@arm.com>

cpufreq: check only freq_table in __resolve_freq()

There is no need to check if the cpufreq driver implements callback
cpufreq_driver::target_index. The logic in the __resolve_freq uses
the frequency table available in the policy. It doesn't matter if the
driver provides 'target_index' or 'target' callback. It just has to
populate the 'policy->freq_table'.

Thus, check only frequency table during the frequency resolving call.

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 68315f1a 04-Jul-2022 Pierre Gondois <pierre.gondois@arm.com>

cpufreq: Change order of online() CB and policy->cpus modification

From a state where all policy->related_cpus are offline, putting one
of the policy's CPU back online re-activates the policy by:
1. Calling cpufreq_driver->online()
2. Setting the CPU in policy->cpus

qcom_cpufreq_hw_cpu_online() makes use of policy->cpus. Thus 1. and 2.
should be inverted to avoid having a policy->cpus empty. The
qcom-cpufreq-hw is the only driver affected by this.

Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>


# a2f6a7ac 07-Jul-2022 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Warn users while freeing active policy

With the new design in place, the show() and store() callbacks check if
the policy is active or not before proceeding any further to avoid
potential races. And in order to guarantee that cpufreq_policy_free()
must be called after clearing the policy->cpus mask, i.e. by marking the
policy inactive.

In order to avoid introducing a bug around this later, print a warning
message if we end up freeing an active policy.

Also update cpufreq_online() a bit to make sure we clear the cpus mask
for each error case before calling cpufreq_policy_free().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9ab9b9d3 26-May-2022 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Drop unnecessary cpus locking from store()

This change was introduced long back by commit 4f750c930822 ("cpufreq:
Synchronize the cpufreq store_*() routines with CPU hotplug").

Since then, both cpufreq and hotplug core have been reworked and have
much better locking in place. The race mentioned in commit 4f750c930822
isn't possible anymore.

Drop the unnecessary locking.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 336e5128 26-May-2022 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Optimize cpufreq_show_cpus()

Instead of specially adding a space for each CPU, except the first one,
lets add space for each of them and remove it at the end.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 514ff1bc 15-May-2022 Schspa Shi <schspa@gmail.com>

cpufreq: make interface functions and lock holding state clear

cpufreq_offline() calls offline() and exit() under the policy rwsem
But they are called outside the rwsem in cpufreq_online().

Make cpufreq_online() call offline() and exit() as well as online() and
init() under the policy rwsem to achieve a clear lock relationship.

All of the init() and online() implementations in the tree only
initialize the policy object without attempting to acquire the policy
rwsem and they won't call cpufreq APIs attempting to acquire it.

Signed-off-by: Schspa Shi <schspa@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d4627a28 15-May-2022 Schspa Shi <schspa@gmail.com>

cpufreq: Abort show()/store() for half-initialized policies

If policy initialization fails after the sysfs files are created,
there is a possibility to end up running show()/store() callbacks
for half-initialized policies, which may have unpredictable
outcomes.

Abort show()/store() in such a case by making sure the policy is active.

Also dectivate the policy on such failures.

Signed-off-by: Schspa Shi <schspa@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f339f354 11-May-2022 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Rearrange locking in cpufreq_remove_dev()

Currently, cpufreq_remove_dev() invokes the ->exit() driver callback
without holding the policy rwsem which is inconsistent with what
happens if ->exit() is invoked directly from cpufreq_offline().

It also manipulates the real_cpus mask and removes the CPU device
symlink without holding the policy rwsem, but cpufreq_offline() holds
the rwsem around the modifications thereof.

For consistency, modify cpufreq_remove_dev() to hold the policy rwsem
until the ->exit() callback has been called (or it has been determined
that it is not necessary to call it).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# fddd8f86 11-May-2022 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Split cpufreq_offline()

Split the "core" part running under the policy rwsem out of
cpufreq_offline() to allow the locking in cpufreq_remove_dev() to be
rearranged more easily.

As a side-effect this eliminates the unlock label that's not needed
any more.

No expected functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# e1e962c5 11-May-2022 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Reorganize checks in cpufreq_offline()

Notice that cpufreq_offline() only needs to check policy_is_inactive()
once and rearrange the code in there to make that happen.

No expected functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 5c84c1b8 11-May-2022 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Clear real_cpus mask from remove_cpu_dev_symlink()

add_cpu_dev_symlink() is responsible for setting the CPUs in the
real_cpus mask, the reverse of which should be done from
remove_cpu_dev_symlink() to make it look clean and avoid any breakage
later on.

Move the call to clear the mask to remove_cpu_dev_symlink().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 85f0e42b 08-May-2022 Viresh Kumar <viresh.kumar@linaro.org>

Revert "cpufreq: Fix possible race in cpufreq online error path"

This reverts commit f346e96267cd76175d6c201b40f770c0116a8a04.

The commit tried to fix a possible real bug but it made it even worse.
The fix was simply buggy as now an error out to out_offline_policy or
out_exit_policy will try to release a semaphore which was never taken in
the first place. This works fine only if we failed late, i.e. via
out_destroy_policy.

Fixes: f346e96267cd ("cpufreq: Fix possible race in cpufreq online error path")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f55ae08c 04-May-2022 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Avoid unnecessary frequency updates due to mismatch

For some platforms, the frequency returned by hardware may be slightly
different from what is provided in the frequency table. For example,
hardware may return 499 MHz instead of 500 MHz. In such cases it is
better to avoid getting into unnecessary frequency updates, as we may
end up switching policy->cur between the two and sending unnecessary
pre/post update notifications, etc.

This patch has chosen allows the hardware frequency and table frequency
to deviate by 1 MHz for now, we may want to increase it a bit later on
if someone still complains.

Reported-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Jia-wei Chang <jia-wei.chang@mediatek.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f346e962 20-Apr-2022 Schspa Shi <schspa@gmail.com>

cpufreq: Fix possible race in cpufreq online error path

When cpufreq online fails, the policy->cpus mask is not cleared and
policy->rwsem is released too early, so the driver can be invoked
via the cpuinfo_cur_freq sysfs attribute while its ->offline() or
->exit() callbacks are being run.

Take policy->clk as an example:

static int cpufreq_online(unsigned int cpu)
{
...
// policy->cpus != 0 at this time
down_write(&policy->rwsem);
ret = cpufreq_add_dev_interface(policy);
up_write(&policy->rwsem);

return 0;

out_destroy_policy:
for_each_cpu(j, policy->real_cpus)
remove_cpu_dev_symlink(policy, get_cpu_device(j));
up_write(&policy->rwsem);
...
out_exit_policy:
if (cpufreq_driver->exit)
cpufreq_driver->exit(policy);
clk_put(policy->clk);
// policy->clk is a wild pointer
...
^
|
Another process access
__cpufreq_get
cpufreq_verify_current_freq
cpufreq_generic_get
// acces wild pointer of policy->clk;
|
|
out_offline_policy: |
cpufreq_policy_free(policy); |
// deleted here, and will wait for no body reference
cpufreq_policy_put_kobj(policy);
}

Address this by modifying cpufreq_online() to release policy->rwsem
in the error path after the driver callbacks have run and to clear
policy->cpus before releasing the semaphore.

Fixes: 7106e02baed4 ("cpufreq: release policy->rwsem on error")
Signed-off-by: Schspa Shi <schspa@gmail.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4f774c4a 27-Jan-2022 Bjorn Andersson <bjorn.andersson@linaro.org>

cpufreq: Reintroduce ready() callback

This effectively revert '4bf8e582119e ("cpufreq: Remove ready()
callback")', in order to reintroduce the ready callback.

This is needed in order to be able to leave the thermal pressure
interrupts in the Qualcomm CPUfreq driver disabled during
initialization, so that it doesn't fire while related_cpus are still 0.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[ Viresh: Added the Chinese translation as well and updated commit msg ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>


# fe262d5c 28-Dec-2021 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cpufreq: use default_groups in kobj_type

There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field. Move the cpufreq code to use default_groups field which has been
the preferred way since aa30f47cf666 ("kobject: Add support for default
attribute groups to kobj_type") so that we can soon get rid of the
obsolete default_attrs field.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 521223d8 16-Dec-2021 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Fix initialization of min and max frequency QoS requests

The min and max frequency QoS requests in the cpufreq core are
initialized to whatever the current min and max frequency values are
at the init time, but if any of these values change later (for
example, cpuinfo.max_freq is updated by the driver), these initial
request values will be limiting the CPU frequency unnecessarily
unless they are changed by user space via sysfs.

To address this, initialize min_freq_req and max_freq_req to
FREQ_QOS_MIN_DEFAULT_VALUE and FREQ_QOS_MAX_DEFAULT_VALUE,
respectively, so they don't really limit anything until user
space updates them.

Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1e81d3e0 01-Dec-2021 Tang Yizhou <tangyizhou@huawei.com>

cpufreq: Fix a comment in cpufreq_policy_free

Make the comment in blocking_notifier_call_chain() easier to
understand.

Signed-off-by: Tang Yizhou <tangyizhou@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2c1b5a84 29-Nov-2021 Xiongfeng Wang <wangxiongfeng2@huawei.com>

cpufreq: Fix get_cpu_device() failure in add_cpu_dev_symlink()

When I hot added a CPU, I found 'cpufreq' directory was not created
below /sys/devices/system/cpu/cpuX/.

It is because get_cpu_device() failed in add_cpu_dev_symlink().

cpufreq_add_dev() is the .add_dev callback of a CPU subsys interface.
It will be called when the CPU device registered into the system.
The call chain is as follows:

register_cpu()
->device_register()
->device_add()
->bus_probe_device()
->cpufreq_add_dev()

But only after the CPU device has been registered, we can get the
CPU device by get_cpu_device(), otherwise it will return NULL.

Since we already have the CPU device in cpufreq_add_dev(), pass
it to add_cpu_dev_symlink().

I noticed that the 'kobj' of the CPU device has been added into
the system before cpufreq_add_dev().

Fixes: 2f0ba790df51 ("cpufreq: Fix creation of symbolic links to policy directories")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b894d20e 08-Sep-2021 Vincent Donnefort <vincent.donnefort@arm.com>

cpufreq: Use CPUFREQ_RELATION_E in DVFS governors

Let the governors schedutil, conservative and ondemand to work, if possible
on efficient frequencies only.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1f39fa0d 08-Sep-2021 Vincent Donnefort <vincent.donnefort@arm.com>

cpufreq: Introducing CPUFREQ_RELATION_E

This newly introduced flag can be applied by a governor to a CPUFreq
relation, when looking for a frequency within the policy table. The
resolution would then only walk through efficient frequencies.

Even with the flag set, the policy max limit will still be honoured. If no
efficient frequencies can be found within the limits of the policy, an
inefficient one would be returned.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 15171769 08-Sep-2021 Vincent Donnefort <vincent.donnefort@arm.com>

cpufreq: Make policy min/max hard requirements

When applying the policy min/max limits, the requested frequency is
simply clamped to not be out of range. It means, however, if one of the
boundaries isn't an available frequency, the frequency resolution can
return a value out of those limits, depending on the relation used.

e.g. freq{0,1,2} being available frequencies.

freq0 policy->min freq1 policy->max freq2
| | | | |
17kHz 18kHz 19kHz 20kHz 21kHz

__resolve_freq(21kHz, CPUFREQ_RELATION_L) -> 21kHz (out of bounds)
__resolve_freq(17kHz, CPUFREQ_RELATION_H) -> 17kHz (out of bounds)

If, during the policy init, we resolve the requested min/max to existing
frequencies, we ensure that any CPUFREQ_RELATION_* would resolve to a
frequency which is inside the policy min/max range.

Making the policy limits rigid helps to introduce the inefficient
frequencies support. Resolving an inefficient frequency to an efficient
one should not transgress policy->max (which can be set for thermal
reason) and having a value we can trust simplify this comparison.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4bf8e582 01-Sep-2021 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove ready() callback

This isn't used anymore, get rid of it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# c17495b0 09-Aug-2021 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Add callback to register with energy model

Many cpufreq drivers register with the energy model for each policy and
do exactly the same thing. Follow the footsteps of thermal-cooling, to
get it done from the cpufreq core itself.

Provide a new callback, which will be called, if present, by the cpufreq
core at the right moment (more on that in the code's comment). Also
provide a generic implementation that uses dev_pm_opp_of_register_em().

This also allows us to register with the EM at a later point of time,
compared to ->init(), from where the EM core can access cpufreq policy
directly using cpufreq_cpu_get() type of helpers and perform other work,
like marking few frequencies inefficient, this will be done separately.

Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>


# 09681a07 03-Aug-2021 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

cpufreq: Replace deprecated CPU-hotplug functions

The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b3beca76 29-Jun-2021 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove ->resolve_freq()

Commit e3c062360870 ("cpufreq: add cpufreq_driver_resolve_freq()")
introduced this callback, back in 2016, for drivers that provide the
->target() callback.

The kernel hasn't seen a single user of it in the past 5 years and
it is not likely to be used any time soon.

Remove it for now.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f9ccdec2 29-Jun-2021 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Reuse cpufreq_driver_resolve_freq() in __cpufreq_driver_target()

__cpufreq_driver_target() open codes cpufreq_driver_resolve_freq(), lets
make the former reuse the later.

Separate out __resolve_freq() to accept relation as well as an argument
and use it at both the locations.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3e0f897f 22-Jun-2021 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove the ->stop_cpu() driver callback

Now that all users of ->stop_cpu() have been migrated to using other
callbacks, drop it from the core.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Minor edits in the subject and changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3b718057 22-Jun-2021 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Make cpufreq_online() call driver->offline() on errors

In the CPU removal path the ->offline() callback provided by the
driver is always invoked before ->exit(), but in the cpufreq_online()
error path it is not, so ->exit() is expected to somehow know the
context in which it has been called and act accordingly.

That is less than straightforward, so make cpufreq_online() invoke
the driver's ->offline() callback, if present, on errors before
->exit() too.

This only potentially affects intel_pstate.

Fixes: 91a12e91dc39 ("cpufreq: Allow light-weight tear down and bring up of CPUs")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 60943bbd 06-Apr-2021 Shaokun Zhang <zhangshaokun@hisilicon.com>

cpufreq: Remove unused for_each_policy macro

Macro 'for_each_policy' has become unused since commit
f963735a3ca3 ("cpufreq: Create for_each_{in}active_policy()"), so
remove it.

Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4e6df217 18-Feb-2021 Yue Hu <huyue2@yulong.com>

cpufreq: Fix typo in kerneldoc comment

Change 'Terget' to 'Target'.

Should be Target.

Signed-off-by: Yue Hu <huyue2@yulong.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5ae4a4b4 01-Feb-2021 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove CPUFREQ_STICKY flag

During cpufreq driver's registration, if the ->init() callback for all
the CPUs fail then there is not much point in keeping the driver around
as it will only account for more of unnecessary noise, for example
cpufreq core will try to suspend/resume the driver which never got
registered properly.

The removal of such a driver is avoided if the driver carries the
CPUFREQ_STICKY flag. This was added way back [1] in 2004 and perhaps no
one should ever need it now. A lot of drivers do set this flag, probably
because they just copied it from other drivers.

This was added earlier for some platforms [2] because their cpufreq
drivers were getting registered before the CPUs were registered with
subsys framework. And hence they used to fail.

The same isn't true anymore though. The current code flow in the kernel
is:

start_kernel()
-> kernel_init()
-> kernel_init_freeable()
-> do_basic_setup()
-> driver_init()
-> cpu_dev_init()
-> subsys_system_register() //For CPUs

-> do_initcalls()
-> cpufreq_register_driver()

Clearly, the CPUs will always get registered with subsys framework
before any cpufreq driver can get probed. Remove the flag and update the
relevant drivers.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/include/linux/cpufreq.h?id=7cc9f0d9a1ab04cedc60d64fd8dcf7df224a3b4d # [1]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/arch/arm/mach-sa1100/cpu-sa1100.c?id=f59d3bbe35f6268d729f51be82af8325d62f20f5 # [2]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ee2cc427 14-Dec-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Add special-purpose fast-switching callback for drivers

First off, some cpufreq drivers (eg. intel_pstate) can pass hints
beyond the current target frequency to the hardware and there are no
provisions for doing that in the cpufreq framework. In particular,
today the driver has to assume that it should not allow the frequency
to fall below the one requested by the governor (or the required
capacity may not be provided) which may not be the case and which may
lead to excessive energy usage in some scenarios.

Second, the hints passed by these drivers to the hardware need not be
in terms of the frequency, so representing the utilization numbers
coming from the scheduler as frequency before passing them to those
drivers is not really useful.

Address the two points above by adding a special-purpose replacement
for the ->fast_switch callback, called ->adjust_perf, allowing the
governor to pass abstract performance level (rather than frequency)
values for the minimum (required) and target (desired) performance
along with the CPU capacity to compare them to.

Also update the schedutil governor to use the new callback instead
of ->fast_switch if present and if the utilization mertics are
frequency-invariant (that is requisite for the direct mapping
between the utilization and the CPU performance levels to be a
reasonable approximation).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# b96f0384 25-Nov-2020 Wang ShaoBo <bobo.shaobowang@huawei.com>

cpufreq: Fix cpufreq_online() return value on errors

Make cpufreq_online() return negative error codes on all errors that
cause the policy to be destroyed, as appropriate.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ec06e586 18-Nov-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Fix up several kerneldoc comments

Fix up the remaining kerneldoc comments that don't adhere to the
expected format and clarify some of them a bit.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# ea9364bb 10-Nov-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Add strict_target to struct cpufreq_policy

Add a new field to be set when the CPUFREQ_GOV_STRICT_TARGET flag is
set for the current governor to struct cpufreq_policy, so that the
drivers needing to check CPUFREQ_GOV_STRICT_TARGET do not have to
access the governor object during every frequency transition.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 9a2a9ebc 10-Nov-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Introduce governor flags

A new cpufreq governor flag will be added subsequently, so replace
the bool dynamic_switching fleid in struct cpufreq_governor with a
flags field and introduce CPUFREQ_GOV_DYNAMIC_SWITCHING to set for
the "dynamic switching" governors instead of it.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 56a7ff75 22-Oct-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Drop restore_freq from struct cpufreq_policy

The restore_freq field in struct cpufreq_policy is only used by
__target_index() in one place and a local variable in that function
may as well be used instead of it, so drop it and modify
__target_index() accordingly.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# a62f68f5 23-Oct-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Introduce cpufreq_driver_test_flags()

Add a helper function to test the flags of the cpufreq driver in use
againt a given flags mask.

In particular, this will be needed to test the
CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag in the schedutil
governor.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1c534352 23-Oct-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS driver flag

Generally, a cpufreq driver may need to update some internal upper
and lower frequency boundaries on policy max and min changes,
respectively, but currently this does not work if the target
frequency does not change along with the policy limit.

Namely, if the target frequency does not change along with the
policy min or max, the "target_freq == policy->cur" check in
__cpufreq_driver_target() prevents driver callbacks from being
invoked and they do not even have a chance to update the
corresponding internal boundary.

This particularly affects the "powersave" and "performance"
governors that always set the target frequency to one of the
policy limits and it never changes when the other limit is updated.

To allow cpufreq the drivers needing to update internal frequency
boundaries on policy limits changes to avoid this issue, introduce
a new driver flag, CPUFREQ_NEED_UPDATE_LIMITS, that (when set) will
neutralize the check mentioned above.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 97148d0a 12-Oct-2020 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Improve code around unlisted freq check

The cpufreq core checks if the frequency programmed by the bootloaders
is not listed in the freq table and programs one from the table in such
a case. This is done only if the driver has set the
CPUFREQ_NEED_INITIAL_FREQ_CHECK flag.

Currently we print two separate messages, with almost the same content,
and do this with a pr_warn() which may be a bit too much as the driver
only asked us to check this as it expected this to be the case. Lower
down the severity of the print message by switching to pr_info() instead
and print a single message only.

Reported-by: Sumit Gupta <sumitg@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Sumit Gupta <sumitg@nvidia.com>
Tested-by: Sumit Gupta <sumitg@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# a20b7053 24-Sep-2020 Ionela Voinescu <ionela.voinescu@arm.com>

cpufreq,arm,arm64: restructure definitions of arch_set_freq_scale()

Compared to other arch_* functions, arch_set_freq_scale() has an atypical
weak definition that can be replaced by a strong architecture specific
implementation.

The more typical support for architectural functions involves defining
an empty stub in a header file if the symbol is not already defined in
architecture code. Some examples involve:
- #define arch_scale_freq_capacity topology_get_freq_scale
- #define arch_scale_freq_invariant topology_scale_freq_invariant
- #define arch_scale_cpu_capacity topology_get_cpu_scale
- #define arch_update_cpu_topology topology_update_cpu_topology
- #define arch_scale_thermal_pressure topology_get_thermal_pressure
- #define arch_set_thermal_pressure topology_set_thermal_pressure

Bring arch_set_freq_scale() in line with these functions by renaming it to
topology_set_freq_scale() in the arch topology driver, and by defining the
arch_set_freq_scale symbol to point to the new function for arm and arm64.

While there are other users of the arch_topology driver, this patch defines
arch_set_freq_scale for arm and arm64 only, due to their existing
definitions of arch_scale_freq_capacity. This is the getter function of the
frequency invariance scale factor and without a getter function, the
setter function - arch_set_freq_scale() has not purpose.

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com> (BL_SWITCHER and topology parts)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 08d8c65e 05-Oct-2020 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Move traces and update to policy->cur to cpufreq core

The cpufreq core handles the updates to policy->cur and recording of
cpufreq trace events for all the governors except schedutil's fast
switch case.

Move that as well to cpufreq core for consistency and readability.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 96f60cdd 05-Oct-2020 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: stats: Enable stats for fast-switch as well

Now that all the blockers are gone for enabling stats in fast-switching
case, enable it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ecddc3a0 01-Sep-2020 Valentin Schneider <valentin.schneider@arm.com>

arch_topology, cpufreq: constify arch_* cpumasks

The passed cpumask arguments to arch_set_freq_scale() and
arch_freq_counters_available() are only iterated over, so reflect this
in the prototype. This also allows to pass system cpumasks like
cpu_online_mask without getting a warning.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 874f6353 01-Sep-2020 Ionela Voinescu <ionela.voinescu@arm.com>

cpufreq: report whether cpufreq supports Frequency Invariance (FI)

Now that the update of the FI scale factor is done in cpufreq core for
selected functions - target(), target_index() and fast_switch(),
we can provide feedback to the task scheduler and architecture code
on whether cpufreq supports FI.

For this purpose provide an external function to expose whether the
cpufreq drivers support FI, by using a static key.

The logic behind the enablement of cpufreq-based invariance is as
follows:
- cpufreq-based invariance is disabled by default
- cpufreq-based invariance is enabled if any of the callbacks
above is implemented while the unsupported setpolicy() is not

The cpufreq_supports_freq_invariance() function only returns whether
cpufreq is instrumented with the arch_set_freq_scale() calls that
result in support for frequency invariance. Due to the lack of knowledge
on whether the implementation of arch_set_freq_scale() actually results
in the setting of a scale factor based on cpufreq information, it is up
to the architecture code to ensure the setting and provision of the
scale factor to the scheduler.

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1a0419b0 01-Sep-2020 Ionela Voinescu <ionela.voinescu@arm.com>

cpufreq: move invariance setter calls in cpufreq core

To properly scale its per-entity load-tracking signals, the task scheduler
needs to be given a frequency scale factor, i.e. some image of the current
frequency the CPU is running at. Currently, this scale can be computed
either by using counters (APERF/MPERF on x86, AMU on arm64), or by
piggy-backing on the frequency selection done by cpufreq.

For the latter, drivers have to explicitly set the scale factor
themselves, despite it being purely boiler-plate code: the required
information depends entirely on the kind of frequency switch callback
implemented by the driver, i.e. either of: target_index(), target(),
fast_switch() and setpolicy().

The fitness of those callbacks with regard to driving the Frequency
Invariance Engine (FIE) is studied below:

target_index()
==============
Documentation states that the chosen frequency "must be determined by
freq_table[index].frequency". It isn't clear if it *has* to be that
frequency, or if it can use that frequency value to do some computation
that ultimately leads to a different frequency selection. All drivers
go for the former, while the vexpress-spc-cpufreq has an atypical
implementation which is handled separately.

Therefore, the hook works on the assumption the core can use
freq_table[index].frequency.

target()
=======
This has been flagged as deprecated since:

commit 9c0ebcf78fde ("cpufreq: Implement light weight ->target_index() routine")

It also doesn't have that many users:

gx-suspmod.c:439: .target = cpufreq_gx_target,
s3c24xx-cpufreq.c:428: .target = s3c_cpufreq_target,
intel_pstate.c:2528: .target = intel_cpufreq_target,
cppc_cpufreq.c:401: .target = cppc_cpufreq_set_target,
cpufreq-nforce2.c:371: .target = nforce2_target,
sh-cpufreq.c:163: .target = sh_cpufreq_target,
pcc-cpufreq.c:573: .target = pcc_cpufreq_target,

Similarly to the path taken for target_index() calls in the cpufreq core
during a frequency change, all of the drivers above will mark the end of a
frequency change by a call to cpufreq_freq_transition_end().

Therefore, cpufreq_freq_transition_end() can be used as the location for
the arch_set_freq_scale() call to potentially inform the scheduler of the
frequency change.

This change maintains the previous functionality for the drivers that
implement the target_index() callback, while also adding support for the
few drivers that implement the deprecated target() callback.

fast_switch()
=============
This callback *has* to return the frequency that was selected.

setpolicy()
===========
This callback does not have any designated way of informing what was the
end choice. But there are only two drivers using setpolicy(), and none
of them have current FIE support:

drivers/cpufreq/longrun.c:281: .setpolicy = longrun_set_policy,
drivers/cpufreq/intel_pstate.c:2215: .setpolicy = intel_pstate_set_policy,

The intel_pstate is known to use counter-driven frequency invariance.

Conclusion
==========

Given that the significant majority of current FIE enabled drivers use
callbacks that lend themselves to triggering the setting of the FIE scale
factor in a generic way, move the invariance setter calls to cpufreq core.

As a result of setting the frequency scale factor in cpufreq core, after
callbacks that lend themselves to trigger it, remove this functionality
from the driver side.

To be noted that despite marking a successful frequency change, many
cpufreq drivers will consider the new frequency as the requested
frequency, although this is might not be the one granted by the hardware.

Therefore, the call to arch_set_freq_scale() is a "best effort" one, and
it is up to the architecture if the new frequency is used in the new
frequency scale factor setting (determined by the implementation of
arch_set_freq_scale()) or eventually used by the scheduler (determined
by the implementation of arch_scale_freq_capacity()). The architecture
is in a better position to decide if it has better methods to obtain
more accurate information regarding the current frequency and use that
information instead (for example, the use of counters).

Also, the implementation to arch_set_freq_scale() will now have to handle
error conditions (current frequency == 0) in order to prevent the
overhead in cpufreq core when the default arch_set_freq_scale()
implementation is used.

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Suggested-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 681fe684 26-Aug-2020 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: No need to verify cpufreq_driver in show_scaling_cur_freq()

"cpufreq_driver" is guaranteed to be valid here, no need to check it
here.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f6ebbcf0 06-Aug-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Implement passive mode with HWP enabled

Allow intel_pstate to work in the passive mode with HWP enabled and
make it set the HWP minimum performance limit (HWP floor) to the
P-state value given by the target frequency supplied by the cpufreq
governor, so as to prevent the HWP algorithm and the CPU scheduler
from working against each other, at least when the schedutil governor
is in use, and update the intel_pstate documentation accordingly.

Among other things, this allows utilization clamps to be taken
into account, at least to a certain extent, when intel_pstate is
in use and makes it more likely that sufficient capacity for
deadline tasks will be provided.

After this change, the resulting behavior of an HWP system with
intel_pstate in the passive mode should be close to the behavior
of the analogous non-HWP system with intel_pstate in the passive
mode, except that the HWP algorithm is generally allowed to make the
CPU run at a frequency above the floor P-state set by intel_pstate in
the entire available range of P-states, while without HWP a CPU can
run in a P-state above the requested one if the latter falls into the
range of turbo P-states (referred to as the turbo range) or if the
P-states of all CPUs in one package are coordinated with each other
at the hardware level.

[Note that in principle the HWP floor may not be taken into account
by the processor if it falls into the turbo range, in which case the
processor has a license to choose any P-state, either below or above
the HWP floor, just like a non-HWP processor in the case when the
target P-state falls into the turbo range.]

With this change applied, intel_pstate in the passive mode assumes
complete control over the HWP request MSR and concurrent changes of
that MSR (eg. via the direct MSR access interface) are overridden by
it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>


# 292072c3 29-Jul-2020 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: cached_resolved_idx can not be negative

It is not possible for cached_resolved_idx to be invalid here as the
cpufreq core always sets index to a positive value.

Change its type to unsigned int and fix qcom usage a bit.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>


# a9909c21 15-Jul-2020 Lee Jones <lee.jones@linaro.org>

cpufreq: cpufreq: Demote lots of function headers unworthy of kerneldoc status

Also provide missing function parameter description for 'cpu' and 'policy'.

Fixes the following W=1 kernel build warning(s):

drivers/cpufreq/cpufreq.c:60: warning: cannot understand function prototype: 'struct cpufreq_driver *cpufreq_driver; '
drivers/cpufreq/cpufreq.c:90: warning: Function parameter or member 'cpufreq_policy_notifier_list' not described in 'BLOCKING_NOTIFIER_HEAD'
drivers/cpufreq/cpufreq.c:312: warning: Function parameter or member 'val' not described in 'adjust_jiffies'
drivers/cpufreq/cpufreq.c:312: warning: Function parameter or member 'ci' not described in 'adjust_jiffies'
drivers/cpufreq/cpufreq.c:538: warning: Function parameter or member 'policy' not described in 'cpufreq_driver_resolve_freq'
drivers/cpufreq/cpufreq.c:686: warning: Function parameter or member 'file_name' not described in 'show_one'
drivers/cpufreq/cpufreq.c:686: warning: Function parameter or member 'object' not described in 'show_one'
drivers/cpufreq/cpufreq.c:731: warning: Function parameter or member 'file_name' not described in 'store_one'
drivers/cpufreq/cpufreq.c:731: warning: Function parameter or member 'object' not described in 'store_one'
drivers/cpufreq/cpufreq.c:741: warning: Function parameter or member 'policy' not described in 'show_cpuinfo_cur_freq'
drivers/cpufreq/cpufreq.c:741: warning: Function parameter or member 'buf' not described in 'show_cpuinfo_cur_freq'
drivers/cpufreq/cpufreq.c:754: warning: Function parameter or member 'policy' not described in 'show_scaling_governor'
drivers/cpufreq/cpufreq.c:754: warning: Function parameter or member 'buf' not described in 'show_scaling_governor'
drivers/cpufreq/cpufreq.c:770: warning: Function parameter or member 'policy' not described in 'store_scaling_governor'
drivers/cpufreq/cpufreq.c:770: warning: Function parameter or member 'buf' not described in 'store_scaling_governor'
drivers/cpufreq/cpufreq.c:770: warning: Function parameter or member 'count' not described in 'store_scaling_governor'
drivers/cpufreq/cpufreq.c:806: warning: Function parameter or member 'policy' not described in 'show_scaling_driver'
drivers/cpufreq/cpufreq.c:806: warning: Function parameter or member 'buf' not described in 'show_scaling_driver'
drivers/cpufreq/cpufreq.c:815: warning: Function parameter or member 'policy' not described in 'show_scaling_available_governors'
drivers/cpufreq/cpufreq.c:815: warning: Function parameter or member 'buf' not described in 'show_scaling_available_governors'
drivers/cpufreq/cpufreq.c:859: warning: Function parameter or member 'policy' not described in 'show_related_cpus'
drivers/cpufreq/cpufreq.c:859: warning: Function parameter or member 'buf' not described in 'show_related_cpus'
drivers/cpufreq/cpufreq.c:867: warning: Function parameter or member 'policy' not described in 'show_affected_cpus'
drivers/cpufreq/cpufreq.c:867: warning: Function parameter or member 'buf' not described in 'show_affected_cpus'
drivers/cpufreq/cpufreq.c:901: warning: Function parameter or member 'policy' not described in 'show_bios_limit'
drivers/cpufreq/cpufreq.c:901: warning: Function parameter or member 'buf' not described in 'show_bios_limit'
drivers/cpufreq/cpufreq.c:1625: warning: Function parameter or member 'dev' not described in 'cpufreq_remove_dev'
drivers/cpufreq/cpufreq.c:1625: warning: Function parameter or member 'sif' not described in 'cpufreq_remove_dev'
drivers/cpufreq/cpufreq.c:2380: warning: Function parameter or member 'cpu' not described in 'cpufreq_get_policy'
drivers/cpufreq/cpufreq.c:2771: warning: Function parameter or member 'driver' not described in 'cpufreq_unregister_driver'

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3a7e4fbb 29-Jun-2020 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove the weakly defined cpufreq_default_governor()

The default cpufreq governor is chosen with the help of a "choice"
option in the Kconfig which will always end up selecting one of
the governors and so the weakly defined definition of
cpufreq_default_governor() will never get called.

Moreover, this makes us skip the checking of the return value of
that routine as it will always be non NULL.

If the Kconfig option changes in future, then we will start getting
a link error instead (and it won't go unnoticed as in the case of the
weak definition).

Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8412b456 29-Jun-2020 Quentin Perret <qperret@google.com>

cpufreq: Specify default governor on command line

Currently, the only way to specify the default CPUfreq governor is
via Kconfig options, which suits users who can build the kernel
themselves perfectly.

However, for those who use a distro-like kernel (such as Android,
with the Generic Kernel Image project), the only way to use a
non-default governor is to boot to userspace, and to then switch
using the sysfs interface. Being able to specify the default governor
on the command line, like is the case for cpuidle, would allow those
users to specify their governor of choice earlier on, and to simplify
the userspace boot procedure slighlty.

To support this use-case, add a kernel command line parameter
allowing the default governor for CPUfreq to be specified, which
takes precedence over the built-in default.

This implementation has one notable limitation: the default governor
must be registered before the driver. This is solved for builtin
governors and drivers using appropriate *_initcall() functions. And
in the modular case, this must be reflected as a constraint on the
module loading order.

Signed-off-by: Quentin Perret <qperret@google.com>
[ Viresh: Converted 'default_governor' to a string and parsing it only
at initcall level, and several updates to
cpufreq_init_policy(). ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8cc46ae5 29-Jun-2020 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Fix locking issues with governors

The locking around governors handling isn't adequate currently.

The list of governors should never be traversed without the locking
in place. Also governor modules must not be removed while the code
in them is still in use.

Reported-by: Quentin Perret <qperret@google.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: All applicable <stable@vger.kernel.org>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# cf6fada7 29-May-2020 Xiongfeng Wang <wangxiongfeng2@huawei.com>

cpufreq: change '.set_boost' to act on one policy

Macro 'for_each_active_policy()' is defined internally. To avoid some
cpufreq driver needing this macro to iterate over all the policies in
'.set_boost' callback, we redefine '.set_boost' to act on only one
policy and pass the policy as an argument.

'cpufreq_boost_trigger_state()' iterates over all the policies to set
boost for the system.

This is preparation for adding SW BOOST support for CPPC.

To protect Boost enable/disable by sysfs from CPU online/offline,
add 'cpu_hotplug_lock' before calling '.set_boost' for each CPU.

Also move the lock from 'set_boost()' to 'store_cpb()' in
acpi_cpufreq.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 552abb88 17-May-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Fix up cpufreq_boost_set_sw()

After commit 18c49926c4bf ("cpufreq: Add QoS requests for userspace
constraints") the return value of freq_qos_update_request(), that can
be 1, passed by cpufreq_boost_set_sw() to its caller sometimes
confuses the latter, which only expects to see 0 or negative error
codes, so notice that cpufreq_boost_set_sw() can return an error code
(which should not be -EINVAL for that matter) as soon as the first
policy without a frequency table is found (because either all policies
have a frequency table or none of them have it) and rework it to meet
its caller's expectations.

Fixes: 18c49926c4bf ("cpufreq: Add QoS requests for userspace constraints")
Reported-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 5.3+ <stable@vger.kernel.org> # 5.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# bbce8eaa 05-Mar-2020 Ionela Voinescu <ionela.voinescu@arm.com>

cpufreq: add function to get the hardware max frequency

Add weak function to return the hardware maximum frequency of a CPU,
with the default implementation returning cpuinfo.max_freq, which is
the best information we can generically get from the cpufreq framework.

The default can be overwritten by a strong function in platforms
that want to provide an alternative implementation, with more accurate
information, obtained either from hardware or firmware.

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# f5739cb0 26-Feb-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Fix policy initialization for internal governor drivers

Before commit 1e4f63aecb53 ("cpufreq: Avoid creating excessively
large stack frames") the initial value of the policy field in struct
cpufreq_policy set by the driver's ->init() callback was implicitly
passed from cpufreq_init_policy() to cpufreq_set_policy() if the
default governor was neither "performance" nor "powersave". After
that commit, however, cpufreq_init_policy() must take that case into
consideration explicitly and handle it as appropriate, so make that
happen.

Fixes: 1e4f63aecb53 ("cpufreq: Avoid creating excessively large stack frames")
Link: https://lore.kernel.org/linux-pm/39fb762880c27da110086741315ca8b111d781cd.camel@gmail.com/
Reported-by: Artem Bityutskiy <dedekind1@gmail.com>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 183edb20 03-Feb-2020 Yangtao Li <tiny.windzz@gmail.com>

cpufreq: Make cpufreq_global_kobject static

The cpufreq_global_kobject is only used internally by cpufreq.c
after commit 2361be236662 ("cpufreq: Don't create empty
/sys/devices/system/cpu/cpufreq directory").

Make it static.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
[ rjw: Add empty line after cpufreq_global_kobject definition ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1e4f63ae 26-Jan-2020 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Avoid creating excessively large stack frames

In the process of modifying a cpufreq policy, the cpufreq core makes
a copy of it including all of the internals which is stored on the
CPU stack. Because struct cpufreq_policy is relatively large, this
may cause the size of the stack frame to exceed the 2 KB limit and
so the GCC complains when -Wframe-larger-than= is used.

In fact, it is not necessary to copy the entire policy structure
in order to modify it, however.

First, because cpufreq_set_policy() obtains the min and max policy
limits from frequency QoS now, it is not necessary to pass the limits
to it from the callers. The only things that need to be passed to it
from there are the new governor pointer or (if there is a built-in
governor in the driver) the "policy" value representing the governor
choice. They both can be passed as individual arguments, though, so
make cpufreq_set_policy() take them this way and rework its callers
accordingly. This avoids making copies of cpufreq policies in the
callers of cpufreq_set_policy().

Second, cpufreq_set_policy() still needs to pass the new policy
data to the ->verify() callback of the cpufreq driver whose task
is to sanitize the min and max policy limits. It still does not
need to make a full copy of struct cpufreq_policy for this purpose,
but it needs to pass a few items from it to the driver in case they
are needed (different drivers have different needs in that respect
and all of them have to be covered). For this reason, introduce
struct cpufreq_policy_data to hold copies of the members of
struct cpufreq_policy used by the existing ->verify() driver
callbacks and pass a pointer to a temporary structure of that
type to ->verify() (instead of passing a pointer to full struct
cpufreq_policy to it).

While at it, notice that intel_pstate and longrun don't really need
to verify the "policy" value in struct cpufreq_policy, so drop those
check from them to avoid copying "policy" into struct
cpufreq_policy_data (which allows it to be slightly smaller).

Also while at it fix up white space in a couple of places and make
cpufreq_set_policy() static (as it can be so).

Fixes: 3000ce3c52f8 ("cpufreq: Use per-policy frequency QoS")
Link: https://lore.kernel.org/linux-pm/CAMuHMdX6-jb1W8uC2_237m8ctCpsnGp=JCxqt8pCWVqNXHmkVg@mail.gmail.com
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 5720821b 20-Nov-2019 Frederic Weisbecker <frederic@kernel.org>

cpufreq: Use vtime aware kcpustat accessors for user time

We can now safely read user and guest kcpustat fields on nohz_full CPUs.
Use the appropriate accessors.

Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Link: https://lkml.kernel.org/r/20191121024430.19938-5-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 46770be0 13-Nov-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Register drivers only after CPU devices have been registered

The cpufreq core heavily depends on the availability of the struct
device for CPUs and if they aren't available at the time cpufreq driver
is registered, we will never succeed in making cpufreq work.

This happens due to following sequence of events:

- cpufreq_register_driver()
- subsys_interface_register()
- return 0; //successful registration of driver

... at a later point of time

- register_cpu();
- device_register();
- bus_probe_device();
- sif->add_dev();
- cpufreq_add_dev();
- get_cpu_device(); //FAILS
- per_cpu(cpu_sys_devices, num) = &cpu->dev; //used by get_cpu_device()
- return 0; //CPU registered successfully

Because the per-cpu variable cpu_sys_devices is set only after the CPU
device is regsitered, cpufreq will never be able to get it when
cpufreq_add_dev() is called.

This patch avoids this failure by making sure device structure of at
least CPU0 is available when the cpufreq driver is registered, else
return -EPROBE_DEFER.

Reported-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Co-developed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e6e8df07 06-Nov-2019 Kai Shen <shenkai8@huawei.com>

cpufreq: Add NULL checks to show() and store() methods of cpufreq

Add NULL checks to show() and store() in cpufreq.c to avoid attempts
to invoke a NULL callback.

Though some interfaces of cpufreq are set as read-only, users can
still get write permission using chmod which can lead to a kernel
crash, as follows:

chmod +w /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
echo 1 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq

This bug was found in linux 4.19.

Signed-off-by: Kai Shen <shenkai8@huawei.com>
Reported-by: Feilong Lin <linfeilong@huawei.com>
Reviewed-by: Feilong Lin <linfeilong@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ]
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 737ffb27 22-Oct-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Clarify the comment in cpufreq_set_policy()

One of the responsibility of the ->verify() callback is to make sure
that the policy's min frequency is <= max frequency as this isn't
guaranteed by the QoS framework which gave us those values.

Update the comment in cpufreq_set_policy() to clarify that.

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Minor changes of the new comment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 49bb001e 15-Oct-2019 Frederic Weisbecker <frederic@kernel.org>

cpufreq: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM

Now that we have a vtime safe kcpustat accessor for CPUTIME_SYSTEM, use
it to start fixing frozen kcpustat values on nohz_full CPUs.

Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpengli@tencent.com>
Link: https://lkml.kernel.org/r/20191016025700.31277-14-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6941051d 18-Oct-2019 Sudeep Holla <sudeep.holla@arm.com>

cpufreq: Cancel policy update work scheduled before freeing

Scheduled policy update work may end up racing with the freeing of the
policy and unregistering the driver.

One possible race is as below, where the cpufreq_driver is unregistered,
but the scheduled work gets executed at later stage when, cpufreq_driver
is NULL (i.e. after freeing the policy and driver).

Unable to handle kernel NULL pointer dereference at virtual address 0000001c
pgd = (ptrval)
[0000001c] *pgd=80000080204003, *pmd=00000000
Internal error: Oops: 206 [#1] SMP THUMB2
Modules linked in:
CPU: 0 PID: 34 Comm: kworker/0:1 Not tainted 5.4.0-rc3-00006-g67f5a8081a4b #86
Hardware name: ARM-Versatile Express
Workqueue: events handle_update
PC is at cpufreq_set_policy+0x58/0x228
LR is at dev_pm_qos_read_value+0x77/0xac
Control: 70c5387d Table: 80203000 DAC: fffffffd
Process kworker/0:1 (pid: 34, stack limit = 0x(ptrval))
(cpufreq_set_policy) from (refresh_frequency_limits.part.24+0x37/0x48)
(refresh_frequency_limits.part.24) from (handle_update+0x2f/0x38)
(handle_update) from (process_one_work+0x16d/0x3cc)
(process_one_work) from (worker_thread+0xff/0x414)
(worker_thread) from (kthread+0xff/0x100)
(kthread) from (ret_from_fork+0x11/0x28)

Fixes: 67d874c3b2c6 ("cpufreq: Register notifiers with the PM QoS framework")
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
[ rjw: Cancel the work before dropping the QoS requests ]
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3000ce3c 15-Oct-2019 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Use per-policy frequency QoS

Replace the CPU device PM QoS used for the management of min and max
frequency constraints in cpufreq (and its users) with per-policy
frequency QoS to avoid problems with cpufreq policies covering
more then one CPU.

Namely, a cpufreq driver is registered with the subsys interface
which calls cpufreq_add_dev() for each CPU, starting from CPU0, so
currently the PM QoS notifiers are added to the first CPU in the
policy (i.e. CPU0 in the majority of cases).

In turn, when the cpufreq driver is unregistered, the subsys interface
doing that calls cpufreq_remove_dev() for each CPU, starting from CPU0,
and the PM QoS notifiers are only removed when cpufreq_remove_dev() is
called for the last CPU in the policy, say CPUx, which as a rule is
not CPU0 if the policy covers more than one CPU. Then, the PM QoS
notifiers cannot be removed, because CPUx does not have them, and
they are still there in the device PM QoS notifiers list of CPU0,
which prevents new PM QoS notifiers from being registered for CPU0
on the next attempt to register the cpufreq driver.

The same issue occurs when the first CPU in the policy goes offline
before unregistering the driver.

After this change it does not matter which CPU is the policy CPU at
the driver registration time and whether or not it is online all the
time, because the frequency QoS is per policy and not per CPU.

Fixes: 67d874c3b2c6 ("cpufreq: Register notifiers with the PM QoS framework")
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reported-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Diagnosed-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/linux-pm/5ad2624194baa2f53acc1f1e627eb7684c577a19.1562210705.git.viresh.kumar@linaro.org/T/#md2d89e95906b8c91c15f582146173dce2e86e99f
Link: https://lore.kernel.org/linux-pm/20191017094612.6tbkwoq4harsjcqv@vireshk-i7/T/#m30d48cc23b9a80467fbaa16e30f90b3828a5a29b
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 65650b35 08-Oct-2019 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown

It is incorrect to set the cpufreq syscore shutdown callback pointer
to cpufreq_suspend(), because that function cannot be run in the
syscore stage of system shutdown for two reasons: (a) it may attempt
to carry out actions depending on devices that have already been shut
down at that point and (b) the RCU synchronization carried out by it
may not be able to make progress then.

The latter issue has been present since commit 45975c7d21a1 ("rcu:
Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds"),
but the former one has been there since commit 90de2a4aa9f3 ("cpufreq:
suspend cpufreq governors on shutdown") regardless.

Fix that by dropping cpufreq_syscore_ops altogether and making
device_shutdown() call cpufreq_suspend() directly before shutting
down devices, which is along the lines of what system-wide power
management does.

Fixes: 45975c7d21a1 ("rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds")
Fixes: 90de2a4aa9f3 ("cpufreq: suspend cpufreq governors on shutdown")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.0+ <stable@vger.kernel.org> # 4.0+


# df0eea44 23-Jul-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove CPUFREQ_ADJUST and CPUFREQ_NOTIFY policy notifier events

No driver makes reference to these events now, remove them and the code
related to them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e9a7cc1d 21-Aug-2019 Florian Fainelli <f.fainelli@gmail.com>

cpufreq: Print driver name if cpufreq_suspend() fails

Instead of printing the policy, which is incidentally a kernel pointer,
so with limited interest, print the cpufreq driver name that failed to
be suspend, which is more useful for debugging.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 62c23a89 13-Aug-2019 Colin Ian King <colin.king@canonical.com>

cpufreq: remove redundant assignment to ret

Variable ret is initialized to a value that is never read and it is
re-assigned later. The initialization is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6a149036 23-Jul-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Add policy create/remove notifiers back

This effectively reverts some changes made by commit f9f41e3ef99
("cpufreq: Remove policy create/remove notifiers").

We have a new use case for policy create/remove notifiers (for
allocating/freeing QoS requests per policy), so add them back.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e61a4125 08-Aug-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: dev_pm_qos_update_request() can return 1 on success

dev_pm_qos_update_request() can return 1 on success, so don't treat
it as an error.

Fixes: 18c49926c4bf ("cpufreq: Add QoS requests for userspace constraints")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# c4dcc8a1 15-Jul-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Make cpufreq_generic_init() return void

It always returns 0 (success) and its return type should really be void.

Over that, many drivers have added error handling code based on its
return value, which is not required at all.

Change its return type to void and update all the callers.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 18c49926 05-Jul-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Add QoS requests for userspace constraints

This implements QoS requests to manage userspace configuration of min
and max frequency.

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: syzbot <syzbot+de771ae9390dffed7266@syzkaller.appspotmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# c57b25bd 04-Jul-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: intel_pstate: Reuse refresh_frequency_limits()

The implementation of intel_pstate_update_max_freq() is quite similar to
refresh_frequency_limits(), lets reuse it.

Finding minimum of policy->user_policy.max and policy->cpuinfo.max_freq
in intel_pstate_update_max_freq() is redundant as cpufreq_set_policy()
will call the ->verify() callback of intel-pstate driver, which will do
this comparison anyway and so dropping it from
intel_pstate_update_max_freq() doesn't harm.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 67d874c3 08-Jul-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Register notifiers with the PM QoS framework

Register notifiers for min/max frequency constraints with the PM QoS
framework. The constraints are also taken into consideration in
cpufreq_set_policy().

This also relocates cpufreq_policy_put_kobj() as it is required to be
called from cpufreq_policy_alloc() now.

refresh_frequency_limits() is updated to avoid calling
cpufreq_set_policy() for inactive policies and handle_update() is
updated to have proper locking in place.

No constraints are added until now though.

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 70a59fde 19-Jun-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Avoid calling cpufreq_verify_current_freq() from handle_update()

On some occasions cpufreq_verify_current_freq() schedules a work whose
callback is handle_update(), which further calls cpufreq_update_policy()
which may end up calling cpufreq_verify_current_freq() again.

On the other hand, when cpufreq_update_policy() is called from
handle_update(), the pointer to the cpufreq policy is already
available, but cpufreq_cpu_acquire() is still called to get it in
cpufreq_update_policy(), which should be avoided as well.

To fix these issues, create a new helper, refresh_frequency_limits(),
and make both handle_update() call it cpufreq_update_policy().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Rename reeval_frequency_limits() as refresh_frequency_limits() ]
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5980752e 19-Jun-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Consolidate cpufreq_update_current_freq() and __cpufreq_get()

Their implementations are quite similar, so modify
cpufreq_update_current_freq() somewhat and call it from
__cpufreq_get().

Also rename cpufreq_update_current_freq() to
cpufreq_verify_current_freq(), as that's what it is doing.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 98015228 27-Jun-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Don't skip frequency validation for has_target() drivers

CPUFREQ_CONST_LOOPS was introduced in a very old commit from pre-2.6
kernel release by commit 6a4a93f9c0d5 ("[CPUFREQ] Fix 'out of sync'
issue").

Basically, that commit does two things:

- It adds the frequency verification code (which is quite similar to
what we have today as well).

- And it sets the CPUFREQ_CONST_LOOPS flag only for setpolicy drivers,
rightly so based on the code we had then. The idea was to avoid
frequency validation for setpolicy drivers as the cpufreq core doesn't
know what frequency the hardware is running at and so no point in
doing frequency verification.

The problem happened when we started to use the same CPUFREQ_CONST_LOOPS
flag for constant loops-per-jiffy thing as well and many has_target()
drivers started using the same flag and unknowingly skipped the
verification of frequency. There is no logical reason behind skipping
frequency validation because of the presence of CPUFREQ_CONST_LOOPS
flag otherwise.

Fix this issue by skipping frequency validation only for setpolicy
drivers and always doing it for has_target() drivers irrespective of
the presence or absence of CPUFREQ_CONST_LOOPS flag.

cpufreq_notify_transition() is only called for has_target() type driver
and not for set_policy type, and the check is simply redundant. Remove
it as well.

Also remove () around freq comparison statement as they aren't required
and checkpatch also warns for them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5ddc6d4e 19-Jun-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Use has_target() instead of !setpolicy

For code consistency, use has_target() instead of !setpolicy everywhere,
as it is already done at several places. Maybe we should also use
"!has_target()" instead of "cpufreq_driver->setpolicy" where we need to
check if the driver supports setpolicy, so to use only one expression
for this kind of differentiation.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 407d0fff 19-Jun-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove redundant !setpolicy check

cpufreq_start_governor() is only called for !setpolicy case, checking it
again is not required.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# bcc61569 25-Jun-2019 Daniel Lezcano <daniel.lezcano@linaro.org>

cpufreq: Move the IS_ENABLED(CPU_THERMAL) macro into a stub

cpufreq_online() and cpufreq_offline() [un]register the driver as
a cooling device. This is done if the driver is flagged as a cooling
device in addition with an IS_ENABLED() check to compile out the branching
code.

Group this test in a stub function added in the cpufreq header instead
of having the IS_ENABLED() in the code.

Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d2912cb1 04-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500

Based on 2 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# ab05d97a 29-Apr-2019 Yue Hu <huyue2@yulong.com>

cpufreq: Don't find governor for setpolicy drivers in cpufreq_init_policy()

In cpufreq_init_policy() we will check if there's last_governor for target
and setpolicy type. However last_governor is set only if has_target() is
true in cpufreq_offline(). That means find last_governor for setpolicy
type is pointless. Also new_policy.governor will not be used if ->setpolicy
callback is set in cpufreq_set_policy().

Moreover, there's duplicate ->setpolicy check in using default policy path.
Let's add a new helper function to avoid it. Also update comments.

Signed-off-by: Yue Hu <huyue2@yulong.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2acb9bda 09-May-2019 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Explain the kobject_put() in cpufreq_policy_alloc()

It may not be particularly clear why the kobject_put() after
failing kobject_init_and_add() in cpufreq_policy_alloc() is not
redundant, so add a comment to explain that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# df24014a 29-Apr-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Call transition notifier only once for each policy

Currently, the notifiers are called once for each CPU of the policy->cpus
cpumask. It would be more optimal if the notifier can be called only
once and all the relevant information be provided to it. Out of the 23
drivers that register for the transition notifiers today, only 4 of them
do per-cpu updates and the callback for the rest can be called only once
for the policy without any impact.

This would also avoid multiple function calls to the notifier callbacks
and reduce multiple iterations of notifier core's code (which does
locking as well).

This patch adds pointer to the cpufreq policy to the struct
cpufreq_freqs, so the notifier callback has all the information
available to it with a single call. The five drivers which perform
per-cpu updates are updated to use the cpufreq policy. The freqs->cpu
field is redundant now and is removed.

Acked-by: David S. Miller <davem@davemloft.net> (sparc)
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4ebe36c9 30-Apr-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Fix kobject memleak

Currently the error return path from kobject_init_and_add() is not
followed by a call to kobject_put() - which means we are leaking the
kobject.

Fix it by adding a call to kobject_put() in the error path of
kobject_init_and_add().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4db7c34c 19-Apr-2019 Yue Hu <huyue2@yulong.com>

cpufreq: Move ->get callback check outside of __cpufreq_get()

Currenly, __cpufreq_get() called by show_cpuinfo_cur_freq() will check
->get callback. That is needless since cpuinfo_cur_freq attribute will
not be created if ->get is not set. So let's drop it in __cpufreq_get().
Also keep this check in cpufreq_get().

Signed-off-by: Yue Hu <huyue2@yulong.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b23aa311 15-Apr-2019 Yue Hu <huyue2@yulong.com>

cpufreq: Remove needless bios_limit check in show_bios_limit()

Initially, bios_limit attribute will be created if driver->bios_limit
is set in cpufreq_add_dev_interface(). So remove the redundant check
for latter show operation.

Signed-off-by: Yue Hu <huyue2@yulong.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d75f773c 25-Mar-2019 Sakari Ailus <sakari.ailus@linux.intel.com>

treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively

%pF and %pf are functionally equivalent to %pS and %ps conversion
specifiers. The former are deprecated, therefore switch the current users
to use the preferred variant.

The changes have been produced by the following command:

git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done

And verifying the result.

Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: sparclinux@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: xen-devel@lists.xenproject.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Cc: linux-pci@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: linux-mm@kvack.org
Cc: ceph-devel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: David Sterba <dsterba@suse.com> (for btrfs)
Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c)
Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci)
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>


# 89f98d7e 08-Apr-2019 Yue Hu <huyue2@yulong.com>

cpufreq: Remove cpufreq_driver check in cpufreq_boost_supported()

Currently there are three calling paths for cpufreq_boost_supported() in
all as below, we can see the cpufreq_driver null check is needless since
it is already checked before.

<path1>
cpufreq_enable_boost_support()
|-> if (!cpufreq_driver)
|-> cpufreq_boost_supported()

<path2>
cpufreq_register_driver()
|-> if (!driver_data ...
|-> cpufreq_driver = driver_data
|-> cpufreq_boost_supported()
|-> remove_boost_sysfs_file()
|-> cpufreq_boost_supported()

<path3>
cpufreq_unregister_driver()
|-> if (!cpufreq_driver ...
|-> remove_boost_sysfs_file()
|-> cpufreq_boost_supported()

Signed-off-by: Yue Hu <huyue2@yulong.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9083e498 25-Mar-2019 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Update max frequency on global turbo changes

While the cpuinfo.max_freq value doesn't really matter for
intel_pstate in the active mode, in the passive mode it is used by
governors as the maximum physical frequency of the CPU and the
results of governor computations generally depend on it. Also it
is made available to user space via sysfs and it should match the
current HW configuration.

For this reason, make intel_pstate update cpuinfo.max_freq for all
CPUs if it detects a global change of turbo frequency settings from
"disable" to "enable" or the other way associated with a _PPC change
notification from the platform firmware.

Note that policy_is_inactive(), cpufreq_cpu_acquire(),
cpufreq_cpu_release(), and cpufreq_set_policy() need to be made
available to it for this purpose.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200759
Reported-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 540a3758 25-Mar-2019 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Add cpufreq_cpu_acquire() and cpufreq_cpu_release()

It sometimes is necessary to find a cpufreq policy for a given CPU
and acquire its rwsem (for writing) immediately after that, so
introduce cpufreq_cpu_acquire() as a helper for that and the
complementary cpufreq_cpu_release().

Make cpufreq_update_policy() use the new functions.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 5a25e3f7 25-Mar-2019 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Driver-specific handling of _PPC updates

In some cases, the platform firmware disables or enables turbo
frequencies for all CPUs globally before triggering a _PPC change
notification for one of them. Obviously, that global change affects
all CPUs, not just the notified one, and it needs to be acted upon by
cpufreq.

The intel_pstate driver is able to detect such global changes of
the settings, but it also needs to update policy limits for all
CPUs if that happens, in particular if turbo frequencies are
enabled globally - to allow them to be used.

For this reason, introduce a new cpufreq driver callback to be
invoked on _PPC notifications, if present, instead of simply
calling cpufreq_update_policy() for the notified CPU and make
intel_pstate use it to trigger policy updates for all CPUs
in the system if global settings change.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200759
Reported-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 5d094fea 05-Mar-2019 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Improve kerneldoc comments for cpufreq_cpu_get/put()

Fix the formatting of the cpufreq_cpu_get() and cpufreq_cpu_put()
kerneldoc comments and rework them to be somewhat easier to follow.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 167a38dc 19-Feb-2019 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Pass updated policy to driver ->setpolicy() callback

The invocation of the ->setpolicy() cpufreq driver callback should
be equivalent to calling cpufreq_governor_limits(policy) for drivers
with internal governors, but in fact it isn't so, because the
temporary new_policy object is passed to it instead of the updated
policy.

That is a bit confusing, so make cpufreq_set_policy() pass the
updated policy to the driver ->setpolicy() callback.

No intentional changes of behavior.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 2bb4059e 19-Feb-2019 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Fix two debug messages in cpufreq_set_policy()

Remove the redundant "cpufreq:" prefix from two debug messages in
cpufreq_set_policy().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 348a2ec5 19-Feb-2019 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Reorder and simplify cpufreq_update_policy()

In cpufreq_update_policy(), instead of updating new_policy.cur
separately, which is kind of confusing, because cpufreq_set_policy()
doesn't take that value into account directly anyway, make the copy
of the existing policy after calling cpufreq_update_current_freq().

No intentional changes of behavior.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# a0dbb819 19-Feb-2019 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Add kerneldoc comments for two core functions

Add kerneldoc comments describing cpufreq_set_policy() and
cpufreq_update_policy() as they have not been properly documented
so far and they really need to be documented.

While at it, fix white space around the cpufreq_set_policy() header.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# a9a22b57 14-Feb-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Replace double NOT (!!) with single NOT (!)

Double NOT (!!) operation is normally done to convert a non-zero value
to 1 and keep zero as is, but that isn't the requirement in this case.
All we wanted was to make sure that only one of the two routines isn't
set, i.e. either both function pointers are set or both are unset.

This can be done with a single NOT (!) operation as well.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 91a12e91 12-Feb-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Allow light-weight tear down and bring up of CPUs

The cpufreq core doesn't remove the cpufreq policy anymore on CPU
offline operation, rather that happens when the CPU device gets
unregistered from the kernel. This allows faster recovery when the CPU
comes back online. This is also very useful during system wide
suspend/resume where we offline all non-boot CPUs during suspend and
then bring them back on resume.

This commit takes the same idea a step ahead to allow drivers to do
light weight tear-down and bring-up during CPU offline and online
operations.

A new set of callbacks is introduced, online/offline(). online() gets
called when the first CPU of an inactive policy is brought up and
offline() gets called when all the CPUs of a policy are offlined.

The existing init/exit() callback get called on policy
creation/destruction. They also get called instead of online/offline()
callbacks if the online/offline() callbacks aren't provided.

This also moves around some code to get executed only for the new-policy
case going forward.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5c238a8b 29-Jan-2019 Amit Kucheria <amit.kucheria@linaro.org>

cpufreq: Auto-register the driver as a thermal cooling device if asked

All cpufreq drivers do similar things to register as a cooling device.
Provide a cpufreq driver flag so drivers can just ask the cpufreq core
to register the cooling device on their behalf. This allows us to get
rid of duplicated code in the drivers.

In order to allow this, we add a struct thermal_cooling_device pointer
to struct cpufreq_policy so that drivers don't need to store it in a
private data structure.

Suggested-by: Stephen Boyd <swboyd@chromium.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 625c85a6 24-Jan-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Use struct kobj_attribute instead of struct global_attr

The cpufreq_global_kobject is created using kobject_create_and_add()
helper, which assigns the kobj_type as dynamic_kobj_ktype and show/store
routines are set to kobj_attr_show() and kobj_attr_store().

These routines pass struct kobj_attribute as an argument to the
show/store callbacks. But all the cpufreq files created using the
cpufreq_global_kobject expect the argument to be of type struct
attribute. Things work fine currently as no one accesses the "attr"
argument. We may not see issues even if the argument is used, as struct
kobj_attribute has struct attribute as its first element and so they
will both get same address.

But this is logically incorrect and we should rather use struct
kobj_attribute instead of struct global_attr in the cpufreq core and
drivers and the show/store callbacks should take struct kobj_attribute
as argument instead.

This bug is caught using CFI CLANG builds in android kernel which
catches mismatch in function prototypes for such callbacks.

Reported-by: Donghee Han <dh.han@samsung.com>
Reported-by: Sangkyu Kim <skwith.kim@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 21469df4 06-Jan-2019 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Don't update new_policy on failures

The local variable "new_policy" hasn't been used in the error path of
cpufreq_online() since commit f9f41e3ef99a (cpufreq: Remove policy
create/remove notifiers). Don't update it in that error path.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2f661962 07-Jan-2019 Sudeep Holla <sudeep.holla@arm.com>

cpufreq: check if policy is inactive early in __cpufreq_get()

cpuinfo_cur_freq gets current CPU frequency as detected by hardware
while scaling_cur_freq last known CPU frequency. Some platforms may not
allow checking the CPU frequency of an offline CPU or the associated
resources may have been released via cpufreq_exit when the CPU gets
offlined, in which case the policy would have been invalidated already.
If we attempt to get current frequency from the hardware, it may result
in hang or crash.

For example on Juno, I see:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000188
[0000000000000188] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 5 PID: 4202 Comm: cat Not tainted 4.20.0-08251-ga0f2c0318a15-dirty #87
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform
pstate: 40000005 (nZcv daif -PAN -UAO)
pc : scmi_cpufreq_get_rate+0x34/0xb0
lr : scmi_cpufreq_get_rate+0x34/0xb0
Call trace:
scmi_cpufreq_get_rate+0x34/0xb0
__cpufreq_get+0x34/0xc0
show_cpuinfo_cur_freq+0x24/0x78
show+0x40/0x60
sysfs_kf_seq_show+0xc0/0x148
kernfs_seq_show+0x44/0x50
seq_read+0xd4/0x480
kernfs_fop_read+0x15c/0x208
__vfs_read+0x60/0x188
vfs_read+0x94/0x150
ksys_read+0x6c/0xd8
__arm64_sys_read+0x24/0x30
el0_svc_common+0x78/0x100
el0_svc_handler+0x38/0x78
el0_svc+0x8/0xc
---[ end trace 3d1024e58f77f6b2 ]---

So fix the issue by checking if the policy is invalid early in
__cpufreq_get before attempting to get the current frequency.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 531b5c9f 03-Dec-2018 Quentin Perret <qperret@qperret.net>

sched/topology: Make Energy Aware Scheduling depend on schedutil

Energy Aware Scheduling (EAS) is designed with the assumption that
frequencies of CPUs follow their utilization value. When using a CPUFreq
governor other than schedutil, the chances of this assumption being true
are small, if any. When schedutil is being used, EAS' predictions are at
least consistent with the frequency requests. Although those requests
have no guarantees to be honored by the hardware, they should at least
guide DVFS in the right direction and provide some hope in regards to the
EAS model being accurate.

To make sure EAS is only used in a sane configuration, create a strong
dependency on schedutil being used. Since having sugov compiled-in does
not provide that guarantee, make CPUFreq call a scheduler function on
governor changes hence letting it rebuild the scheduling domains, check
the governors of the online CPUs, and enable/disable EAS accordingly.

Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-9-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0e7ea2f3 07-Sep-2018 Igor Stoppa <igor.stoppa@gmail.com>

cpufreq: remove unnecessary unlikely()

WARN_ON() already contains an unlikely(), so it's not necessary to wrap it
into another.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9b3d9bb3 24-Jul-2018 Waiman Long <longman@redhat.com>

cpufreq: Fix a circular lock dependency problem

With lockdep turned on, the following circular lock dependency problem
was reported:

[ 57.470040] ======================================================
[ 57.502900] WARNING: possible circular locking dependency detected
[ 57.535208] 4.18.0-0.rc3.1.el8+7.x86_64+debug #1 Tainted: G
[ 57.577761] ------------------------------------------------------
[ 57.609714] tuned/1505 is trying to acquire lock:
[ 57.633808] 00000000559deec5 (cpu_hotplug_lock.rw_sem){++++}, at: store+0x27/0x120
[ 57.672880]
[ 57.672880] but task is already holding lock:
[ 57.702184] 000000002136ca64 (kn->count#118){++++}, at: kernfs_fop_write+0x1d0/0x410
[ 57.742176]
[ 57.742176] which lock already depends on the new lock.
[ 57.742176]
[ 57.785220]
[ 57.785220] the existing dependency chain (in reverse order) is:
:
[ 58.932512] other info that might help us debug this:
[ 58.932512]
[ 58.973344] Chain exists of:
[ 58.973344] cpu_hotplug_lock.rw_sem --> subsys mutex#5 --> kn->count#118
[ 58.973344]
[ 59.030795] Possible unsafe locking scenario:
[ 59.030795]
[ 59.061248] CPU0 CPU1
[ 59.085377] ---- ----
[ 59.108160] lock(kn->count#118);
[ 59.124935] lock(subsys mutex#5);
[ 59.156330] lock(kn->count#118);
[ 59.186088] lock(cpu_hotplug_lock.rw_sem);
[ 59.208541]
[ 59.208541] *** DEADLOCK ***

In the cpufreq_register_driver() function, the lock sequence is:

cpus_read_lock --> kn->count

For the cpufreq sysfs store method, the lock sequence is:

kn->count --> cpus_read_lock

These sequences are actually safe as they are taking a share lock on
cpu_hotplug_lock. However, the current lockdep code doesn't check for
share locking when detecting circular lock dependency. Fixing that
could be a substantial effort.

Instead, we can work around this problem by using cpus_read_trylock()
in the store method which is much simpler. The chance of not getting
the read lock is very small. If that happens, the userspace application
that writes the sysfs file will get an error.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 601b2185 24-Jul-2018 Ruchi Kandoi <kandoiruchi@google.com>

cpufreq: trace frequency limits change

systrace used for tracing for Android systems has carried a patch for
many years in the Android tree that traces when the cpufreq limits
change. With the help of this information, systrace can know when the
policy limits change and can visually display the data. Lets add
upstream support for the same.

Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# cc85de36 24-May-2018 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

cpufreq: Use static SRCU initializer

Use the static SRCU initializer for `cpufreq_transition_notifier_list'.
This avoids the init_cpufreq_transition_notifier_list() initcall. Its
only purpose is to initialize the SRCU notifier once during boot and set
another variable which is used as an indicator whether the init was
perfromed before cpufreq_register_notifier() was used.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# c7d1f119 26-May-2018 Tao Wang <kevin.wangtao@hisilicon.com>

cpufreq: Fix new policy initialization during limits updates via sysfs

If the policy limits are updated via cpufreq_update_policy() and
subsequently via sysfs, the limits stored in user_policy may be
set incorrectly.

For example, if both min and max are set via sysfs to the maximum
available frequency, user_policy.min and user_policy.max will also
be the maximum. If a policy notifier triggered by
cpufreq_update_policy() lowers both the min and the max at this
point, that change is not reflected by the user_policy limits, so
if the max is updated again via sysfs to the same lower value,
then user_policy.max will be lower than user_policy.min which
shouldn't happen. In particular, if one of the policy CPUs is
then taken offline and back online, cpufreq_set_policy() will
fail for it due to a failing limits check.

To prevent that from happening, initialize the min and max fields
of the new_policy object to the ones stored in user_policy that
were previously set via sysfs.

Signed-off-by: Kevin Wangtao <kevin.wangtao@hisilicon.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ]
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 20b5324d 10-May-2018 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: optimize cpufreq_notify_transition()

cpufreq_notify_transition() calls __cpufreq_notify_transition() for each
CPU of a policy. There is a lot of code in __cpufreq_notify_transition()
though which isn't required to be executed for each CPU, like checking
about disabled cpufreq or irqs, adjusting jiffies, updating cpufreq
stats and some debug print messages.

This commit merges __cpufreq_notify_transition() into
cpufreq_notify_transition() and modifies cpufreq_notify_transition() to
execute minimum amount of code for each CPU.

Also fix the kerneldoc for cpufreq_notify_transition() while at it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 92c99d15 25-Feb-2018 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Don't validate cpufreq table from cpufreq_generic_init()

The cpufreq table is already validated by the cpufreq core and none of
the users of cpufreq_generic_init() have any dependency on it to
validate the table as well.

Don't validate the cpufreq table anymore from cpufreq_generic_init().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d417e069 21-Feb-2018 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Validate frequency table in the core

By design, cpufreq drivers are responsible for calling
cpufreq_frequency_table_cpuinfo() from their ->init()
callbacks to validate the frequency table.

However, if a cpufreq driver is buggy and fails to do so properly, it
lead to unexpected behavior of the driver or the cpufreq core at a
later point in time. It would be better if the core could
validate the frequency table during driver initialization.

To that end, introduce cpufreq_table_validate_and_sort() and make
the cpufreq core call it right after invoking the ->init() callback
of the driver and destroy the cpufreq policy if the table is invalid.

For the time being the validation of the table happens twice, once
from the driver and then from the core. The individual drivers will
be updated separately to drop table validation if they don't need it
for other reasons.

The frequency table is marked "sorted" or "unsorted" by the new helper
now instead of in cpufreq_table_validate_and_show(), as it should only
be done after validating the table (which the drivers won't do going
forward).

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject/changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b24b6478 21-Feb-2018 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Reorder cpufreq_online() error code path

Ideally the de-allocation of resources should happen in the exact
opposite order in which they were allocated. It helps maintain the code
in long term, even if nothing really breaks with incorrect ordering.

That wasn't followed in cpufreq_online() and it has some
inconsistencies. For example, the symlinks were created from within
the locked region while they are removed only after putting the locks.
Also ->exit() should have been called only after the symlinks are
removed and the lock is dropped, as that was the case when ->init()
was first called.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 703cbaa6 23-Jan-2018 Bo Yan <byan@nvidia.com>

cpufreq: Skip cpufreq resume if it's not suspended

cpufreq_resume can be called even without preceding cpufreq_suspend.
This can happen in following scenario:

suspend_devices_and_enter
--> dpm_suspend_start
--> dpm_prepare
--> device_prepare : this function errors out
--> dpm_suspend: this is skipped due to dpm_prepare failure
this means cpufreq_suspend is skipped over
--> goto Recover_platform, due to previous error
--> goto Resume_devices
--> dpm_resume_end
--> dpm_resume
--> cpufreq_resume

In case schedutil is used as frequency governor, cpufreq_resume will
eventually call sugov_start, which does following:

memset(sg_cpu, 0, sizeof(*sg_cpu));
....

This effectively erases function pointer for frequency update, causing
crash later on. The function pointer would have been set correctly if
subsequent cpufreq_add_update_util_hook runs successfully, but that
function returns earlier because cpufreq_suspend was not called:

if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
return;

The fix is to check cpufreq_suspended first, if it's false, that means
cpufreq_suspend was not called in the first place, so do not resume
cpufreq.

Signed-off-by: Bo Yan <byan@nvidia.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Dropped printing a message ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# a8b149d3 23-Nov-2017 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Fix governor module removal race

It is possible to remove a cpufreq governor module after
cpufreq_parse_governor() has returned success in
store_scaling_governor() and before cpufreq_set_policy()
acquires a reference to it, because the governor list is
not protected during that period and nothing prevents the
governor from being unregistered then.

Prevent that from happening by acquiring an extra reference
to the governor module temporarily in cpufreq_parse_governor(),
under cpufreq_governor_mutex, and dropping it in
store_scaling_governor(), when cpufreq_set_policy() returns.

Note that the second cpufreq_parse_governor() call site is fine,
because it only cares about the policy member of new_policy.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 70d1ff71 22-Nov-2017 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Drop pointless return statement

Drop a pointless return statement from cpufreq_unregister_governor().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ae0ff89f 22-Nov-2017 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Pass policy pointer to cpufreq_parse_governor()

Pass policy pointer to cpufreq_parse_governor() instead of passing
pointers to two members of it so as to make the code slightly more
straightforward.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 045149e6 22-Nov-2017 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Clean up cpufreq_parse_governor()

Drop an unnecessary local variable from cpufreq_parse_governor()
and rearrange the code in there to make it easier to follow.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e7d5459d 26-Sep-2017 Dietmar Eggemann <dietmar.eggemann@arm.com>

cpufreq: provide default frequency-invariance setter function

Frequency-invariant accounting support based on the ratio of current
frequency and maximum supported frequency is an optional feature an arch
can implement.

Since there are cpufreq drivers (e.g. cpufreq-dt) which can be build for
different arch's a default implementation of the frequency-invariance
setter function arch_set_freq_scale() is needed.

This default implementation is an empty weak function which will be
overwritten by a strong function in case the arch provides one.

The setter function passes the cpumask of related (to the frequency
change) cpus (online and offline cpus), the (new) current frequency and
the maximum supported frequency.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e948bc8f 16-Aug-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Cap the default transition delay value to 10 ms

If transition_delay_us isn't defined by the cpufreq driver, the default
value of transition delay (time after which the cpufreq governor will
try updating the frequency again) is currently calculated by multiplying
transition_latency (nsec) with LATENCY_MULTIPLIER (1000) and then
converting this time to usec. That gives the exact same value as
transition_latency, just that the time unit is usec instead of nsec.

With acpi-cpufreq for example, transition_latency is set to around 10
usec and we get transition delay as 10 ms. Which seems to be a
reasonable amount of time to reevaluate the frequency again.

But for platforms where frequency switching isn't that fast (like ARM),
the transition_latency varies from 500 usec to 3 ms, and the transition
delay becomes 500 ms to 3 seconds. Of course, that is a pretty bad
default value to start with.

We can try to come across a better formula (instead of multiplying with
LATENCY_MULTIPLIER) to solve this problem, but will that be worth it ?

This patch tries a simple approach and caps the maximum value of default
transition delay to 10 ms. Of course, userspace can still come in and
change this value anytime or individual drivers can rather provide
transition_delay_us instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 209887e6 08-Aug-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Return 0 from ->fast_switch() on errors

CPUFREQ_ENTRY_INVALID is a special symbol which is used to specify that
an entry in the cpufreq table is invalid. But using it outside of the
scope of the cpufreq table looks a bit incorrect.

We can represent an invalid frequency by writing it as 0 instead if we
need. Note that it is already done that way for the return value of the
->get() callback.

Lets do the same for ->fast_switch() and not use CPUFREQ_ENTRY_INVALID
outside of the scope of cpufreq table.

Also update the comment over cpufreq_driver_fast_switch() to clearly
mention what this returns.

None of the drivers return CPUFREQ_ENTRY_INVALID as of now from
->fast_switch() callback and so we don't need to update any of those.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# fc4c709f 19-Jul-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Allow dynamic switching with CPUFREQ_ETERNAL latency

With the recent updates, CPUFREQ_ETERNAL is only used by the drivers
which don't know their transition latency but want to use dynamic
switching.

Anyway, the routine cpufreq_policy_transition_delay_us() caps the value
of transition latency to 10 ms now and that can be used safely with such
platforms.

Remove the check from cpufreq_init_governor() and allow dynamic
switching for such configurations as well.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# fe829ed8 19-Jul-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Add CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING cpufreq driver flag

The policy->transition_latency field is used for multiple purposes
today and its not straight forward at all. This is how it is used:

A. Set the correct transition_latency value.

B. Set it to CPUFREQ_ETERNAL because:
1. We don't want automatic dynamic switching (with
ondemand/conservative) to happen at all.
2. We don't know the transition latency.

This patch handles the B.1. case in a more readable way. A new flag for
the cpufreq drivers is added to disallow use of cpufreq governors which
have dynamic_switching flag set.

All the current cpufreq drivers which are setting transition_latency
unconditionally to CPUFREQ_ETERNAL are updated to use it. They don't
need to set transition_latency anymore.

There shouldn't be any functional change after this patch.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ed4676e2 19-Jul-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Replace "max_transition_latency" with "dynamic_switching"

There is no limitation in the ondemand or conservative governors which
disallow the transition_latency to be greater than 10 ms.

The max_transition_latency field is rather used to disallow automatic
dynamic frequency switching for platforms which didn't wanted these
governors to run.

Replace max_transition_latency with a boolean (dynamic_switching) and
check for transition_latency == CPUFREQ_ETERNAL along with that. This
makes it pretty straight forward to read/understand now.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# aa7519af 19-Jul-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Use transition_delay_us for legacy governors as well

The policy->transition_delay_us field is used only by the schedutil
governor currently, and this field describes how fast the driver wants
the cpufreq governor to change CPUs frequency. It should rather be a
common thing across all governors, as it doesn't have any schedutil
dependency here.

Create a new helper cpufreq_policy_transition_delay_us() to get the
transition delay across all governors.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f8475cef 23-Jun-2017 Len Brown <len.brown@intel.com>

x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF

The goal of this change is to give users a uniform and meaningful
result when they read /sys/...cpufreq/scaling_cur_freq
on modern x86 hardware, as compared to what they get today.

Modern x86 processors include the hardware needed
to accurately calculate frequency over an interval --
APERF, MPERF, and the TSC.

Here we provide an x86 routine to make this calculation
on supported hardware, and use it in preference to any
driver driver-specific cpufreq_driver.get() routine.

MHz is computed like so:

MHz = base_MHz * delta_APERF / delta_MPERF

MHz is the average frequency of the busy processor
over a measurement interval. The interval is
defined to be the time between successive invocations
of aperfmperf_khz_on_cpu(), which are expected to to
happen on-demand when users read sysfs attribute
cpufreq/scaling_cur_freq.

As with previous methods of calculating MHz,
idle time is excluded.

base_MHz above is from TSC calibration global "cpu_khz".

This x86 native method to calculate MHz returns a meaningful result
no matter if P-states are controlled by hardware or firmware
and/or if the Linux cpufreq sub-system is or is-not installed.

When this routine is invoked more frequently, the measurement
interval becomes shorter. However, the code limits re-computation
to 10ms intervals so that average frequency remains meaningful.

Discerning users are encouraged to take advantage of
the turbostat(8) utility, which can gracefully handle
concurrent measurement intervals of arbitrary length.

Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6c770036 26-May-2017 David Arcari <darcari@redhat.com>

cpufreq: cpufreq_register_driver() should return -ENODEV if init fails

For a driver that does not set the CPUFREQ_STICKY flag, if all of the
->init() calls fail, cpufreq_register_driver() should return an error.
This will prevent the driver from loading.

Fixes: ce1bcfe94db8 (cpufreq: check cpufreq_policy_list instead of scanning policies for all CPUs)
Cc: 4.0+ <stable@vger.kernel.org> # 4.0+
Signed-off-by: David Arcari <darcari@redhat.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# a92551e4 24-May-2017 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

cpufreq: Use cpuhp_setup_state_nocalls_cpuslocked()

cpufreq holds get_online_cpus() while invoking cpuhp_setup_state_nocalls()
to make subsys_interface_register() and the registration of hotplug calls
atomic versus cpu hotplug.

cpuhp_setup_state_nocalls() invokes get_online_cpus() as well. This is
correct, but prevents the conversion of the hotplug locking to a percpu
rwsem.

Use cpuhp_setup/remove_state_nocalls_cpuslocked() to avoid the nested
call. Convert *_online_cpus() to the new interfaces while at it.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-pm@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170524081547.731628408@linutronix.de


# c4a3fa26 08-Apr-2017 Chen Yu <yu.c.chen@intel.com>

cpufreq: Bring CPUs up even if cpufreq_online() failed

There is a report that after commit 27622b061eb4 ("cpufreq: Convert
to hotplug state machine"), the normal CPU offline/online cycle
fails on some platforms.

According to the ftrace result, this problem was triggered on
platforms using acpi-cpufreq as the default cpufreq driver,
and due to the lack of some ACPI freq method (eg. _PCT),
cpufreq_online() failed and returned a negative value, so the CPU
hotplug state machine rolled back the CPU online process. Actually,
from the user's perspective, the failure of cpufreq_online() should
not prevent that CPU from being brought up, although cpufreq might
not work on that CPU.

BTW, during system startup cpufreq_online() is not invoked via CPU
online but by the cpufreq device creation process, so the APs can be
brought up even though cpufreq_online() fails in that stage.

This patch ignores the return value of cpufreq_online/offline() and
lets the cpufreq framework deal with the failure. cpufreq_online()
itself will do a proper rollback in that case and if _PCT is missing,
the ACPI cpufreq driver will print a warning if the corresponding
debug options have been enabled.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=194581
Fixes: 27622b061eb4 ("cpufreq: Convert to hotplug state machine")
Reported-and-tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2f0ba790 27-Mar-2017 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Fix creation of symbolic links to policy directories

The cpufreq core only tries to create symbolic links from CPU
directories in sysfs to policy directories in cpufreq_add_dev(),
either when a given CPU is registered or when the cpufreq driver
is registered, whichever happens first. That is not sufficient,
however, because cpufreq_add_dev() may be called for an offline CPU
whose policy object has not been created yet and, quite obviously,
the symbolic cannot be added in that case.

Fix that by making cpufreq_online() attempt to add symbolic links to
policy objects for the CPUs in the related_cpus mask of every new
policy object created by it.

The cpufreq_driver_lock locking around the for_each_cpu() loop
in cpufreq_online() is dropped, because it is not necessary and the
code is somewhat simpler without it. Moreover, failures to create
a symbolic link will not be regarded as hard errors any more and
the CPUs without those links will not be taken offline automatically,
but that should not be problematic in practice.

Reported-and-tested-by: Prashanth Prakash <pprakash@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 4.9+ <stable@vger.kernel.org> # 4.9+


# ff010472 21-Mar-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Restore policy min/max limits on CPU online

On CPU online the cpufreq core restores the previous governor (or
the previous "policy" setting for ->setpolicy drivers), but it does
not restore the min/max limits at the same time, which is confusing,
inconsistent and real pain for users who set the limits and then
suspend/resume the system (using full suspend), in which case the
limits are reset on all CPUs except for the boot one.

Fix this by making cpufreq_online() restore the limits when an inactive
policy is brought online.

The commit log and patch are inspired from Rafael's earlier work.

Reported-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.3+ <stable@vger.kernel.org> # 4.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9b4f603e 14-Mar-2017 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Fix and clean up show_cpuinfo_cur_freq()

There is a missing newline in show_cpuinfo_cur_freq(), so add it,
but while at it clean that function up somewhat too.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: All applicable <stable@vger.kernel.org>


# d82f2692 28-Feb-2017 Len Brown <len.brown@intel.com>

cpufreq: Add the "cpufreq.off=1" cmdline option

Add the "cpufreq.off=1" cmdline option.

At boot-time, this allows a user to request CONFIG_CPU_FREQ=n
behavior from a kernel built with CONFIG_CPU_FREQ=y.

This is analogous to the existing "cpuidle.off=1" option
and CONFIG_CPU_IDLE=y

This capability is valuable when we need to debug end-user
issues in the BIOS or in Linux. It is also convenient
for enabling comparisons, which may otherwise require a new kernel,
or help from BIOS SETUP, which may be buggy or unavailable.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f4510146 13-Feb-2017 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Do not clear real_cpus mask on policy init

If new_policy is set in cpufreq_online(), the policy object has just
been created and its real_cpus mask has been zeroed on allocation,
and the driver's ->init() callback should not touch it.

It doesn't need to be cleared again, so don't do that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 052f573f 04-Jan-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove CPUFREQ_START notifier event

Its not used anymore, remove it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f9f41e3e 04-Jan-2017 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove policy create/remove notifiers

Those were added by:

commit fcd7af917abb ("cpufreq: stats: handle cpufreq_unregister_driver()
and suspend/resume properly")

but aren't used anymore since:

commit 1aefc75b2449 ("cpufreq: stats: Make the stats code non-modular").

Remove them. Also remove the redundant parameter to the respective
routines.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7fb1327e 30-Jan-2017 Frederic Weisbecker <fweisbec@gmail.com>

sched/cputime: Convert kcpustat to nsecs

Kernel CPU stats are stored in cputime_t which is an architecture
defined type, and hence a bit opaque and requiring accessors and mutators
for any operation.

Converting them to nsecs simplifies the code and is one step toward
the removal of cputime_t in the core code.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 30248fef 18-Nov-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Make cpufreq_update_policy() void

The return value of cpufreq_update_policy() is never used, so make
it void.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 182e36af 18-Nov-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Avoid using inactive policies

There are two places in the cpufreq core in which low-level driver
callbacks may be invoked for an inactive cpufreq policy, which isn't
guaranteed to work in general. Both are due to possible races with
CPU offline.

First, in cpufreq_get(), the policy may become inactive after
the check against policy->cpus in cpufreq_cpu_get() and before
policy->rwsem is acquired, in which case using it going forward may
not be correct.

Second, an analogous situation is possible in cpufreq_update_policy().

Avoid using inactive policies by adding policy_is_inactive() checks
to the code in the above places.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 5372e054 20-Sep-2016 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

cpufreq: Fix up conversion to hotplug state machine

The function cpufreq_register_driver() returns zero on success and since
commit 27622b061eb4 ("cpufreq: Convert to hotplug state machine")
erroneously a positive number. Due to the "if (x) assume_error" construct
all callers assumed an error and as a consequence the cpu freq kworker
crashes with a NULL pointer dereference.

Reset the return value back to zero in the success case.

Fixes: 27622b061eb4 ("cpufreq: Convert to hotplug state machine")
Reported-by: Borislav Petkov <bp@alien8.de>
Reported-and-tested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: peterz@infradead.org
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/20160920145628.lp2bmq72ip3oiash@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 27622b06 06-Sep-2016 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

cpufreq: Convert to hotplug state machine

Install the callbacks via the state machine.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.or
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160906170457.32393-13-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 26619804 11-Sep-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: create link to policy only for registered CPUs

If a cpufreq driver is registered very early in the boot stage (e.g.
registered from postcore_initcall()), then cpufreq core may generate
kernel warnings for it.

In this case, the CPUs are brought online, then the cpufreq driver is
registered, and then the CPU topology devices are registered. However,
by the time cpufreq_add_dev() gets called, the cpu device isn't stored
in the per-cpu variable (cpu_sys_devices,) which is read by
get_cpu_device().

So the cpufreq core fails to get device for the CPU, for which
cpufreq_add_dev() was called in the first place and we will hit a
WARN_ON(!cpu_dev).

Even if we reuse the 'dev' parameter passed to cpufreq_add_dev() to
avoid that warning, there might be other CPUs online that share the
policy with the cpu for which cpufreq_add_dev() is called. Eventually
get_cpu_device() will return NULL for them as well, and we will hit the
same WARN_ON() again.

In order to fix these issues, change cpufreq core to create links to the
policy for a cpu only when cpufreq_add_dev() is called for that CPU.

Reuse the 'real_cpus' mask to track that as well.

Note that cpufreq_remove_dev() already handles removal of the links for
individual CPUs and cpufreq_add_dev() has aligned with that now.

Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3689ad7e 30-Aug-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Drop unnecessary check from cpufreq_policy_alloc()

Since cpufreq_policy_alloc() doesn't use its dev variable for
anything useful, drop that variable from there along with the
NULL check against it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# ae2c1ca6 21-Jul-2016 Steve Muckle <steve.muckle@linaro.org>

cpufreq: export cpufreq_driver_resolve_freq()

Export cpufreq_driver_resolve_freq() since governors may be compiled as
modules.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# abe8bd02 21-Jul-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index()

The handlers provided by cpufreq core are sufficient for resolving the
frequency for drivers providing ->target_index(), as the core already
has the frequency table and so ->resolve_freq() isn't required for such
platforms.

This patch disallows drivers with ->target_index() callback to use the
->resolve_freq() callback.

Also, it fixes a potential kernel crash for drivers providing ->target()
but no ->resolve_freq().

Fixes: e3c062360870 "cpufreq: add cpufreq_driver_resolve_freq()"
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e3c06236 13-Jul-2016 Steve Muckle <steve.muckle@linaro.org>

cpufreq: add cpufreq_driver_resolve_freq()

Cpufreq governors may need to know what a particular target frequency
maps to in the driver without necessarily wanting to set the frequency.
Support this operation via a new cpufreq API,
cpufreq_driver_resolve_freq(). This API returns the lowest driver
frequency equal or greater than the target frequency
(CPUFREQ_RELATION_L), subject to any policy (min/max) or driver
limitations. The mapping is also cached in the policy so that a
subsequent fast_switch operation can avoid repeating the same lookup.

The API will call a new cpufreq driver callback, resolve_freq(), if it
has been registered by the driver. Otherwise the frequency is resolved
via cpufreq_frequency_table_target(). Rather than require ->target()
style drivers to provide a resolve_freq() callback it is left to the
caller to ensure that the driver implements this callback if necessary
to use cpufreq_driver_resolve_freq().

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8d540ea7 27-Jun-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Drop redundant check from cpufreq_update_current_freq()

Both callers of cpufreq_update_current_freq(), cpufreq_update_policy()
and cpufreq_start_governor(), check cpufreq_suspended before calling
that function, so drop the redundant cpufreq_suspended check from it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 742c87bf 27-Jun-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Avoid false-positive WARN_ON()s in cpufreq_update_policy()

CPU notifications from the firmware coming in when cpufreq is
suspended cause cpufreq_update_current_freq() to return 0 which
triggers the WARN_ON() in cpufreq_update_policy() for no reason.

Avoid that by checking cpufreq_suspended before calling
cpufreq_update_current_freq().

Fixes: c9d9c929e674 (cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.6+ <stable@vger.kernel.org> # 4.6+


# d218ed77 02-Jun-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Return index from cpufreq_frequency_table_target()

This routine can't fail unless the frequency table is invalid and
doesn't contain any valid entries.

Make it return the index and WARN() in case it is used for an invalid
table.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 23727845 02-Jun-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Drop 'freq_table' argument of __target_index()

It is already present as part of the policy and so no need to pass it
from the caller. Also, 'freq_table' is guaranteed to be valid in this
function and so no need to check it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7ab4aabb 02-Jun-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Drop freq-table param to cpufreq_frequency_table_target()

The policy already has this pointer set, use it instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f8bfc116 02-Jun-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove cpufreq_frequency_get_table()

Most of the callers of cpufreq_frequency_get_table() already have the
pointer to a valid 'policy' structure and they don't really need to go
through the per-cpu variable first and then a check to validate the
frequency, in order to find the freq-table for the policy.

Directly use the policy->freq_table field instead for them.

Only one user of that API is left after above changes, cpu_cooling.c and
it accesses the freq_table in a racy way as the policy can get freed in
between.

Fix it by using cpufreq_cpu_get() properly.

Since there are no more users of cpufreq_frequency_get_table() left, get
rid of it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Javi Merino <javi.merino@arm.com> (cpu_cooling.c)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1aefc75b 31-May-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: stats: Make the stats code non-modular

The modularity of cpufreq_stats is quite problematic.

First off, the usage of policy notifiers for the initialization
and cleanup in the cpufreq_stats module is inherently racy with
respect to CPU offline/online and the initialization and cleanup
of the cpufreq driver.

Second, fast frequency switching (used by the schedutil governor)
cannot be enabled if any transition notifiers are registered, so
if the cpufreq_stats module (that registers a transition notifier
for updating transition statistics) is loaded, the schedutil governor
cannot use fast frequency switching.

On the other hand, allowing cpufreq_stats to be built as a module
doesn't really add much value. Arguably, there's not much reason
for that code to be modular at all.

For the above reasons, make the cpufreq stats code non-modular,
modify the core to invoke functions provided by that code directly
and drop the notifiers from it.

Make the stats sysfs attributes appear empty if fast frequency
switching is enabled as the statistics will not be updated in that
case anyway (and returning -EBUSY from those attributes breaks
powertop).

While at it, clean up Kconfig help for the CPU_FREQ_STAT and
CPU_FREQ_STAT_DETAILS options.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 910c6e88 25-May-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Use clamp_val() in __cpufreq_driver_target()

Use clamp_val() instead of open coding it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 388612ba 23-May-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Send START policy notification after sending CREATE

The sequence got a bit wrong as we are sending CPUFREQ_START
notifications even before we have sent CPUFREQ_CREATE_POLICY.

Fix it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9a15fb2c 18-May-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Drop the 'initialized' field from struct cpufreq_governor

The 'initialized' field in struct cpufreq_governor is only used by
the conservative governor (as a usage counter) and the way that
happens is far from straightforward and arguably incorrect.

Namely, the value of 'initialized' is checked by
cpufreq_dbs_governor_init() and cpufreq_dbs_governor_exit() and
the results of those checks are passed (as the second argument) to
the ->init() and ->exit() callbacks in struct dbs_governor. Those
callbacks are only implemented by the ondemand and conservative
governors and ondemand doesn't use their second argument at all.
In turn, the conservative governor uses it to decide whether or not
to either register or unregister a transition notifier.

That whole mechanism is not only unnecessarily convoluted, but also
racy, because the 'initialized' field of struct cpufreq_governor is
updated in cpufreq_init_governor() and cpufreq_exit_governor() under
policy->rwsem which doesn't help if one of these functions is run
twice in parallel for different policies (which isn't impossible in
principle), for example.

Instead of it, add a proper usage counter to the conservative
governor and update it from cs_init() and cs_exit() which is
guaranteed to be non-racy, as those functions are only called
under gov_dbs_data_mutex which is global.

With that in place, drop the 'initialized' field from struct
cpufreq_governor as it is not used any more.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# e788892b 02-Jun-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: governor: Get rid of governor events

The design of the cpufreq governor API is not very straightforward,
as struct cpufreq_governor provides only one callback to be invoked
from different code paths for different purposes. The purpose it is
invoked for is determined by its second "event" argument, causing it
to act as a "callback multiplexer" of sorts.

Unfortunately, that leads to extra complexity in governors, some of
which implement the ->governor() callback as a switch statement
that simply checks the event argument and invokes a separate function
to handle that specific event.

That extra complexity can be eliminated by replacing the all-purpose
->governor() callback with a family of callbacks to carry out specific
governor operations: initialization and exit, start and stop and policy
limits updates. That also turns out to reduce the code size too, so
do it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# b9af6948 01-Jun-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Fix clamp_val() usage in cpufreq_driver_fast_switch()

The return value of clamp_val() has to be stored actually.

Fixes: b7898fda5bc7 (cpufreq: Support for fast frequency switching)
Reported-by: Steve Muckle <steve.muckle@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# a92604b4 13-May-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Split cpufreq_governor() into simpler functions

The cpufreq_governor() routine is used by the cpufreq core to invoke
the current governor's ->governor() callback with appropriate arguments
and do some housekeeping related to that. Unfortunately, the way it
mixes different governor events in one code path makes it rather hard
to follow the code.

For this reason, split cpufreq_governor() into five simpler functions
that each will handle just one specific governor event and put all of
the code related to the given event into its own function.

This change is a prerequisite for a redesign of the cpufreq governor
API that will be done subsequently.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 9e8c0a89 13-May-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: governor: Check transition latecy at init time only

It is not necessary to check the governor's max_transition_latency
attribute every time cpufreq_governor() runs, so check it only if
the event argument is CPUFREQ_GOV_POLICY_INIT.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# d6ff44d6 13-May-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: governor: CPUFREQ_GOV_LIMITS never fails

None of the cpufreq governors currently in the tree will ever fail
an invocation of the ->governor() callback with the event argument
equal to CPUFREQ_GOV_LIMITS (unless invoked with incorrect arguments
which doesn't matter anyway) and had it ever failed, the result of
it wouldn't have been very clean.

For this reason, rearrange the code in the core to ignore the return
value of cpufreq_governor() when called with event equal to
CPUFREQ_GOV_LIMITS.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 3834abb4 16-May-2016 Pankaj Gupta <Pankaj.Gupta@spreadtrum.com>

cpufreq: simplified goto out in cpufreq_register_driver()

simplified goto out in cpufreq_register_driver for increasing
code readability

Signed-off-by: Pankaj Gupta <pankaj.gupta@spreadtrum.com>
Signed-off-by: Sanjeev Yadav <sanjeev.yadav@spreadtrum.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 45482c70 12-May-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: governor: CPUFREQ_GOV_STOP never fails

None of the cpufreq governors currently in the tree will ever fail
an invocation of the ->governor() callback with the event argument
equal to CPUFREQ_GOV_STOP (unless invoked with incorrect arguments
which doesn't matter anyway) and it is rather difficult to imagine
a valid reason for such a failure.

Accordingly, rearrange the code in the core to make it clear that
this call never fails.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 36be3418 12-May-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: governor: CPUFREQ_GOV_POLICY_EXIT never fails

None of the cpufreq governors currently in the tree will ever fail
an invocation of the ->governor() callback with the event argument
equal to CPUFREQ_GOV_POLICY_EXIT (unless invoked with incorrect
arguments which doesn't matter anyway) and it wouldn't really
make sense to fail it, because the caller won't be able to handle
that failure in a meaningful way.

Accordingly, rearrange the code in the core to make it clear that
this call never fails.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# ba41e1bc 01-May-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Fix HWP on boot CPU after system resume

Commit 41cfd64cf49fc "Update frequencies of policy->cpus only from
->set_policy()" changed the way the intel_pstate driver's ->set_policy
callback updates the HWP (hardware-managed P-states) settings.
A side effect of it is that if those settings are modified on the
boot CPU during system suspend and wakeup, they will never be
restored during subsequent system resume.

To address this problem, allow cpufreq drivers that don't provide
->target or ->target_index callbacks to use ->suspend and ->resume
callbacks and add a ->resume callback to intel_pstate to restore
the HWP settings on the CPUs that belong to the given policy.

Fixes: 41cfd64cf49fc "Update frequencies of policy->cpus only from ->set_policy()"
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# c9d9c929 09-Apr-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set

Since governor operations are generally skipped if cpufreq_suspended
is set, cpufreq_start_governor() should do nothing in that case.

That function is called in the cpufreq_online() path, and may also
be called from cpufreq_offline() in some cases, which are invoked
by the nonboot CPUs disabing/enabling code during system suspend
to RAM and resume. That happens when all devices have been
suspended, so if the cpufreq driver relies on things like I2C to
get the current frequency, it may not be ready to do that then.

To prevent problems from happening for this reason, make
cpufreq_update_current_freq(), which is the only function invoked
by cpufreq_start_governor() that doesn't check cpufreq_suspended
already, return 0 upfront if cpufreq_suspended is set.

Fixes: 3bbf8fe3ae08 (cpufreq: Always update current frequency before startig governor)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# a794d613 06-Apr-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Rearrange cpufreq_add_dev()

Reorganize the code in cpufreq_add_dev() to avoid using the ret
variable and reduce the indentation level in it.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# cd73e9b0 06-Apr-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Simplify switch () in cpufreq_cpu_callback()

Merge two switch entries that do the same thing in
cpufreq_cpu_callback().

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 6c9d9c81 07-Apr-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Call cpufreq_disable_fast_switch() in sugov_exit()

Due to differences in the cpufreq core's handling of runtime CPU
offline and nonboot CPUs disabling during system suspend-to-RAM,
fast frequency switching gets disabled after a suspend-to-RAM and
resume cycle on all of the nonboot CPUs.

To prevent that from happening, move the invocation of
cpufreq_disable_fast_switch() from cpufreq_exit_governor() to
sugov_exit(), as the schedutil governor is the only user of fast
frequency switching today anyway.

That simply prevents cpufreq_disable_fast_switch() from being called
without invoking the ->governor callback for the CPUFREQ_GOV_POLICY_EXIT
event (which happens during system suspend now).

Fixes: b7898fda5bc7 (cpufreq: Support for fast frequency switching)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# b7898fda 29-Mar-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Support for fast frequency switching

Modify the ACPI cpufreq driver to provide a method for switching
CPU frequencies from interrupt context and update the cpufreq core
to support that method if available.

Introduce a new cpufreq driver callback, ->fast_switch, to be
invoked for frequency switching from interrupt context by (future)
governors supporting that feature via (new) helper function
cpufreq_driver_fast_switch().

Add two new policy flags, fast_switch_possible, to be set by the
cpufreq driver if fast frequency switching can be used for the
given policy and fast_switch_enabled, to be set by the governor
if it is going to use fast frequency switching for the given
policy. Also add a helper for setting the latter.

Since fast frequency switching is inherently incompatible with
cpufreq transition notifiers, make it possible to set the
fast_switch_enabled only if there are no transition notifiers
already registered and make the registration of new transition
notifiers fail if fast_switch_enabled is set for at least one
policy.

Implement the ->fast_switch callback in the ACPI cpufreq driver
and make it set fast_switch_possible during policy initialization
as appropriate.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 3bbf8fe3 21-Mar-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Always update current frequency before startig governor

Make policy->cur match the current frequency returned by the driver's
->get() callback before starting the governor in case they went out of
sync in the meantime and drop the piece of code attempting to
resync policy->cur with the real frequency of the boot CPU from
cpufreq_resume() as it serves no purpose any more (and it's racy and
super-ugly anyway).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 999f5729 21-Mar-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Introduce cpufreq_update_current_freq()

Move the part of cpufreq_update_policy() that obtains the current
frequency from the driver and updates policy->cur if necessary to
a separate function, cpufreq_get_current_freq().

That should not introduce functional changes and subsequent change
set will need it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 0a300767 21-Mar-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Introduce cpufreq_start_governor()

Starting a governor in cpufreq always follows the same pattern
involving two calls to cpufreq_governor(), one with the event
argument set to CPUFREQ_GOV_START and one with that argument set to
CPUFREQ_GOV_LIMITS.

Introduce cpufreq_start_governor() that will carry out those two
operations and make all places where governors are started use it.

That slightly modifies the behavior of cpufreq_set_policy() which
now also will go back to the old governor if the second call to
cpufreq_governor() (the one with event equal to CPUFREQ_GOV_LIMITS)
fails, but that really is how it should work in the first place.

Also cpufreq_resume() will now pring an error message if the
CPUFREQ_GOV_LIMITS call to cpufreq_governor() fails, but that
makes it follow cpufreq_add_policy_cpu() and cpufreq_offline()
in that respect.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# c75361c0 11-Mar-2016 Richard Cochran <rcochran@linutronix.de>

cpufreq: Make cpufreq_quick_get() safe to call

The function, cpufreq_quick_get, accesses the global 'cpufreq_driver' and
its fields without taking the associated lock, cpufreq_driver_lock.

Without the locking, nothing guarantees that 'cpufreq_driver' remains
consistent during the call. This patch fixes the issue by taking the lock
before accessing the data structure.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# adaf9fcd 10-Mar-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Move scheduler-related code to the sched directory

Create cpufreq.c under kernel/sched/ and move the cpufreq code
related to the scheduler to that file and to sched.h.

Redefine cpufreq_update_util() as a static inline function to avoid
function calls at its call sites in the scheduler code (as suggested
by Peter Zijlstra).

Also move the definition of struct update_util_data and declaration
of cpufreq_set_update_util_data() from include/linux/cpufreq.h to
include/linux/sched.h.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>


# edd4a893 03-Mar-2016 Viresh Kumar <viresh.kumar@linaro.org>

Revert "cpufreq: postfix policy directory with the first CPU in related_cpus"

Revert commit 3510fac45492 (cpufreq: postfix policy directory with the
first CPU in related_cpus).

Earlier, the policy->kobj was added to the kobject core, before ->init()
callback was called for the cpufreq drivers. Which allowed those drivers
to add or remove, driver dependent, sysfs files/directories to the same
kobj from their ->init() and ->exit() callbacks.

That isn't possible anymore after commit 3510fac45492.

Now, there is no other clean alternative that people can adopt.

Its better to revert the earlier commit to allow cpufreq drivers to
create/remove sysfs files from ->init() and ->exit() callbacks.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 08f511fd 03-Mar-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Reduce cpufreq_update_util() overhead a bit

Use the observation that cpufreq_update_util() is only called
by the scheduler with rq->lock held, so the callers of
cpufreq_set_update_util_data() can use synchronize_sched()
instead of synchronize_rcu() to wait for cpufreq_update_util()
to complete. Moreover, if they are updated to do that,
rcu_read_(un)lock() calls in cpufreq_update_util() might be
replaced with rcu_read_(un)lock_sched(), respectively, but
those aren't really necessary, because the scheduler calls
that function from RCU-sched read-side critical sections
already.

In addition to that, if cpufreq_set_update_util_data() checks
the func field in the struct update_util_data before setting
the per-CPU pointer to it, the data->func check may be dropped
from cpufreq_update_util() as well.

Make the above changes to reduce the overhead from
cpufreq_update_util() in the scheduler paths invoking it
and to make the cleanup after removing its callbacks less
heavy-weight somewhat.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>


# 242aa883 22-Feb-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove 'policy->governor_enabled'

The entire sequence of events (like INIT/START or STOP/EXIT) for which
cpufreq_governor() is called, is guaranteed to be protected by
policy->rwsem now.

The additional checks that were added earlier (as we were forced to drop
policy->rwsem before calling cpufreq_governor() for EXIT event), aren't
required anymore.

Over that, they weren't sufficient really. They just take care of
START/STOP events, but not INIT/EXIT and the state machine was never
maintained properly by them.

Kill the unnecessary checks and policy->governor_enabled field.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# a1317e09 22-Feb-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Rename __cpufreq_governor() to cpufreq_governor()

The __ at the beginning of the routine aren't really necessary at all.
Rename it to cpufreq_governor() instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 11eb69b9 22-Feb-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Relocate handle_update() to kill its declaration

handle_update() is declared at the top of the file as its user appear
before its definition. Relocate the routine to get rid of this.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 99522fe6 11-Feb-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove cpufreq_governor_lock

We used to drop policy->rwsem just before calling __cpufreq_governor()
in some cases earlier and so it was possible that __cpufreq_governor()
ran concurrently via separate threads for the same policy.

In order to guarantee valid state transitions for governors,
'governor_enabled' was required to be protected using some locking
and cpufreq_governor_lock was added for that.

But now __cpufreq_governor() is always called under policy->rwsem,
and 'governor_enabled' is protected against races even without
cpufreq_governor_lock.

Get rid of the extra lock now.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw : Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 49f18560 11-Feb-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Call __cpufreq_governor() with policy->rwsem held

The cpufreq core code is not consistent with respect to invoking
__cpufreq_governor() under policy->rwsem.

Changing all code to always hold policy->rwsem around
__cpufreq_governor() invocations will allow us to remove
cpufreq_governor_lock that is used today because we can't
guarantee that __cpufreq_governor() isn't executed twice in
parallel for the same policy.

We should also ensure that policy->rwsem is held across governor
state changes.

For example, while adding a CPU to the policy in the CPU online path,
we need to stop the governor, change policy->cpus, start the governor
and then refresh its limits. The complete sequence must be guaranteed
to complete without interruptions by concurrent governor state
updates. That can be achieved by holding policy->rwsem around those
sequences of operations.

Also note that after this patch cpufreq_driver->stop_cpu() and
->exit() will get called under policy->rwsem which wasn't the case
earlier. That shouldn't have any side effects, though.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 69cee714 11-Feb-2016 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Merge cpufreq_offline_prepare/finish routines

Commit 1aee40ac9c86 (cpufreq: Invoke __cpufreq_remove_dev_finish()
after releasing cpu_hotplug.lock) split the cpufreq's CPU offline
routine in two pieces, one of them to be run with CPU offline/online
locked and the other to be called later. The reason for that split
was a possible deadlock scenario involving cpufreq sysfs attributes
and CPU offline.

However, the handling of CPU offline in cpufreq has changed since
then. Policy sysfs attributes are never removed during CPU offline,
so there's no need to worry about accessing them during CPU offline,
because that can't lead to any deadlocks now. Governor sysfs
attributes are still removed in __cpufreq_governor(_EXIT), but
there is a new kobject type for them now and its show/store
callbacks don't lock CPU offline/online (they don't need to do
that).

This means that the CPU offline code in cpufreq doesn't need to
be split any more, so combine cpufreq_offline_prepare() with
cpufreq_offline_finish().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog ]
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 68e80dae 08-Feb-2016 Viresh Kumar <viresh.kumar@linaro.org>

Revert "cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT"

Earlier, when the struct freq-attr was used to represent governor
attributes, the standard cpufreq show/store sysfs attribute callbacks
were applied to the governor tunable attributes and they always acquire
the policy->rwsem lock before carrying out the operation. That could
have resulted in an ABBA deadlock if governor tunable attributes are
removed under policy->rwsem while one of them is being accessed
concurrently (if sysfs attributes removal wins the race, it will wait
for the access to complete with policy->rwsem held while the attribute
callback will block on policy->rwsem indefinitely).

We attempted to address this issue by dropping policy->rwsem around
governor tunable attributes removal (that is, around invocations of the
->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
in cpufreq_set_policy(), but that opened up race conditions that had not
been possible with policy->rwsem held all the time.

The previous commit, "cpufreq: governor: New sysfs show/store callbacks
for governor tunables", fixed the original ABBA deadlock by adding new
governor specific show/store callbacks.

We don't have to drop rwsem around invocations of governor event
CPUFREQ_GOV_POLICY_EXIT anymore, and original fix can be reverted now.

Fixes: 955ef4833574 (cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT)
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reported-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 34e2c555 15-Feb-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Add mechanism for registering utilization update callbacks

Introduce a mechanism by which parts of the cpufreq subsystem
("setpolicy" drivers or the core) can register callbacks to be
executed from cpufreq_update_util() which is invoked by the
scheduler's update_load_avg() on CPU utilization changes.

This allows the "setpolicy" drivers to dispense with their timers
and do all of the computations they need and frequency/voltage
adjustments in the update_load_avg() code path, among other things.

The update_load_avg() changes were suggested by Peter Zijlstra.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@kernel.org>


# 6019d23a 25-Feb-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Rearrange __cpufreq_driver_target()

Drop a pointless label at a return statement from
__cpufreq_driver_target() and rearrange that function
to reduce the indentation level.

No intentional functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# fd7dc7e6 20-Feb-2016 Eric Biggers <ebiggers3@gmail.com>

cpufreq: simplify for_each_suitable_policy() macro

Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 63af4055 20-Feb-2016 Eric Biggers <ebiggers3@gmail.com>

cpufreq: fix comment about return value of cpufreq_register_driver()

The comment has been incorrect since commit 4dea5806d332
("cpufreq: return EEXIST instead of EBUSY for second registering").

Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6541aef0 12-Feb-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Drop unnecessary checks from show() and store()

The show() and store() routines in the cpufreq core don't need to
check if the struct freq_attr they want to use really provides the
callbacks they need as expected (if that's not the case, it means
a bug in the code anyway), so change them to avoid doing that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# de1df26b 04-Feb-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Clean up default and fallback governor setup

The preprocessor magic used for setting the default cpufreq governor
(and for using the performance governor as a fallback one for that
matter) is really nasty, so replace it with __weak functions and
overrides.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 2dadfd75 26-Jan-2016 Gautham R Shenoy <ego@linux.vnet.ibm.com>

cpufreq: Use list_is_last() to check last entry of the policy list

Currently next_policy() explicitly checks if a policy is the last
policy in the cpufreq_policy_list. Use the standard list_is_last
primitive instead.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7a6c79f2 26-Dec-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Simplify core code related to boost support

Notice that the boost_supported field in struct cpufreq_driver is
redundant, because the driver's ->set_boost callback may be left
unset if "boost" is not supported. Moreover, the only driver
populating the ->set_boost callback is acpi_cpufreq, so make it
avoid populating that callback if "boost" is not supported, rework
the core to check ->set_boost instead of boost_supported to
verify "boost" support and drop boost_supported which isn't
used any more.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 41669da0 26-Dec-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Make cpufreq_boost_supported() static

cpufreq_boost_supported() is not used outside of cpufreq.c, so make
it static.

While at it, refactor it as a one-liner (which it really is).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 69030dd1 01-Dec-2015 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

cpufreq: use last policy after online for drivers with ->setpolicy

For cpufreq drivers which use setpolicy interface, after offline->online
the policy is set to default. This can be reproduced by setting the
default policy of intel_pstate or longrun to ondemand and then change to
"performance". After offline and online, the setpolicy will be called with
the policy=ondemand.

For drivers using governors this condition is handled by storing
last_governor, during offline and restoring during online. The same should
be done for drivers using setpolicy interface. Storing last_policy during
offline and restoring during online.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f344dae0 20-Nov-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Always remove sysfs cpuX/cpufreq link on ->remove_dev()

Subsys interface's ->remove_dev() is called when the cpufreq driver is
unregistering or the CPU is getting physically removed. We keep removing
the cpuX/cpufreq link for all CPUs except the last one, which is a
mistake as all CPUs contain a link now.

Because of this, one CPU from each policy will still contain a link (to
an already removed policyX directory), after the cpufreq driver is
unregistered.

Fix that by removing the link first and then only see if the policy is
required to be freed. That will make sure that no links are left out.

Fixes: 96bdda61f58b ("cpufreq: create cpu/cpufreq/policyX directories")
Reported-and-tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3510fac4 15-Oct-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: postfix policy directory with the first CPU in related_cpus

The sysfs policy directory is postfixed currently with the CPU number
for which the policy was created, which isn't necessarily the first CPU
in related_cpus mask.

To make it more consistent and predictable, lets postfix the policy with
the first cpu in related-cpus mask.

Suggested-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 96bdda61 15-Oct-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: create cpu/cpufreq/policyX directories

The cpufreq sysfs interface had been a bit inconsistent as one of the
CPUs for a policy had a real directory within its sysfs 'cpuX' directory
and all other CPUs had links to it. That also made the code a bit
complex as we need to take care of moving the sysfs directory if the CPU
containing the real directory is getting physically hot-unplugged.

Solve this by creating 'policyX' directories (per-policy) in
/sys/devices/system/cpu/cpufreq/ directory, where X is the CPU for which
the policy was first created.

This also removes the need of keeping kobj_cpu and we can remove it now.

Suggested-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: is more of a general agreement from the person that he is
Reviewed-by: is a more strict tag and implies that the reviewer has
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# c82bd444 15-Oct-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove cpufreq_sysfs_{create|remove}_file()

They don't do anything special now, remove the unnecessary wrapper.

Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8eec1020 15-Oct-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: create cpu/cpufreq at boot time

Later patches will need to create policy specific directories in
/sys/devices/system/cpu/cpufreq/ directory and so the cpufreq directory
wouldn't be ever empty.

And so no fun creating/destroying it on need basis anymore. Create it
once on system boot.

Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 0998a03a 15-Oct-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Use cpumask_copy instead of cpumask_or to copy a mask

->related_cpus is empty at this point of time and copying ->cpus to it
or orring ->related_cpus with ->cpus would result in the same value. But
cpumask_copy makes it rather clear.

Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e625742f 12-Oct-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Drop redundant check for inactive policies

We just made sure policy->cpu is online and this check will always fail
as the policy is active. Drop it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 55582bcc 07-Oct-2015 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

cpufreq: prevent lockup on reading scaling_available_frequencies

When scaling_available_frequencies is read on an offlined cpu, then
either lockup or junk values are displayed. This is caused by
freed freq_table, which policy is using.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1f0bd44e 15-Sep-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: acpi-cpufreq: Use cpufreq_cpu_get_raw() in ->get()

cpufreq_cpu_get() called by get_cur_freq_on_cpu() is overkill,
because the ->get() callback is always invoked in a context in
which all of the conditions checked by cpufreq_cpu_get() are
guaranteed to be satisfied.

Use cpufreq_cpu_get_raw() instead of it and drop the
corresponding cpufreq_cpu_put() from get_cur_freq_on_cpu().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 201f3716 08-Sep-2015 Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>

cpufreq: allow cpufreq_generic_suspend() to work without suspend frequency

Some cpufreq drivers may set suspend frequency only for
selected setups but still would like to use the generic
suspend handler. Thus don't treat !policy->suspend_freq
condition as an incorrect one.

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 63431f78 27-Jul-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Use __func__ to print function's name

Its better to use __func__ to print functions name instead of writing
the name in the print statement. This also has the advantage that a
change in function's name doesn't force us to change the print message
as well.

Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d075a88e 02-Sep-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: staticize cpufreq_cpu_get_raw()

cpufreq_cpu_get_raw() isn't used by any external users, staticize it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 36dfef23 02-Aug-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: drop !cpufreq_driver check from cpufreq_parse_governor()

Driver is guaranteed to be present on a call to cpufreq_parse_governor()
and there is no need to check for !cpufreq_driver. Drop it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 88dc4384 02-Aug-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove redundant 'policy' field from user_policy

Its always same as policy->policy, and there is no need to keep another
copy of it. Remove it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e27f8bd2 02-Aug-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove redundant 'governor' field from user_policy

Its always same as policy->governor, and there is no need to keep
another copy of it. Remove it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 14ca0bdf 02-Aug-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: update user_policy.* on success

'user_policy' caches properties of a policy that are set by userspace.
And these must be updated only if cpufreq core was successful in
updating them based on request from user space.

In store_scaling_governor(), we are updating user_policy.policy and
user_policy.governor even if cpufreq_set_policy() failed. That's
incorrect.

Fix this by updating user_policy.* only if we were successful in
updating the properties.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8fa5b631 02-Aug-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: use memcpy() to copy policy

cpufreq_get_policy() is useful if the pointer to policy isn't available
in advance. But if it is available, then there is no need to call
cpufreq_get_policy(). Directly use memcpy() to copy the policy.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6bfb7c74 02-Aug-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event

What's being done from CPUFREQ_INCOMPATIBLE, can also be done with
CPUFREQ_ADJUST. There is nothing special with CPUFREQ_INCOMPATIBLE
notifier.

Kill CPUFREQ_INCOMPATIBLE and fix its usage sites.

This also updates the numbering of notifier events to remove holes.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 44139ed4 29-Jul-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Allow drivers to enable boost support after registering driver

In some cases it wouldn't be known at time of driver registration, if
the driver needs to support boost frequencies.

For example, while getting boost information from DT with opp-v2
bindings, we need to parse the bindings for all the CPUs to know if
turbo/boost OPPs are supported or not.

One way out to do that efficiently is to delay supporting boost mode
(i.e. creating /sys/devices/system/cpu/cpufreq/boost file), until the
time OPP bindings are parsed.

At that point, the driver can enable boost support. This can be done at
->init(), where the frequency table is created.

To do that, the driver requires few APIs from cpufreq core that let him
do this. This patch provides these APIs.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 71db87ba 30-Jul-2015 Viresh Kumar <viresh.kumar@linaro.org>

bus: subsys: update return type of ->remove_dev() to void

Its return value is not used by the subsys core and nothing meaningful
can be done with it, even if we want to use it. The subsys device is
anyway getting removed.

Update prototype of ->remove_dev() to make its return type as void. Fix
all usage sites as well.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# fba9573b 30-Jul-2015 Pan Xinhui <xinhuix.pan@intel.com>

cpufreq: Correct a freq check in cpufreq_set_policy()

This check was originally added by commit 9c9a43ed2734 ("[CPUFREQ]
return error when failing to set minfreq").It attempt to return an error
on obviously incorrect limits when we echo xxx >.../scaling_max,min_freq
Actually we just need check if new_policy->min > new_policy->max.
Because at least one of max/min is copied from cpufreq_get_policy().

For example, when we echo xxx > .../scaling_min_freq, new_policy is
copied from policy in cpufreq_get_policy. new_policy->max is same with
policy->max. new_policy->min is set to a new value.

Let me explain it in deduction method, first statement in if ():
new_policy->min > policy->max
policy->max == new_policy->max
==> new_policy->min > new_policy->max

second statement in if():
new_policy->max < policy->min
policy->max < policy->min
==>new_policy->min > new_policy->max (induction method)

So we have proved that we only need check if new_policy->min >
new_policy->max.

After apply this patch, we can also modify ->min and ->max at same time
if new freq range is very much different from current freq range. For
example, if current freq range is 480000-960000, then we want to set
this range to 1120000-2240000, we would fail in the past because
new_policy->min > policy->max. As long as the cpufreq range is valid, we
has no reason to reject the user. So correct the check to avoid such
case.

Signed-off-by: Pan Xinhui <xinhuix.pan@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# fdd320da 29-Jul-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Lock CPU online/offline in cpufreq_register_driver()

To protect against races with concurrent CPU online/offline, call
get_online_cpus() before registering a cpufreq driver.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 194d99c7 28-Jul-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Replace recover_policy with new_policy in cpufreq_online()

The recover_policy is unsed in cpufreq_online() to indicate whether
a new policy object is created or an existing one is reinitialized.

The "recover" part of the name is slightly confusing (it should be
"reinitialization" rather than "recovery") and the logical not (!)
operator is applied to it in almost all of the checks it is used in,
so replace that variable with a new one called "new_policy" that
will be true in the case of a new policy creation.

While at it, drop one of the labels that is jumped to from only
one spot.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 0b275352 28-Jul-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Separate CPU device registration from CPU online

To separate the CPU online interface from the CPU device
registration, split cpufreq_online() out of cpufreq_add_dev()
and make cpufreq_cpu_callback() call the former, while
cpufreq_add_dev() itself will only be used as the CPU device
addition subsystem interface callback.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Suggested-by: Russell King <linux@arm.linux.org.uk>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# a34e63b1 27-Jul-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Pass CPU number to cpufreq_policy_alloc()

Change cpufreq_policy_alloc() to take a CPU number instead of a CPU
device pointer as its argument, as it is the only function called by
cpufreq_add_dev() taking a device pointer argument at this point.

That will allow us to split the CPU online part from cpufreq_add_dev()
more cleanly going forward.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 4d1f3a5b 27-Jul-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Do not update related_cpus on every policy activation

The related_cpus mask includes CPUs whose cpufreq_cpu_data per-CPU
pointers have been set the the given policy. Since those pointers
are only set at the policy creation time and unset when the policy
is deleted, the related_cpus should not be updated between those
two operations.

For this reason, avoid updating it whenever the first of the
"related" CPUs goes online.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# d9612a49 27-Jul-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Drop unused dev argument from two functions

The dev argument of cpufreq_add_policy_cpu() and
cpufreq_add_dev_interface() is not used by any of them,
so drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# d4d854d6 27-Jul-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Drop unnecessary label from cpufreq_add_dev()

The leftover out_release_rwsem label in cpufreq_add_dev() is not
necessary any more and confusing, so drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 11ce707e 27-Jul-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Drop cpufreq_policy_restore()

Notice that when cpufreq_policy_restore() is called, its per-CPU
cpufreq_cpu_data variable has been already dereferenced and if that
variable is not NULL, the policy local pointer in cpufreq_add_dev()
contains its value.

Therefore it is not necessary to dereference it again and the
policy pointer can be used directly. Moreover, if that pointer
is not NULL, the policy is inactive (or the previous check would
have made us return from cpufreq_add_dev()) so the restoration
code from cpufreq_policy_restore() can be moved to that point
in cpufreq_add_dev().

Do that and drop cpufreq_policy_restore().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 15c0b4d2 27-Jul-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Rework two functions related to CPU offline

Since __cpufreq_remove_dev_prepare() and __cpufreq_remove_dev_finish()
are about CPU offline rather than about CPU removal, rename them to
cpufreq_offline_prepare() and cpufreq_offline_finish(), respectively.

Also change their argument from a struct device pointer to a CPU
number, because they use the CPU number only internally anyway
and make them void as their return values are ignored.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 559ed407 25-Jul-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Avoid attempts to create duplicate symbolic links

After commit 87549141d516 (cpufreq: Stop migrating sysfs files on
hotplug) there is a problem with CPUs that share cpufreq policy
objects with other CPUs and are initially offline.

Say CPU1 shares a policy with CPU0 which is online and is registered
first. As part of the registration process, cpufreq_add_dev() is
called for it. It creates the policy object and a symbolic link
to it from the CPU1's sysfs directory. If CPU1 is registered
subsequently and it is offline at that time, cpufreq_add_dev() will
attempt to create a symbolic link to the policy object for it, but
that link is present already, so a warning about that will be
triggered.

To avoid that warning, make cpufreq use an additional CPU mask
containing related CPUs that are actually present for each policy
object. That mask is initialized when the policy object is populated
after its creation (for the first online CPU using it) and it includes
CPUs from the "policy CPUs" mask returned by the cpufreq driver's
->init() callback that are physically present at that time. Symbolic
links to the policy are created only for the CPUs in that mask.

If cpufreq_add_dev() is invoked for an offline CPU, it checks the
new mask and only creates the symlink if the CPU was not in it (the
CPU is added to the mask at the same time).

In turn, cpufreq_remove_dev() drops the given CPU from the new mask,
removes its symlink to the policy object and returns, unless it is
the CPU owning the policy object. In that case, the policy object
is moved to a new CPU's sysfs directory or deleted if the CPU being
removed was the last user of the policy.

While at it, notice that cpufreq_remove_dev() can't fail, because
its return value is ignored, so make it ignore return values from
__cpufreq_remove_dev_prepare() and __cpufreq_remove_dev_finish()
and prevent these functions from aborting on errors returned by
__cpufreq_governor(). Also drop the now unused sif argument from
them.

Fixes: 87549141d516 (cpufreq: Stop migrating sysfs files on hotplug)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-and-tested-by: Russell King <linux@arm.linux.org.uk>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 454d3a25 22-Jul-2015 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

cpufreq: Remove cpufreq_rwsem

cpufreq_rwsem was introduced in commit 6eed9404ab3c4 ("cpufreq: Use
rwsem for protecting critical sections) in order to replace
try_module_get() on the cpu-freq driver. That try_module_get() worked
well until the refcount was so heavily used that module removal became
more or less impossible.

Though when looking at the various (undocumented) protection
mechanisms in that code, the randomly sprinkeled around cpufreq_rwsem
locking sites are superfluous.

The policy, which is acquired in cpufreq_cpu_get() and released in
cpufreq_cpu_put() is sufficiently protected already.

cpufreq_cpu_get(cpu)
/* Protects against concurrent driver removal */
read_lock_irqsave(&cpufreq_driver_lock, flags);
policy = per_cpu(cpufreq_cpu_data, cpu);
kobject_get(&policy->kobj);
read_unlock_irqrestore(&cpufreq_driver_lock, flags);

The reference on the policy serializes versus module unload already:

cpufreq_unregister_driver()
subsys_interface_unregister()
__cpufreq_remove_dev_finish()
per_cpu(cpufreq_cpu_data) = NULL;
cpufreq_policy_put_kobj()

If there is a reference held on the policy, i.e. obtained prior to the
unregister call, then cpufreq_policy_put_kobj() will wait until that
reference is dropped. So once subsys_interface_unregister() returns
there is no policy pointer in flight and no new reference can be
obtained. So that rwsem protection is useless.

The other usage of cpufreq_rwsem in show()/store() of the sysfs
interface is redundant as well because sysfs already does the proper
kobject_get()/put() pairs.

That leaves CPU hotplug versus module removal. The current
down_write() around the write_lock() in cpufreq_unregister_driver() is
silly at best as it protects actually nothing.

The trivial solution to this is to prevent hotplug across
cpufreq_unregister_driver completely.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4bc384ae 18-Jul-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: propagate errors returned from __cpufreq_governor()

Return codes aren't honored properly in cpufreq_set_policy(). This can
lead to two problems:
- wrong errors propagated to sysfs
- we try to do next state-change even if the previous one failed

cpufreq_governor_dbs() now returns proper errors on all invalid
state-transition requests and this code should honor that.

Reviewed-and-tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7f0fa40f 08-Jul-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Properly handle errors from cpufreq_init_policy()

cpufreq_init_policy() can fail, and we don't do anything except a call
to ->exit() on that. The policy should be freed if this happens.

Do it properly.

Reported-and-tested-by: "Jon Medhurst (Tixy)" <tixy@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8101f997 08-Jul-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: cpufreq_add_dev: name goto labels based on what they do

These labels are are named in two ways normally:
- Based on what caused to jump to such labels
- Based on what we do under such labels

We follow the first naming convention today and that leads to multiple
labels for doing the same work. Fix it by switching to the second way of
naming them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5a31d594 09-Jul-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Allow freq_table to be obtained for offline CPUs

Users of freq table may want to access it for any CPU from
policy->related_cpus mask. One such user is cpu-cooling layer. It gets a
list of 'clip_cpus' (equivalent to policy->related_cpus) during
registration and tries to get freq_table for the first CPU of this mask.

If the CPU, for which it tries to fetch freq_table, is offline,
cpufreq_frequency_get_table() fails. This happens because it relies on
cpufreq_cpu_get_raw() for its functioning which returns policy only for
online CPUs.

The fix is to access the policy data structure for the given CPU
directly (which also returns a valid policy for offline CPUs), but the
policy itself has to be active (meaning that at least one CPU using it
is online) for the frequency table to be returned.

Because we will be using 'cpufreq_cpu_data' now, which is internal to
the cpufreq core, move cpufreq_frequency_get_table() to cpufreq.c.

Reported-and-tested-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 35afd02e 09-Jul-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Initialize the governor again while restoring policy

When all CPUs of a policy are hot-unplugged, we EXIT the governor but
don't mark policy->governor as NULL. This was done in order to keep last
used governor's information intact in sysfs, while the CPUs are offline.

But we also need to clear policy->governor when restoring the policy.

Because policy->governor still points to the last governor while policy
is restored, following sequence of event happens:
- cpufreq_init_policy() called while restoring policy
- find_governor() matches last_governor string for present governors and
returns last used governor's pointer, say ondemand. policy->governor
already has the same address, unless the governor was removed in
between.
- cpufreq_set_policy() is called with both old/new policies governor set
as ondemand.
- Because governors matched, we skip governor initialization and return
after calling __cpufreq_governor(CPUFREQ_GOV_LIMITS). Because the
governor wasn't initialized for this policy, it returned -EBUSY.
- cpufreq_init_policy() exits the policy on this error, but doesn't
destroy it properly (should be fixed separately).
- And so we enter a scenario where the policy isn't completely
initialized but used.

Fix this by setting policy->governor to NULL while restoring the policy.

Reported-and-tested-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org>
Reported-and-tested-by: "Jon Medhurst (Tixy)" <tixy@linaro.org>
Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
Fixes: 18bf3a124ef8 (cpufreq: Mark policy->governor = NULL for inactive policies)
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 37829029 08-Jun-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove cpufreq_update_policy()

cpufreq_update_policy() was kept as a separate routine earlier as it was
handling migration of sysfs directories, which isn't the case anymore.
It is only updating policy->cpu now and is called by a single caller.

The WARN_ON() isn't really required anymore, as we are just updating the
cpu now, not moving the sysfs directories.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9591becb 09-Jun-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Restart governor as soon as possible

__cpufreq_remove_dev_finish() is doing two things today:
- Restarts the governor if some CPUs from concerned policy are still
online.
- Frees the policy if all CPUs are offline.

The first task of restarting the governor can be moved to
__cpufreq_remove_dev_prepare() to restart the governor early. There is
no race between _prepare() and _finish() as they would be handling
completely different cases. _finish() will only be required if we are
going to free the policy and that has nothing to do with restarting the
governor.

Original-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3654c5cc 08-Jun-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Call cpufreq_policy_put_kobj() from cpufreq_policy_free()

cpufreq_policy_put_kobj() is actually part of freeing the policy and can
be called from cpufreq_policy_free() directly instead of a separate
call.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2fc3384d 08-Jun-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Initialize policy->kobj while allocating policy

policy->kobj is required to be initialized once in the lifetime of a
policy. Currently we are initializing it from __cpufreq_add_dev() and
that doesn't look to be the best place for doing so as we have to do
this on special cases (like: !recover_policy).

We can initialize it from a more obvious place cpufreq_policy_alloc()
and that will make code look cleaner, specially the error handling part.

The error handling part of __cpufreq_add_dev() was doing almost the same
thing while recover_policy is true or false. Fix that as well by always
calling cpufreq_policy_put_kobj() with an additional parameter to skip
notification part of it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 87549141 09-Jun-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Stop migrating sysfs files on hotplug

When we hot-unplug a cpu, we remove its sysfs cpufreq directory and if
the outgoing cpu was the owner of policy->kobj earlier then we migrate
the sysfs directory to under another online cpu.

There are few disadvantages this brings:
- Code Complexity
- Slower hotplug/suspend/resume
- sysfs file permissions are reset after all policy->cpus are offlined
- CPUFreq stats history lost after all policy->cpus are offlined
- Special management of sysfs stuff during suspend/resume

To overcome these, this patch modifies the way sysfs directories are
managed:
- Select sysfs kobjects owner while initializing policy and don't change
it during hotplugs. Track it with kobj_cpu created earlier.

- Create symlinks for all related CPUs (can be offline) instead of
affected CPUs on policy initialization and remove them only when the
policy is freed.

- Free policy structure only on the removal of cpufreq-driver and not
during hotplug/suspend/resume, detected by checking 'struct
subsys_interface *' (Valid only when called from
subsys_interface_unregister() while unregistering driver).

Apart from this, special care is taken to handle physical hoplug of CPUs
as we wouldn't remove sysfs links or remove policies on logical
hotplugs. Physical hotplug happens in the following sequence.

Hot removal:
- CPU is offlined first, ~ 'echo 0 >
/sys/devices/system/cpu/cpuX/online'
- Then its device is removed along with all sysfs files, cpufreq core
notified with cpufreq_remove_dev() callback from subsys-interface..

Hot addition:
- First the device along with its sysfs files is added, cpufreq core
notified with cpufreq_add_dev() callback from subsys-interface..
- CPU is onlined, ~ 'echo 1 > /sys/devices/system/cpu/cpuX/online'

We call the same routines with both hotplug and subsys callbacks, and we
sense physical hotplug with cpu_offline() check in subsys callback. We
can handle most of the stuff with regular hotplug callback paths and
add/remove cpufreq sysfs links or free policy from subsys callbacks.

Original-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 11e584cf 09-Jun-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Don't allow updating inactive policies from sysfs

Later commits would change the way policies are managed today. Policies
wouldn't be freed on cpu hotplug (currently they aren't freed only for
suspend), and while the CPU is offline, the sysfs cpufreq files would
still be present.

User may accidentally try to update the sysfs files in following
directory: '/sys/devices/system/cpu/cpuX/cpufreq/'. And that would
result in undefined behavior as policy wouldn't be active then.

Apart from updating the store() routine, we also update __cpufreq_get()
which can call cpufreq_out_of_sync(). The later routine tries to update
policy->cur and starts notifying kernel about it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9d16f207 17-May-2015 Saravana Kannan <skannan@codeaurora.org>

cpufreq: Track cpu managing sysfs kobjects separately

In order to prepare for the next few commits, that will stop migrating
sysfs files on cpu hotplug, this patch starts managing sysfs-cpu
separately.

The behavior is still the same as we are still migrating sysfs files on
hotplug, later commits would change that.

Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 58405af6 22-May-2015 Shailendra Verma <shailendra.capricorn@gmail.com>

cpufreq: Fix for typos in two comments

Signed-off-by: Shailendra Verma <shailendra.capricorn@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 18bf3a12 11-May-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Mark policy->governor = NULL for inactive policies

Later commits would change the way policies are managed today. Policies
wouldn't be freed on cpu hotplug (currently they aren't freed on
suspend), and while the CPU is offline, the sysfs cpufreq files would
still be present.

Because we don't mark policy->governor as NULL, it still contains
pointer of the last used governor. And if the governor is removed, while
all the CPUs of a policy are hotplugged out, this pointer wouldn't be
valid anymore. And if we try to read the 'scaling_governor', etc. from
sysfs, it will result in kernel OOPs.

To prevent this, mark policy->governor as NULL for all inactive policies
while the governor is removed from kernel.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4573237b 11-May-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Manage governor usage history with 'policy->last_governor'

History of which governor was used last is common to all CPUs within a
policy and maintaining it per-cpu isn't the best approach for sure.

Apart from wasting memory, this also increases the complexity of
managing this data structure as it has to be updated for all CPUs.

To make that somewhat simpler, lets store this information in a new
field 'last_governor' in struct cpufreq_policy and update it on removal
of last cpu of a policy.

As a side-effect it also solves an old problem, consider a system with
two clusters 0 & 1. And there is one policy per cluster.

Cluster 0: CPU0 and 1.
Cluster 1: CPU2 and 3.

- CPU2 is first brought online, and governor is set to performance
(default as cpufreq_cpu_governor wasn't set).
- Governor is changed to ondemand.
- CPU2 is taken offline and cpufreq_cpu_governor is updated for CPU2.
- CPU3 is brought online.
- Because cpufreq_cpu_governor wasn't set for CPU3, the default governor
performance is picked for CPU3.

This patch fixes the bug as we now have a single variable to update for
policy.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9104bb26 11-May-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Don't traverse all active policies to find policy for a cpu

We reach here while adding policy for a CPU and enter into the 'if'
block only if a policy already exists for the CPU.

As cpufreq_cpu_data is set for all policy->related_cpus now, when the
policy is first added, we can use that to find the CPU's policy instead
of traversing the list of all active policies.

Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3914d379 08-May-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Get rid of cpufreq_cpu_data_fallback

We can extract the same information from cpufreq_cpu_data as it is also
available for inactive policies now. And so don't need
cpufreq_cpu_data_fallback anymore.

Also add a WARN_ON() for the case where we try to restore from an active
policy.

Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 988bed09 08-May-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Don't clear cpufreq_cpu_data and policy list for inactive policies

Now that we can check policy->cpus to find if policy is active or not,
we don't need to clean cpufreq_cpu_data and delete policy from the list
on light weight tear down of policies (like in suspend).

To make it consistent and clean, set cpufreq_cpu_data for all related
CPUs when the policy is first created and clean it only while it is
freed.

Also update cpufreq_cpu_get_raw() to check if cpu is part of
policy->cpus mask, so that we don't end up getting policies for offline
CPUs.

In order to make sure that no users of 'policy' are using an inactive
policy, use cpufreq_cpu_get_raw() instead of directly accessing
cpufreq_cpu_data.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f963735a 11-May-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Create for_each_{in}active_policy()

policy->cpus is cleared unconditionally now on hotplug-out of a CPU and
it can be checked to know if a policy is active or not. Create helper
routines to iterate over all active/inactive policies, based on
policy->cpus field.

Replace all instances of for_each_policy() with for_each_active_policy()
to make them iterate only for active policies. (We haven't made changes
yet to keep inactive policies in the same list, but that will be
followed in a later patch).

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 303ae723 19-Feb-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Clear policy->cpus even for the last CPU

We clear policy->cpus mask while CPUs are hotplugged out. We do it for all CPUs
except the last CPU of the policy. I don't remember what the rationale behind
that was, but I couldn't think of anything that will break if we remove this
conditional clearing and always clear policy->cpus.

The benefit we get out of it is, we can know if a policy is active or not by
checking if this field is empty or not. That will be used by later commits.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# bb29ae15 19-Feb-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Keep a single path for adding managed CPUs

There are two cases when we may try to add CPUs we're already handling:
- On boot, the first cpu has marked all policy->cpus managed and so we
will find policy for all other policy->cpus later on.
- When a managed cpu is hotplugged out and later brought back in.

Currently, separate paths and checks take care of the two. While the
first one is detected by testing cpu against 'policy->cpus', the other
one is detected by testing cpu against 'policy->related_cpus'.

We can handle them both via a single path and there is no need to do
special checking for the first one.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
[ rjw: Changelog, comments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1b947c90 19-Feb-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Throw warning when we try to get policy for an invalid CPU

Simply returning here with an error is not enough. It shouldn't be allowed at
all to try calling cpufreq_cpu_get() for an invalid CPU.

Add a WARN here to make it clear that it wouldn't be acceptable at all.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 23faf0b7 19-Feb-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Merge __cpufreq_add_dev() and cpufreq_add_dev()

cpufreq_add_dev() is an unnecessary wrapper over __cpufreq_add_dev(). Merge
them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 50e9c852 19-Feb-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Add doc style comment about cpufreq_cpu_{get|put}()

This clearly states what the code inside these routines is doing and how these
must be used.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# c75de0ac 01-Apr-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Schedule work for the first-online CPU on resume

All CPUs leaving the first-online CPU are hotplugged out on suspend and
and cpufreq core stops managing them.

On resume, we need to call cpufreq_update_policy() for this CPU's policy
to make sure its frequency is in sync with cpufreq's cached value, as it
might have got updated by hardware during suspend/resume.

The policies are always added to the top of the policy-list. So, in
normal circumstances, CPU 0's policy will be the last one in the list.
And so the code checks for the last policy.

But there are cases where it will fail. Consider quad-core system, with
policy-per core. If CPU0 is hotplugged out and added back again, the
last policy will be on CPU1 :(

To fix this in a proper way, always look for the policy of the first
online CPU. That way we will be sure that we are calling
cpufreq_update_policy() for the only CPU that wasn't hotplugged out.

Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
Fixes: 2f0aea936360 ("cpufreq: suspend governors on system suspend/hibernate")
Reported-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f7b27061 27-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Create for_each_governor()

To make code more readable and less error prone, lets create a helper macro for
iterating over all available governors.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b4f0676f 27-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Create for_each_policy()

To make code more readable and less error prone, lets create a helper macro for
iterating over all active policies.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1e63eaf0 27-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Drop cpufreq_disabled() check from cpufreq_cpu_{get|put}()

When cpufreq is disabled, the per-cpu variable would have been set to
NULL. Remove this unnecessary check.

[ Changelog from Saravana Kannan. ]

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6ffae8c0 30-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Set cpufreq_cpu_data to NULL before putting kobject

In __cpufreq_remove_dev_finish(), per-cpu 'cpufreq_cpu_data' needs
to be cleared before calling kobject_put(&policy->kobj) and under
cpufreq_driver_lock. Otherwise, if someone else calls cpufreq_cpu_get()
in parallel with it, they can obtain a non-NULL policy from that after
kobject_put(&policy->kobj) was executed.

Consider this case:

Thread A Thread B
cpufreq_cpu_get()
acquire cpufreq_driver_lock
read-per-cpu cpufreq_cpu_data
kobject_put(&policy->kobj);
kobject_get(&policy->kobj);
...
per_cpu(&cpufreq_cpu_data, cpu) = NULL

And this will result in a warning like this one:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
kobject_get+0x41/0x50()
Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
...
Call Trace:
[<ffffffff81661b14>] dump_stack+0x46/0x58
[<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
[<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
[<ffffffff812e16d1>] kobject_get+0x41/0x50
[<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
[<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
[<ffffffff810b8cb2>] ? up+0x32/0x50
[<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
[<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
[<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
[<ffffffff81360967>] ? acpi_has_method+0x25/0x40
[<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
[<ffffffff81089566>] ? move_linked_works+0x66/0x90
[<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
[<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
[<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
[<ffffffff8108c910>] process_one_work+0x160/0x410
[<ffffffff8108d05b>] worker_thread+0x11b/0x520
[<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
[<ffffffff81092421>] kthread+0xe1/0x100
[<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
[<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
[<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
---[ end trace 89e66eb9795efdf7 ]---

The actual code flow is as follows:

Thread A: Workqueue: kacpi_notify

acpi_processor_notify()
acpi_processor_ppc_has_changed()
cpufreq_update_policy()
cpufreq_cpu_get()
kobject_get()

Thread B: xenbus_thread()

xenbus_thread()
msg->u.watch.handle->callback()
handle_vcpu_hotplug_event()
vcpu_hotplug()
cpu_down()
__cpu_notify(CPU_POST_DEAD..)
cpufreq_cpu_callback()
__cpufreq_remove_dev_finish()
cpufreq_policy_put_kobj()
kobject_put()

cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data
under cpufreq_driver_lock, and once it gets a valid policy it expects it
to not be freed until cpufreq_cpu_put() is called.

But the race happens when another thread puts the kobject first and updates
cpufreq_cpu_data before or later. And so the first thread gets a valid policy
structure and before it does kobject_get() on it, the second one has already
done kobject_put().

Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
too under locks.

Reported-by: Ethan Zhao <ethan.zhao@oracle.com>
Reported-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d9f35446 06-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove CPUFREQ_UPDATE_POLICY_CPU notifications

CPUFREQ_UPDATE_POLICY_CPU notifications were used only from cpufreq-stats which
doesn't use it anymore. Remove them.

This also decrements values of other notification macros defined after
CPUFREQ_UPDATE_POLICY_CPU by 1 to remove gaps. Hopefully all users are using
macro's instead of direct numbers and so they wouldn't break as macro values are
changed now.

Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7c418ff0 06-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove (now) unused 'last_cpu' from struct cpufreq_policy

'last_cpu' was used only from cpufreq-stats and isn't used anymore. Get rid of
it.

Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 818c5712 01-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: move some initialization stuff to cpufreq_policy_alloc()

We need to initialize completion and work only on policy allocation and not
really on the policy restore side and so we better move this piece of code to
cpufreq_policy_alloc().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ce1bcfe9 01-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: check cpufreq_policy_list instead of scanning policies for all CPUs

CPUFREQ_STICKY flag is set by drivers which don't want to get unregistered
even if cpufreq-core isn't able to initialize policy for any CPU.

When this flag isn't set, we try to unregister the driver. To find out
which CPUs are registered and which are not, we try to check per_cpu
cpufreq_cpu_data for all CPUs. Because we have a list of valid policies
available now, we better check if the list is empty or not instead of
the 'for' loop. That will be much more efficient.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 39c132ee 01-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: limit the scope of l_p_j variables

These variables are just used within adjust_jiffies() and so must be
local to it. Also there is no need of a dummy routine for CONFIG_SMP
case as we can take care of all that with help of macros in the same
routine. It doesn't look that ugly.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d7a9771c 01-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: use light-weight cpufreq_cpu_get_raw() in __cpufreq_add_dev()

We just need to check if a 'policy' is already present for the cpu we are
adding. We don't need to take all the locks and do kobject usage updates. Use
the light-weight cpufreq_cpu_get_raw() routine instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7f0c020a 01-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: get rid of 'tpolicy' from __cpufreq_add_dev()

There is no need of this separate variable, use 'policy' instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 22a7cfb0 01-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: get rid of CONFIG_{HOTPLUG_CPU|SMP} mess

These are messing up more than the benefit they provide. It isn't
a lot of code anyway, that we will compile without them.

Kill them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# bc68b7df 01-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: update driver_data->flags only if we are registering driver

We should first check if a cpufreq driver is already registered or not
before updating driver_data->flags.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d92d50a4 01-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: pass policy to __cpufreq_get()

There is no point finding out the 'policy' again within __cpufreq_get()
when all the callers already have it. Just make them pass policy instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# a1e1dc41 01-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: pass policy to cpufreq_out_of_sync

There is no point finding out the 'policy' again within cpufreq_out_of_sync()
when all the callers already have it. Just make them pass policy instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2e1cc3a5 01-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: No need to check for has_target()

Either we can be setpolicy or target type, nothing else. And so the
else part of setpolicy will automatically be of has_target() type.
And so we don't need to check it again.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 42f91fa1 01-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: s/__find_governor/find_governor

Remove unnecessary from find_governor's name.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# db5f2995 01-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: merge 'if' blocks in __cpufreq_remove_dev_prepare()

There are two 'if' blocks here, checking for !cpufreq_driver->setpolicy and
has_target(). Both are actually doing the same thing, merge them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 09347b29 01-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: don't need line break in show_scaling_cur_freq()

No need of an unnecessary line break.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f13f1184 01-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove extra parenthesis

We can live without it and so we should.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e673f163 01-Jan-2015 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove dangling comment

It doesn't make any sense at all and is a leftover of some earlier commit.
Remove it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 90de2a4a 23-Dec-2014 Doug Anderson <dianders@chromium.org>

cpufreq: suspend cpufreq governors on shutdown

We should stop cpufreq governors when we shut down the system. If we
don't do this, we can end up with this deadlock:

1. cpufreq governor may be running on a CPU other than CPU0.
2. In machine_restart() we call smp_send_stop() which stops CPUs.
If one of these CPUs was actively running a cpufreq governor
then it may have the mutex / spinlock needed to access the main
PMIC in the system (perhaps over I2C)
3. If a machine needs access to the main PMIC in order to shutdown
then it will never get it since the mutex was lost when the other
CPU stopped.
4. We'll hang (possibly eventually hitting the hard lockup detector).

Let's avoid the problem by stopping the cpufreq governor at shutdown,
which is a sensible thing to do anyway.

Signed-off-by: Doug Anderson <dianders@chromium.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# cb57720b 17-Dec-2014 Ethan Zhao <ethan.zhao@oracle.com>

cpufreq: fix a NULL pointer dereference in __cpufreq_governor()

If ACPI _PPC changed notification happens before governor was initiated
while kernel is booting, a NULL pointer dereference will be triggered:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: [<ffffffff81470453>] __cpufreq_governor+0x23/0x1e0
PGD 0
Oops: 0000 [#1] SMP
... ...
RIP: 0010:[<ffffffff81470453>] [<ffffffff81470453>]
__cpufreq_governor+0x23/0x1e0
RSP: 0018:ffff881fcfbcfbb8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff881fd11b3980 RCX: ffff88407fc20000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff881fd11b3980
RBP: ffff881fcfbcfbd8 R08: 0000000000000000 R09: 000000000000000f
R10: ffffffff818068d0 R11: 0000000000000043 R12: 0000000000000004
R13: 0000000000000000 R14: ffffffff8196cae0 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff881fffc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 00000000018ae000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:3 (pid: 750, threadinfo ffff881fcfbce000, task
ffff881fcf556400)
Stack:
ffff881fffc17d00 ffff881fcfbcfc18 ffff881fd11b3980 0000000000000000
ffff881fcfbcfc08 ffffffff81470d08 ffff881fd11b3980 0000000000000007
ffff881fcfbcfc18 ffff881fffc17d00 ffff881fcfbcfd28 ffffffff81472e9a
Call Trace:
[<ffffffff81470d08>] __cpufreq_set_policy+0x1b8/0x2e0
[<ffffffff81472e9a>] cpufreq_update_policy+0xca/0x150
[<ffffffff81472f20>] ? cpufreq_update_policy+0x150/0x150
[<ffffffff81324a96>] acpi_processor_ppc_has_changed+0x71/0x7b
[<ffffffff81320bcd>] acpi_processor_notify+0x55/0x115
[<ffffffff812f9c29>] acpi_device_notify+0x19/0x1b
[<ffffffff813084ca>] acpi_ev_notify_dispatch+0x41/0x5f
[<ffffffff812f64a4>] acpi_os_execute_deferred+0x27/0x34

The root cause is a race conditon -- cpufreq core and acpi-cpufreq driver
were initiated, but cpufreq_governor wasn't and _PPC changed notification
happened, __cpufreq_governor() was called within acpi_os_execute_deferred
kernel thread context.

To fix this panic issue, add pointer checking code in __cpufreq_governor()
before pointer policy->governor is to be dereferenced.

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7c45cf31 26-Nov-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Introduce ->ready() callback for cpufreq drivers

Currently there is no callback for cpufreq drivers which is called once the
policy is ready to be used. There are some requirements where such a callback is
required.

One of them is registering a cooling device with the help of
of_cpufreq_cooling_register(). This routine tries to get 'struct cpufreq_policy'
for CPUs which isn't yet initialed at the time ->init() is called and so we face
issues while registering the cooling device.

Because we can't register cooling device from ->init(), we need a callback that
is called after the policy is ready to be used and hence we introduce ->ready()
callback.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Eduardo Valentin <edubezval@gmail.com>
Tested-by: Eduardo Valentin <edubezval@gmail.com>
Reviewed-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6d4e81ed 24-Nov-2014 Tomeu Vizoso <tomeu.vizoso@collabora.com>

cpufreq: Ref the policy object sooner

Do it before it's assigned to cpufreq_cpu_data, otherwise when a driver
tries to get the cpu frequency during initialization the policy kobj is
referenced and we get this warning:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 64 at include/linux/kref.h:47 kobject_get+0x64/0x70()
Modules linked in:
CPU: 1 PID: 64 Comm: irq/77-tegra-ac Not tainted 3.18.0-rc4-next-20141114ccu-00050-g3eff942 #326
[<c0016fac>] (unwind_backtrace) from [<c001272c>] (show_stack+0x10/0x14)
[<c001272c>] (show_stack) from [<c06085d8>] (dump_stack+0x98/0xd8)
[<c06085d8>] (dump_stack) from [<c002892c>] (warn_slowpath_common+0x84/0xb4)
[<c002892c>] (warn_slowpath_common) from [<c00289f8>] (warn_slowpath_null+0x1c/0x24)
[<c00289f8>] (warn_slowpath_null) from [<c0220290>] (kobject_get+0x64/0x70)
[<c0220290>] (kobject_get) from [<c03e944c>] (cpufreq_cpu_get+0x88/0xc8)
[<c03e944c>] (cpufreq_cpu_get) from [<c03e9500>] (cpufreq_get+0xc/0x64)
[<c03e9500>] (cpufreq_get) from [<c0285288>] (actmon_thread_isr+0x134/0x198)
[<c0285288>] (actmon_thread_isr) from [<c0069008>] (irq_thread_fn+0x1c/0x40)
[<c0069008>] (irq_thread_fn) from [<c0069324>] (irq_thread+0x134/0x174)
[<c0069324>] (irq_thread) from [<c0040290>] (kthread+0xdc/0xf4)
[<c0040290>] (kthread) from [<c000f4b8>] (ret_from_fork+0x14/0x3c)
---[ end trace b7bd64a81b340c59 ]---

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 619c144c 09-Nov-2014 Vince Hsu <vinceh@nvidia.com>

cpufreq: respect the min/max settings from user space

When the user space tries to set scaling_(max|min)_freq through
sysfs, the cpufreq_set_policy() asks other driver's opinions
for the max/min frequencies. Some device drivers, like Tegra
CPU EDP which is not upstreamed yet though, may constrain the
CPU maximum frequency dynamically because of board design.
So if the user space access happens and some driver is capping
the cpu frequency at the same time, the user_policy->(max|min)
is overridden by the capped value, and that's not expected by
the user space. And if the user space is not invoked again,
the CPU will always be capped by the user_policy->(max|min)
even no drivers limit the CPU frequency any more.

This patch preserves the user specified min/max settings, so that
every time the cpufreq policy is updated, the new max/min can
be re-evaluated correctly based on the user's expection and
the present device drivers' status.

Signed-off-by: Vince Hsu <vinceh@nvidia.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 09712f55 04-Nov-2014 Geert Uytterhoeven <geert+renesas@glider.be>

cpufreq: Avoid crash in resume on SMP without OPP

When resuming from s2ram on an SMP system without cpufreq operating
points (e.g. there's no "operating-points" property for the CPU node in
DT, or the platform doesn't use DT yet), the kernel crashes when
bringing CPU 1 online:

Enabling non-boot CPUs ...
CPU1: Booted secondary processor
Unable to handle kernel NULL pointer dereference at virtual address 0000003c
pgd = ee5e6b00
[0000003c] *pgd=6e579003, *pmd=6e588003, *pte=00000000
Internal error: Oops: a07 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 1246 Comm: s2ram Tainted: G W 3.18.0-rc3-koelsch-01614-g0377af242bb175c8-dirty #589
task: eeec5240 ti: ee704000 task.ti: ee704000
PC is at __cpufreq_add_dev.isra.24+0x24c/0x77c
LR is at __cpufreq_add_dev.isra.24+0x244/0x77c
pc : [<c0298efc>] lr : [<c0298ef4>] psr: 60000153
sp : ee705d48 ip : ee705d48 fp : ee705d84
r10: c04e0450 r9 : 00000000 r8 : 00000001
r7 : c05426a8 r6 : 00000001 r5 : 00000001 r4 : 00000000
r3 : 00000000 r2 : 00000000 r1 : 20000153 r0 : c0542734

Verify that policy is not NULL before dereferencing it to fix this.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fixes: 8414809c6a1e (cpufreq: Preserve policy structure across suspend/resume)
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# c034b02e 13-Oct-2014 Dirk Brandewie <dirk.j.brandewie@intel.com>

cpufreq: expose scaling_cur_freq sysfs file for set_policy() drivers

Currently the core does not expose scaling_cur_freq for set_policy()
drivers this breaks some userspace monitoring tools.
Change the core to expose this file for all drivers and if the
set_policy() driver supports the get() callback use it to retrieve the
current frequency.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=73741
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 51315cdf 19-Oct-2014 Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

cpufreq: allow driver-specific data

This commit extends the cpufreq_driver structure with an additional
'void *driver_data' field that can be filled by the ->probe() function
of a cpufreq driver to pass additional custom information to the
driver itself.

A new function called cpufreq_get_driver_data() is added to allow a
cpufreq driver to retrieve those driver data, since they are typically
needed from a cpufreq_policy->init() callback, which does not have
access to the cpufreq_driver structure. This function call is similar
to the existing cpufreq_get_current_driver() function call.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b1b12bab 29-Sep-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: update 'cpufreq_suspended' after stopping governors

Commit 8e30444e1530 ("cpufreq: fix cpufreq suspend/resume for intel_pstate")
introduced a bug where the governors wouldn't be stopped anymore for
->target{_index}() drivers during suspend. This happens because
'cpufreq_suspended' is updated before stopping the governors during suspend
and due to this __cpufreq_governor() would return early due to this check:

/* Don't start any governor operations if we are entering suspend */
if (cpufreq_suspended)
return 0;

Fixes: 8e30444e1530 ("cpufreq: fix cpufreq suspend/resume for intel_pstate")
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+: 8e30444e1530 "cpufreq: fix cpufreq suspend/resume for intel_pstate"
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7c4f4539 29-Sep-2014 Rasmus Villemoes <linux@rasmusvillemoes.dk>

cpufreq: Replace strnicmp with strncasecmp

The kernel used to contain two functions for length-delimited,
case-insensitive string comparison, strnicmp with correct semantics
and a slightly buggy strncasecmp. The latter is the POSIX name, so
strnicmp was renamed to strncasecmp, and strnicmp made into a wrapper
for the new strncasecmp to avoid breaking existing users.

To allow the compat wrapper strnicmp to be removed at some point in
the future, and to avoid the extra indirection cost, do
s/strnicmp/strncasecmp/g.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 789ca243 29-Sep-2014 Preeti U Murthy <preeti@linux.vnet.ibm.com>

cpufreq: Allow stop CPU callback to be used by all cpufreq drivers

Commit 367dc4aa932bfb3 ("cpufreq: Add stop CPU callback to
cpufreq_driver interface") introduced the stop CPU callback for
intel_pstate drivers. During the CPU_DOWN_PREPARE stage, this
callback is invoked so that drivers can take some action on the
pstate of the cpu before it is taken offline. This callback was
assumed to be useful only for those drivers which have implemented
the set_policy CPU callback because they have no other way to take
action about the cpufreq of a CPU which is being hotplugged out
except in the exit callback which is called very late in the offline
process.

The drivers which implement the target/target_index callbacks were
expected to take care of requirements like the ones that commit
367dc4aa addresses in the GOV_STOP notification event. But there
are disadvantages to restricting the usage of stop CPU callback
to cpufreq drivers that implement the set_policy callbacks and who
want to take explicit action on the setting the cpufreq during a
hotplug operation.

1.GOV_STOP gets called for every CPU offline and drivers would usually
want to take action when the last cpu in the policy->cpus mask
is taken offline. As long as there is more than one cpu in the
policy->cpus mask, cpufreq core itself makes sure that the freq
for the other cpus in this mask is set according to the maximum load.
This is sensible and drivers which implement the target_index callback
would mostly not want to modify that. However the cpufreq core leaves a
loose end when the cpu in the policy->cpus mask is the last one to go offline;
it does nothing explicit to the frequency of the core. Drivers may need
a way to take some action here and stop CPU callback mechanism is the
best way to do it today.

2. We cannot implement driver specific actions in the GOV_STOP mechanism.
So we will need another driver callback which is invoked from here which is
unnecessary.

Therefore this patch extends the usage of stop CPU callback to be used
by all cpufreq drivers as long as they have this callback implemented
and irrespective of whether they are set_policy/target_index drivers.
The assumption is if the drivers find the GOV_STOP path to be a suitable
way of implementing what they want to do with the freq of the cpu
going offine,they will not implement the stop CPU callback at all.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7106e02b 10-Sep-2014 Prarit Bhargava <prarit@redhat.com>

cpufreq: release policy->rwsem on error

While debugging a cpufreq-related hardware failure on a system I saw the
following lockdep warning:

=========================
[ BUG: held lock freed! ] 3.17.0-rc4+ #1 Tainted: G E
-------------------------
insmod/2247 is freeing memory ffff88006e1b1400-ffff88006e1b17ff, with a lock still held there!
(&policy->rwsem){+.+...}, at: [<ffffffff8156d37d>] __cpufreq_add_dev.isra.21+0x47d/0xb80
3 locks held by insmod/2247:
#0: (subsys mutex#5){+.+.+.}, at: [<ffffffff81485579>] subsys_interface_register+0x69/0x120
#1: (cpufreq_rwsem){.+.+.+}, at: [<ffffffff8156cf73>] __cpufreq_add_dev.isra.21+0x73/0xb80
#2: (&policy->rwsem){+.+...}, at: [<ffffffff8156d37d>] __cpufreq_add_dev.isra.21+0x47d/0xb80

stack backtrace:
CPU: 0 PID: 2247 Comm: insmod Tainted: G E 3.17.0-rc4+ #1
Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 08/24/2013
0000000000000000 000000008f3063c4 ffff88006f87bb30 ffffffff8171b358
ffff88006bcf3750 ffff88006f87bb68 ffffffff810e09e1 ffff88006e1b1400
ffffea0001b86c00 ffffffff8156d327 ffff880073003500 0000000000000246
Call Trace:
[<ffffffff8171b358>] dump_stack+0x4d/0x66
[<ffffffff810e09e1>] debug_check_no_locks_freed+0x171/0x180
[<ffffffff8156d327>] ? __cpufreq_add_dev.isra.21+0x427/0xb80
[<ffffffff8121412b>] kfree+0xab/0x2b0
[<ffffffff8156d327>] __cpufreq_add_dev.isra.21+0x427/0xb80
[<ffffffff81724cf7>] ? _raw_spin_unlock+0x27/0x40
[<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq]
[<ffffffff8156da8e>] cpufreq_add_dev+0xe/0x10
[<ffffffff814855d1>] subsys_interface_register+0xc1/0x120
[<ffffffff8156bcf2>] cpufreq_register_driver+0x112/0x340
[<ffffffff8121415a>] ? kfree+0xda/0x2b0
[<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq]
[<ffffffffa003562e>] pcc_cpufreq_init+0x4af/0xe81 [pcc_cpufreq]
[<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq]
[<ffffffff81002144>] do_one_initcall+0xd4/0x210
[<ffffffff811f7472>] ? __vunmap+0xd2/0x120
[<ffffffff81127155>] load_module+0x1315/0x1b70
[<ffffffff811222a0>] ? store_uevent+0x70/0x70
[<ffffffff811229d9>] ? copy_module_from_fd.isra.44+0x129/0x180
[<ffffffff81127b86>] SyS_finit_module+0xa6/0xd0
[<ffffffff81725b69>] system_call_fastpath+0x16/0x1b
cpufreq: __cpufreq_add_dev: ->get() failed
insmod: ERROR: could not insert module pcc-cpufreq.ko: No such device

The warning occurs in the __cpufreq_add_dev() code which does

down_write(&policy->rwsem);
...
if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
policy->cur = cpufreq_driver->get(policy->cpu);
if (!policy->cur) {
pr_err("%s: ->get() failed\n", __func__);
goto err_get_freq;
}

If cpufreq_driver->get(policy->cpu) returns an error we execute the
code at err_get_freq, which does not up the policy->rwsem. This causes
the lockdep warning.

Trivial patch to up the policy->rwsem in the error path.

After the patch has been applied, and an error occurs in the
cpufreq_driver->get(policy->cpu) call we will now see

cpufreq: __cpufreq_add_dev: ->get() failed
cpufreq: __cpufreq_add_dev: ->get() failed
modprobe: ERROR: could not insert 'pcc_cpufreq': No such device

Fixes: 4e97b631f24c (cpufreq: Initialize governor for a new policy under policy->rwsem)
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8e30444e 18-Sep-2014 Lan Tianyu <tianyu.lan@intel.com>

cpufreq: fix cpufreq suspend/resume for intel_pstate

Cpufreq core introduces cpufreq_suspended flag to let cpufreq sysfs nodes
across S2RAM/S2DISK. But the flag is only set in the cpufreq_suspend()
for cpufreq drivers which have target or target_index callback. This
skips intel_pstate driver. This patch is to set the flag before checking
target or target_index callback.

Fixes: 2f0aea936360 (cpufreq: suspend governors on system suspend/hibernate)
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
[rjw: Subject]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1bfb425b 16-Jul-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: move policy kobj to update_policy_cpu()

We are calling kobject_move() from two separate places currently and both these
places share another routine update_policy_cpu() which is handling everything
around updating policy->cpu. Moving ownership of policy->kobj also lies under
the role of update_policy_cpu() routine and must be handled from there.

So, Lets move kobject_move() to update_policy_cpu() and get rid of
cpufreq_nominate_new_policy_cpu() as it doesn't have anything significant left.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 41dfd908 16-Jul-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: propagate error returned by kobject_move()

We are returning -EINVAL instead of the error returned from kobject_move() when
it fails. Propagate the actual error number.

Also add a meaningful print when sysfs_create_link() fails.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1461dc7d 16-Jul-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: don't restore policy->cpus on failure to move kobj

While hot-unplugging policy->cpu, we call cpufreq_nominate_new_policy_cpu() to
nominate next owner of policy, i.e. policy->cpu. If we fail to move policy
kobject under the new policy->cpu, we try to update policy->cpus with the old
policy->cpu.

This would have been required in case old-CPU is removed from policy->cpus in
the first place. But its not done before calling
cpufreq_nominate_new_policy_cpu(), but during the POST_DEAD notification which
happens quite late in the hot-unplugging path.

So, this is just some useless code hanging around, get rid of it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 92c14bd9 16-Jul-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: move policy kobj to policy->cpu at resume

This is only relevant to implementations with multiple clusters, where clusters
have separate clock lines but all CPUs within a cluster share it.

Consider a dual cluster platform with 2 cores per cluster. During suspend we
start hot unplugging CPUs in order 1 to 3. When CPU2 is removed, policy->kobj
would be moved to CPU3 and when CPU3 goes down we wouldn't free policy or its
kobj as we want to retain permissions/values/etc.

Now on resume, we will get CPU2 before CPU3 and will call __cpufreq_add_dev().
We will recover the old policy and update policy->cpu from 3 to 2 from
update_policy_cpu().

But the kobj is still tied to CPU3 and isn't moved to CPU2. We wouldn't create a
link for CPU2, but would try that for CPU3 while bringing it online. Which will
report errors as CPU3 already has kobj assigned to it.

This bug got introduced with commit 42f921a, which overlooked this scenario.

To fix this, lets move kobj to the new policy->cpu while bringing first CPU of a
cluster back. Also do a WARN_ON() if kobject_move failed, as we would reach here
only for the first CPU of a non-boot cluster. And we can't recover from this
situation, if kobject_move() fails.

Fixes: 42f921a6f10c (cpufreq: remove sysfs files for CPUs which failed to come back after resume)
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Reported-and-tested-by: Bu Yitian <ybu@qti.qualcomm.com>
Reported-by: Saravana Kannan <skannan@codeaurora.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa@mit.edu>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# fefa8ff8 18-Jun-2014 Aaron Plattner <aplattner@nvidia.com>

cpufreq: unlock when failing cpufreq_update_policy()

Commit bd0fa9bb455d introduced a failure path to cpufreq_update_policy() if
cpufreq_driver->get(cpu) returns NULL. However, it jumps to the 'no_policy'
label, which exits without unlocking any of the locks the function acquired
earlier. This causes later calls into cpufreq to hang.

Fix this by creating a new 'unlock' label and jumping to that instead.

Fixes: bd0fa9bb455d ("cpufreq: Return error if ->get() failed in cpufreq_update_policy()")
Link: https://devtalk.nvidia.com/default/topic/751903/kernel-3-15-and-nv-drivers-337-340-failed-to-initialize-the-nvidia-kernel-module-gtx-550-ti-/
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1c03a2d0 02-Jun-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: add support for intermediate (stable) frequencies

Douglas Anderson, recently pointed out an interesting problem due to which
udelay() was expiring earlier than it should.

While transitioning between frequencies few platforms may temporarily switch to
a stable frequency, waiting for the main PLL to stabilize.

For example: When we transition between very low frequencies on exynos, like
between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz.
No CPUFREQ notification is sent for that. That means there's a period of time
when we're running at 800MHz but loops_per_jiffy is calibrated at between 200MHz
and 300MHz. And so udelay behaves badly.

To get this fixed in a generic way, introduce another set of callbacks
get_intermediate() and target_intermediate(), only for drivers with
target_index() and CPUFREQ_ASYNC_NOTIFICATION unset.

get_intermediate() should return a stable intermediate frequency platform wants
to switch to, and target_intermediate() should set CPU to that frequency,
before jumping to the frequency corresponding to 'index'. Core will take care of
sending notifications and driver doesn't have to handle them in
target_intermediate() or target_index().

NOTE: ->target_index() should restore to policy->restore_freq in case of
failures as core would send notifications for that.

Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8d65775d 21-May-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: handle calls to ->target_index() in separate routine

Handling calls to ->target_index() has got complex over time and might become
more complex. So, its better to take target_index() bits out in another routine
__target_index() for better code readability. Shouldn't have any functional
impact.

Tested-by: Stephen Warren <swarren@nvidia.com>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5eeaf1f1 07-May-2014 Stratos Karafotis <stratosk@semaphore.gr>

cpufreq: Fix build error on some platforms that use cpufreq_for_each_*

On platforms that use cpufreq_for_each_* macros, build fails if
CONFIG_CPU_FREQ=n, e.g. ARM/shmobile/koelsch/non-multiplatform:

drivers/built-in.o: In function `clk_round_parent':
clkdev.c:(.text+0xcf168): undefined reference to `cpufreq_next_valid'
drivers/built-in.o: In function `clk_rate_table_find':
clkdev.c:(.text+0xcf820): undefined reference to `cpufreq_next_valid'
make[3]: *** [vmlinux] Error 1

Fix this making cpufreq_next_valid function inline and move it to
cpufreq.h.

Fixes: 27e289dce297 (cpufreq: Introduce macros for cpufreq_frequency_table iteration)
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ca654dc3 04-May-2014 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end

Some cpufreq drivers were redundantly invoking the _begin() and _end()
APIs around frequency transitions, and this double invocation (one from
the cpufreq core and the other from the cpufreq driver) used to result
in a self-deadlock, leading to system hangs during boot. (The _begin()
API makes contending callers wait until the previous invocation is
complete. Hence, the cpufreq driver would end up waiting on itself!).

Now all such drivers have been fixed, but debugging this issue was not
very straight-forward (even lockdep didn't catch this). So let us add a
debug infrastructure to the cpufreq core to catch such issues more easily
in the future.

We add a new field called 'transition_task' to the policy structure, to keep
track of the task which is performing the frequency transition. Using this
field, we make note of this task during _begin() and print a warning if we
find a case where the same task is calling _begin() again, before completing
the previous frequency transition using the corresponding _end().

We have left out ASYNC_NOTIFICATION drivers from this debug infrastructure
for 2 reasons:

1. At the moment, we have no way to avoid a particular scenario where this
debug infrastructure can emit false-positive warnings for such drivers.
The scenario is depicted below:

Task A Task B

/* 1st freq transition */
Invoke _begin() {
...
...
}

Change the frequency

/* 2nd freq transition */
Invoke _begin() {
... //waiting for B to
... //finish _end() for
... //the 1st transition
... | Got interrupt for successful
... | change of frequency (1st one).
... |
... | /* 1st freq transition */
... | Invoke _end() {
... | ...
... V }
...
...
}

This scenario is actually deadlock-free because, once Task A changes the
frequency, it is Task B's responsibility to invoke the corresponding
_end() for the 1st frequency transition. Hence it is perfectly legal for
Task A to go ahead and attempt another frequency transition in the meantime.
(Of course it won't be able to proceed until Task B finishes the 1st _end(),
but this doesn't cause a deadlock or a hang).

The debug infrastructure cannot handle this scenario and will treat it as
a deadlock and print a warning. To avoid this, we exclude such drivers
from the purview of this code.

2. Luckily, we don't _need_ this infrastructure for ASYNC_NOTIFICATION drivers
at all! The cpufreq core does not automatically invoke the _begin() and
_end() APIs during frequency transitions in such drivers. Thus, the driver
alone is responsible for invoking _begin()/_end() and hence there shouldn't
be any conflicts which lead to double invocations. So, we can skip these
drivers, since the probability that such drivers will hit this problem is
extremely low, as outlined above.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 27e289dc 25-Apr-2014 Stratos Karafotis <stratosk@semaphore.gr>

cpufreq: Introduce macros for cpufreq_frequency_table iteration

Many cpufreq drivers need to iterate over the cpufreq_frequency_table
for various tasks.

This patch introduces two macros which can be used for iteration over
cpufreq_frequency_table keeping a common coding style across drivers:

- cpufreq_for_each_entry: iterate over each entry of the table
- cpufreq_for_each_valid_entry: iterate over each entry that contains
a valid frequency.

It should have no functional changes.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 236a9800 24-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Make cpufreq_notify_transition & cpufreq_notify_post_transition static

cpufreq_notify_transition() and cpufreq_notify_post_transition() shouldn't be
called directly by cpufreq drivers anymore and so these should be marked static.

Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8fec051e 24-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Convert existing drivers to use cpufreq_freq_transition_{begin|end}

CPUFreq core has new infrastructure that would guarantee serialized calls to
target() or target_index() callbacks. These are called
cpufreq_freq_transition_begin() and cpufreq_freq_transition_end().

This patch converts existing drivers to use these new set of routines.

Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 12478cf0 24-Mar-2014 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Make sure frequency transitions are serialized

Whenever we change the frequency of a CPU, we call the PRECHANGE and POSTCHANGE
notifiers. They must be serialized, i.e. PRECHANGE and POSTCHANGE notifiers
should strictly alternate, thereby preventing two different sets of PRECHANGE or
POSTCHANGE notifiers from interleaving arbitrarily.

The following examples illustrate why this is important:

Scenario 1:
-----------
A thread reading the value of cpuinfo_cur_freq, will call
__cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition()

The ondemand governor can decide to change the frequency of the CPU at the same
time and hence it can end up sending the notifications via ->target().

If the notifiers are not serialized, the following sequence can occur:
- PRECHANGE Notification for freq A (from cpuinfo_cur_freq)
- PRECHANGE Notification for freq B (from target())
- Freq changed by target() to B
- POSTCHANGE Notification for freq B
- POSTCHANGE Notification for freq A

We can see from the above that the last POSTCHANGE Notification happens for freq
A but the hardware is set to run at freq B.

Where would we break then?: adjust_jiffies() in cpufreq.c & cpufreq_callback()
in arch/arm/kernel/smp.c (which also adjusts the jiffies). All the
loops_per_jiffy calculations will get messed up.

Scenario 2:
-----------
The governor calls __cpufreq_driver_target() to change the frequency. At the
same time, if we change scaling_{min|max}_freq from sysfs, it will end up
calling the governor's CPUFREQ_GOV_LIMITS notification, which will also call
__cpufreq_driver_target(). And hence we end up issuing concurrent calls to
->target().

Typically, platforms have the following logic in their ->target() routines:
(Eg: cpufreq-cpu0, omap, exynos, etc)

A. If new freq is more than old: Increase voltage
B. Change freq
C. If new freq is less than old: decrease voltage

Now, if the two concurrent calls to ->target() are X and Y, where X is trying to
increase the freq and Y is trying to decrease it, we get the following race
condition:

X.A: voltage gets increased for larger freq
Y.A: nothing happens
Y.B: freq gets decreased
Y.C: voltage gets decreased
X.B: freq gets increased
X.C: nothing happens

Thus we can end up setting a freq which is not supported by the voltage we have
set. That will probably make the clock to the CPU unstable and the system might
not work properly anymore.

This patch introduces a set of synchronization primitives to serialize frequency
transitions, which are to be used as shown below:

cpufreq_freq_transition_begin();

//Perform the frequency change

cpufreq_freq_transition_end();

The _begin() call sends the PRECHANGE notification whereas the _end() call sends
the POSTCHANGE notification. Also, all the necessary synchronization is handled
within these calls. In particular, even drivers which set the ASYNC_NOTIFICATION
flag can also use these APIs for performing frequency transitions (ie., you can
call _begin() from one task, and call the corresponding _end() from a different
task).

The actual synchronization underneath is not that complicated:

The key challenge is to allow drivers to begin the transition from one thread
and end it in a completely different thread (this is to enable drivers that do
asynchronous POSTCHANGE notification from bottom-halves, to also use the same
interface).

To achieve this, a 'transition_ongoing' flag, a 'transition_lock' spinlock and a
wait-queue are added per-policy. The flag and the wait-queue are used in
conjunction to create an "uninterrupted flow" from _begin() to _end(). The
spinlock is used to ensure that only one such "flow" is in flight at any given
time. Put together, this provides us all the necessary synchronization.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 0c5aa405 23-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: resume drivers before enabling governors

During suspend, we first stop governors and then suspend cpufreq drivers and
resume must be exactly opposite of that. i.e. resume drivers first and then
start governors.

But the current code in resume enables governors first and then resume drivers.
Fix it be changing code sequence there.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 367dc4aa 19-Mar-2014 Dirk Brandewie <dirk.j.brandewie@intel.com>

cpufreq: Add stop CPU callback to cpufreq_driver interface

This callback allows the driver to do clean up before the CPU is
completely down and its state cannot be modified. This is used
by the intel_pstate driver to reduce the requested P state prior to
the core going away. This is required because the requested P state
of the offline core is used to select the package P state. This
effectively sets the floor package P state to the requested P state on
the offline core.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
[rjw: Minor modifications]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# bda9f552 19-Mar-2014 Stratos Karafotis <stratosk@semaphore.gr>

cpufreq: Remove unnecessary braces

Remove unnecessary braces from a single statement.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e5c87b76 19-Mar-2014 Stratos Karafotis <stratosk@semaphore.gr>

cpufreq: Fix checkpatch errors and warnings

Fix 2 checkpatch errors about using assignment in if condition,
1 checkpatch error about a required space after comma
and 3 warnings about line over 80 characters.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 0b443ead 18-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove unused notifier: CPUFREQ_{SUSPENDCHANGE|RESUMECHANGE}

Two cpufreq notifiers CPUFREQ_RESUMECHANGE and CPUFREQ_SUSPENDCHANGE have
not been used for some time, so remove them to clean up code a bit.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9832235f 18-Mar-2014 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Do not allow ->setpolicy drivers to provide ->target

cpufreq drivers that provide the ->setpolicy() callback are supposed
to have integrated governors, so they don't need to set ->target()
or ->target_index() and may confuse the core if any of these callbacks
is present.

For this reason, add a check preventing ->setpolicy cpufreq drivers
from registering if they have non-NULL ->target or ->target_index.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 2ed99e39 12-Mar-2014 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Skip current frequency initialization for ->setpolicy drivers

After commit da60ce9f2fac (cpufreq: call cpufreq_driver->get() after
calling ->init()) __cpufreq_add_dev() sometimes fails for CPUs handled
by intel_pstate, because that driver may return 0 from its ->get()
callback if it has not run long enough to collect enough samples on the
given CPU. That didn't happen before commit da60ce9f2fac which added
policy->cur initialization to __cpufreq_add_dev() to help reduce code
duplication in other cpufreq drivers.

However, the code added by commit da60ce9f2fac need not be executed
for cpufreq drivers having the ->setpolicy callback defined, because
the subsequent invocation of cpufreq_set_policy() will use that
callback to initialize the policy anyway and it doesn't need
policy->cur to be initialized upfront. The analogous code in
cpufreq_update_policy() is also unnecessary for cpufreq drivers
having ->setpolicy set and may be skipped for them as well.

Since intel_pstate provides ->setpolicy, skipping the upfront
policy->cur initialization for cpufreq drivers with that callback
set will cover intel_pstate and the problem it's been having after
commit da60ce9f2fac will be addressed.

Fixes: da60ce9f2fac (cpufreq: call cpufreq_driver->get() after calling ->init())
References: https://bugzilla.kernel.org/show_bug.cgi?id=71931
Reported-and-tested-by: Patrik Lundquist <patrik.lundquist@gmail.com>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 96bbbe4a 10-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove unnecessary variable/parameter 'frozen'

We have used 'frozen' variable/function parameter at many places to
distinguish between CPU offline/online on suspend/resume vs sysfs
removals. We now have another variable cpufreq_suspended which can
be used in these cases, so we can get rid of all those variables or
function parameters.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e0b3165b 10-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: add 'freq_table' in struct cpufreq_policy

freq table is not per CPU but per policy, so it makes more sense to
keep it within struct cpufreq_policy instead of a per-cpu variable.

This patch does it. Over that, there is no need to set policy->freq_table
to NULL in ->exit(), as policy structure is going to be freed soon.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e837f9b5 11-Mar-2014 Joe Perches <joe@perches.com>

cpufreq: Reformat printk() statements

- Add missing newlines
- Coalesce format fragments
- Convert printks to pr_<level>
- Align arguments

Based-on-patch-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e28867ea 03-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Implement cpufreq_generic_suspend()

Multiple platforms need to set CPUs to a particular frequency before
suspending the system, so provide a common infrastructure for them.

Those platforms only need to point their ->suspend callback pointers
to the generic routine.

Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2f0aea93 03-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: suspend governors on system suspend/hibernate

This patch adds cpufreq suspend/resume calls to dpm_{suspend|resume}()
for handling suspend/resume of cpufreq governors.

Lan Tianyu (Intel) & Jinhyuk Choi (Broadcom) found an issue where the
tunables configuration for clusters/sockets with non-boot CPUs was
lost after system suspend/resume, as we were notifying governors with
CPUFREQ_GOV_POLICY_EXIT on removal of the last CPU for that policy
which caused the tunables memory to be freed.

This is fixed by preventing any governor operations from being
carried out between the device suspend and device resume stages of
system suspend and resume, respectively.

We could have added these callbacks at dpm_{suspend|resume}_noirq()
level, but there is an additional problem that the majority of I/O
devices is already suspended at that point and if cpufreq drivers
want to change the frequency before suspending, then that not be
possible on some platforms (which depend on peripherals like i2c,
regulators, etc).

Reported-and-tested-by: Lan Tianyu <tianyu.lan@intel.com>
Reported-by: Jinhyuk Choi <jinchoi@broadcom.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6e2c89d1 03-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: move call to __find_governor() to cpufreq_init_policy()

We call __find_governor() during the addition of the first CPU of
each policy from __cpufreq_add_dev() to find the last governor used
for this CPU before it was hot-removed.

After that we call cpufreq_parse_governor() in cpufreq_init_policy(),
either with this governor, or with the default governor. Right after
that policy->governor is set to NULL.

While that code is not functionally problematic, the structure of it
is suboptimal, because some of the code required in cpufreq_init_policy()
is being executed by its caller, __cpufreq_add_dev(). So, it would make
more sense to get all of it together in a single place to make code more
readable.

Accordingly, move the code needed for policy initialization to
cpufreq_init_policy() and initialize policy->governor to NULL at the
beginning.

In order to clean up the code a bit more, some of the #ifdefs for
CONFIG_HOTPLUG_CPU are dropped too.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4e97b631 03-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Initialize governor for a new policy under policy->rwsem

policy->rwsem is used to lock access to all parts of code modifying
struct cpufreq_policy, but it's not used on a new policy created by
__cpufreq_add_dev().

Because of that, if cpufreq_update_policy() is called in a tight loop
on one CPU in parallel with offline/online of another CPU, then the
following crash can be triggered:

Unable to handle kernel NULL pointer dereference at virtual address 00000020
pgd = c0003000
[00000020] *pgd=80000000004003, *pmd=00000000
Internal error: Oops: 206 [#1] PREEMPT SMP ARM

PC is at __cpufreq_governor+0x10/0x1ac
LR is at cpufreq_update_policy+0x114/0x150

---[ end trace f23a8defea6cd706 ]---
Kernel panic - not syncing: Fatal exception
CPU0: stopping
CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W 3.10.0-gd727407-00074-g979ede8 #396

[<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
[<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
[<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8)
[<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from [<c0803e7c>] (cpufreq_init_policy+0x30/0x98)
[<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4)
[<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84)
[<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from [<c0afe180>] (notifier_call_chain+0x40/0x68)
[<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02812dc>] (__cpu_notify+0x28/0x44)
[<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] (_cpu_up+0xf4/0x1dc)
[<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] (cpu_up+0x5c/0x78)
[<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] (store_online+0x44/0x74)
[<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] (sysfs_write_file+0x108/0x14c)
[<c03a40f4>] (sysfs_write_file+0x108/0x14c) from [<c03517d4>] (vfs_write+0xd0/0x180)
[<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] (SyS_write+0x38/0x68)
[<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] (ret_fast_syscall+0x0/0x30)

Fix that by taking locks at appropriate places in __cpufreq_add_dev()
as well.

Reported-by: Saravana Kannan <skannan@codeaurora.org>
Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5a7e56a5 03-Mar-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Initialize policy before making it available for others to use

Policy must be fully initialized before it is being made available
for use by others. Otherwise cpufreq_cpu_get() would be able to grab
a half initialized policy structure that might not have affected_cpus
(for example) populated. Then, anybody accessing those fields will get
a wrong value and that will lead to unpredictable results.

In order to fix this, do all the necessary initialization before we
make the policy structure available via cpufreq_cpu_get(). That will
guarantee that any code accessing fields of the policy will get
correct data from them.

Reported-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 999976e0 04-Mar-2014 Aaron Plattner <aplattner@nvidia.com>

cpufreq: use cpufreq_cpu_get() to avoid cpufreq_get() race conditions

If a module calls cpufreq_get while cpufreq is initializing, it's
possible for it to be called after cpufreq_driver is set but before
cpufreq_cpu_data is written during subsys_interface_register. This
happens because cpufreq_get doesn't take the cpufreq_driver_lock
around its use of cpufreq_cpu_data.

Fix this by using cpufreq_cpu_get(cpu) to look up the policy rather
than reading it out of cpufreq_cpu_data directly. cpufreq_cpu_get()
takes the appropriate locks to prevent this race from happening.

Since it's possible for policy to be NULL if the caller passes in an
invalid CPU number or calls the function before cpufreq is initialized,
delete the BUG_ON(!policy) and simply return 0. Don't try to return
-ENOENT because that's negative and the function returns an unsigned
integer.

References: https://bbs.archlinux.org/viewtopic.php?id=177934
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# bd0fa9bb 25-Feb-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Return error if ->get() failed in cpufreq_update_policy()

cpufreq_update_policy() calls cpufreq_driver->get() to get current
frequency of a CPU and it is not supposed to fail or return zero.
Return error in case that happens.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8a5c74a1 26-Feb-2014 Rashika Kheria <rashika.kheria@gmail.com>

cpufreq: Mark function as static in cpufreq.c

Mark function as static in cpufreq.c because it is not
used outside this file.

This eliminates the following warning in cpufreq.c:
drivers/cpufreq/cpufreq.c:355:9: warning: no previous prototype for ‘show_boost’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1c0ca902 14-Feb-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: don't call cpufreq_update_policy() on CPU addition

cpufreq_update_policy() is called from two places currently. From a
workqueue handled queued from cpufreq_bp_resume() for boot CPU and
from cpufreq_cpu_callback() whenever a CPU is added.

The first one makes sure that boot CPU is running on the frequency
present in policy->cpu. But we don't really need a call from
cpufreq_cpu_callback(), because we always call cpufreq_driver->init()
(which will set policy->cur correctly) whenever first CPU of any
policy is added back. And so every policy structure is guaranteed to
have the right frequency in policy->cur.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d9a789c7 17-Feb-2014 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Refactor cpufreq_set_policy()

Reduce the rampant usage of goto and the indentation level in
cpufreq_set_policy() to improve the readability of that code.

No functional changes should result from that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>


# 6964d91d 17-Feb-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove sysfs link when a cpu != policy->cpu, is removed

Commit 42f921a (cpufreq: remove sysfs files for CPUs which failed to
come back after resume) tried to do this but missed this piece of code
to fix.

Currently we are getting this on suspend/resume:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 877 at fs/sysfs/dir.c:52 sysfs_warn_dup+0x68/0x84()
sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq'
Modules linked in: brcmfmac brcmutil
CPU: 0 PID: 877 Comm: test-rtc-resume Not tainted 3.14.0-rc2-00259-g9398a10cd964 #12
[<c0015bac>] (unwind_backtrace) from [<c0011850>] (show_stack+0x10/0x14)
[<c0011850>] (show_stack) from [<c056e018>] (dump_stack+0x80/0xcc)
[<c056e018>] (dump_stack) from [<c0025e44>] (warn_slowpath_common+0x64/0x88)
[<c0025e44>] (warn_slowpath_common) from [<c0025efc>] (warn_slowpath_fmt+0x30/0x40)
[<c0025efc>] (warn_slowpath_fmt) from [<c012776c>] (sysfs_warn_dup+0x68/0x84)
[<c012776c>] (sysfs_warn_dup) from [<c0127a54>] (sysfs_do_create_link_sd+0xb0/0xb8)
[<c0127a54>] (sysfs_do_create_link_sd) from [<c038ef64>] (__cpufreq_add_dev.isra.27+0x2a8/0x814)
[<c038ef64>] (__cpufreq_add_dev.isra.27) from [<c038f548>] (cpufreq_cpu_callback+0x70/0x8c)
[<c038f548>] (cpufreq_cpu_callback) from [<c0043864>] (notifier_call_chain+0x44/0x84)
[<c0043864>] (notifier_call_chain) from [<c0025f60>] (__cpu_notify+0x28/0x44)
[<c0025f60>] (__cpu_notify) from [<c00261e8>] (_cpu_up+0xf0/0x140)
[<c00261e8>] (_cpu_up) from [<c0569eb8>] (enable_nonboot_cpus+0x68/0xb0)
[<c0569eb8>] (enable_nonboot_cpus) from [<c006339c>] (suspend_devices_and_enter+0x198/0x2dc)
[<c006339c>] (suspend_devices_and_enter) from [<c0063654>] (pm_suspend+0x174/0x1e8)
[<c0063654>] (pm_suspend) from [<c00624e0>] (state_store+0x6c/0xbc)
[<c00624e0>] (state_store) from [<c01fc200>] (kobj_attr_store+0x14/0x20)
[<c01fc200>] (kobj_attr_store) from [<c0126e50>] (sysfs_kf_write+0x44/0x48)
[<c0126e50>] (sysfs_kf_write) from [<c012a274>] (kernfs_fop_write+0xb4/0x14c)
[<c012a274>] (kernfs_fop_write) from [<c00d4818>] (vfs_write+0xa8/0x180)
[<c00d4818>] (vfs_write) from [<c00d4bb8>] (SyS_write+0x3c/0x70)
[<c00d4bb8>] (SyS_write) from [<c000e620>] (ret_fast_syscall+0x0/0x30)
---[ end trace 76969904b614c18f ]---

Fix this by removing sysfs link for cpufreq directory when cpu removed
isn't policy->cpu.

Revamps: 42f921a (cpufreq: remove sysfs files for CPUs which failed to come back after resume)
Reported-and-tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6f19efc0 20-Dec-2013 Lukasz Majewski <l.majewski@samsung.com>

cpufreq: Add boost frequency support in core

This commit adds boost frequency support in cpufreq core (Hardware &
Software). Some SoCs (like Exynos4 - e.g. 4x12) allow setting frequency
above its normal operation limits. Such mode shall be only used for a
short time.

Overclocking (boost) support is essentially provided by platform
dependent cpufreq driver.

This commit unifies support for SW and HW (Intel) overclocking solutions
in the core cpufreq driver. Previously the "boost" sysfs attribute was
defined in the ACPI processor driver code. By default boost is disabled.
One global attribute is available at: /sys/devices/system/cpu/cpufreq/boost.

It only shows up when cpufreq driver supports overclocking.
Under the hood frequencies dedicated for boosting are marked with a
special flag (CPUFREQ_BOOST_FREQ) at driver's frequency table.
It is the user's concern to enable/disable overclocking with a proper call
to sysfs.

The cpufreq_boost_trigger_state() function is defined non static on purpose.
It is used later with thermal subsystem to provide automatic enable/disable
of the BOOST feature.

Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 652ed95d 09-Jan-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: introduce cpufreq_generic_get() routine

CPUFreq drivers that use clock frameworks interface,i.e. clk_get_rate(),
to get CPUs clk rate, have similar sort of code used in most of them.

This patch adds a generic ->get() which will do the same thing for them.
All those drivers are required to now is to set .get to cpufreq_generic_get()
and set their clk pointer in policy->clk during ->init().

Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# fcd7af91 06-Jan-2014 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: stats: handle cpufreq_unregister_driver() and suspend/resume properly

There are several problems with cpufreq stats in the way it handles
cpufreq_unregister_driver() and suspend/resume..

- We must not lose data collected so far when suspend/resume happens
and so stats directories must not be removed/allocated during these
operations, which is done currently.

- cpufreq_stat has registered notifiers with both cpufreq and hotplug.
It adds sysfs stats directory with a cpufreq notifier: CPUFREQ_NOTIFY
and removes this directory with a notifier from hotplug core.

In case cpufreq_unregister_driver() is called (on rmmod cpufreq driver),
stats directories per cpu aren't removed as CPUs are still online. The
only call cpufreq_stats gets is cpufreq_stats_update_policy_cpu() for
all CPUs except the last of each policy. And pointer to stat information
is stored in the entry for last CPU in the per-cpu cpufreq_stats_table.
But policy structure would be freed inside cpufreq core and so that will
result in memory leak inside cpufreq stats (as we are never freeing
memory for stats).

Now if we again insert the module cpufreq_register_driver() will be
called and we will again allocate stats data and put it on for first
CPU of every policy. In case we only have a single CPU per policy, we
will return with a error from cpufreq_stats_create_table() due to this
code:

if (per_cpu(cpufreq_stats_table, cpu))
return -EBUSY;

And so probably cpufreq stats directory would not show up anymore (as
it was added inside last policies->kobj which doesn't exist anymore).
I haven't tested it, though. Also the values in stats files wouldn't
be refreshed as we are using the earlier stats structure.

- CPUFREQ_NOTIFY is called from cpufreq_set_policy() which is called for
scenarios where we don't really want cpufreq_stat_notifier_policy() to get
called. For example whenever we are changing anything related to a policy:
min/max/current freq, etc. cpufreq_set_policy() is called and so cpufreq
stats is notified. Where we don't do any useful stuff other than simply
returning with -EBUSY from cpufreq_stats_create_table(). And so this
isn't the right notifier that cpufreq stats..

Due to all above reasons this patch does following changes:
- Add new notifiers CPUFREQ_CREATE_POLICY and CPUFREQ_REMOVE_POLICY,
which are only called when policy is created/destroyed. They aren't
called for suspend/resume paths..
- Use these notifiers in cpufreq_stat_notifier_policy() to create/destory
stats sysfs entries. And so cpufreq_unregister_driver() or suspend/resume
shouldn't be a problem for cpufreq_stats.
- Return early from cpufreq_stat_cpu_callback() for suspend/resume sequence,
so that we don't free stats structure.

Acked-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d3916691 02-Dec-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Make sure CPU is running on a freq from freq-table

Sometimes boot loaders set CPU frequency to a value outside of frequency table
present with cpufreq core. In such cases CPU might be unstable if it has to run
on that frequency for long duration of time and so its better to set it to a
frequency which is specified in freq-table. This also makes cpufreq stats
inconsistent as cpufreq-stats would fail to register because current frequency
of CPU isn't found in freq-table.

Because we don't want this change to affect boot process badly, we go for the
next freq which is >= policy->cur ('cur' must be set by now, otherwise we will
end up setting freq to lowest of the table as 'cur' is initialized to zero).

In case current frequency doesn't match any frequency from freq-table, we throw
warnings to user, so that user can get this fixed in their bootloaders or
freq-tables.

Reported-by: Carlos Hernandez <ceh@ti.com>
Reported-and-tested-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ab1b1c4e 01-Dec-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: send new set of notification for transition failures

In the current code, if we fail during a frequency transition, we
simply send the POSTCHANGE notification with the old frequency. This
isn't enough.

One of the core users of these notifications is the code responsible
for keeping loops_per_jiffy aligned with frequency changes. And mostly
it is written as:

if ((val == CPUFREQ_PRECHANGE && freq->old < freq->new) ||
(val == CPUFREQ_POSTCHANGE && freq->old > freq->new)) {
update-loops-per-jiffy...
}

So, suppose we are changing to a higher frequency and failed during
transition, then following will happen:
- CPUFREQ_PRECHANGE notification with freq-new > freq-old
- CPUFREQ_POSTCHANGE notification with freq-new == freq-old

The first one will update loops_per_jiffy and second one will do
nothing. Even if we send the 2nd notification by exchanging values of
freq-new and old, some users of these notifications might get
unstable.

This can be fixed by simply calling cpufreq_notify_post_transition()
with error code and this routine will take care of sending
notifications in the correct order.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Folded 3 patches into one, rebased unicore2 changes]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f7ba3b41 01-Dec-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Introduce cpufreq_notify_post_transition()

This introduces a new routine cpufreq_notify_post_transition() which
can be used to send POSTCHANGE notification for new freq with or
without both {PRE|POST}CHANGE notifications for last freq. This is
useful at multiple places, especially for sending transition failure
notifications.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6f1e4efd 03-Jan-2014 Jane Li <jiel@marvell.com>

cpufreq: Fix timer/workqueue corruption by protecting reading governor_enabled

When a CPU is hot removed we'll cancel all the delayed work items via
gov_cancel_work(). Sometimes the delayed work function determines that
it should adjust the delay for all other CPUs that the policy is
managing. If this scenario occurs, the canceling CPU will cancel its own
work but queue up the other CPUs works to run.

Commit 3617f2 (cpufreq: Fix timer/workqueue corruption due to double
queueing) has tried to fix this, but reading governor_enabled is not
protected by cpufreq_governor_lock. Even though od_dbs_timer() checks
governor_enabled before gov_queue_work(), this scenario may occur. For
example:

CPU0 CPU1
---- ----
cpu_down()
... <work runs>
__cpufreq_remove_dev() od_dbs_timer()
__cpufreq_governor() policy->governor_enabled
policy->governor_enabled = false;
cpufreq_governor_dbs()
case CPUFREQ_GOV_STOP:
gov_cancel_work(dbs_data, policy);
cpu0 work is canceled
timer is canceled
cpu1 work is canceled
<waits for cpu1>
gov_queue_work(*, *, true);
cpu0 work queued
cpu1 work queued
cpu2 work queued
...
cpu1 work is canceled
cpu2 work is canceled
...

At the end of the GOV_STOP case cpu0 still has a work queued to
run although the code is expecting all of the works to be
canceled. __cpufreq_remove_dev() will then proceed to
re-initialize all the other CPUs works except for the CPU that is
going down. The CPUFREQ_GOV_START case in cpufreq_governor_dbs()
will trample over the queued work and debugobjects will spit out
a warning:

WARNING: at lib/debugobjects.c:260 debug_print_object+0x94/0xbc()
ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x14
Modules linked in:
CPU: 1 PID: 1205 Comm: sh Tainted: G W 3.10.0 #200
[<c01144f0>] (unwind_backtrace+0x0/0xf8) from [<c0111d98>] (show_stack+0x10/0x14)
[<c0111d98>] (show_stack+0x10/0x14) from [<c01272cc>] (warn_slowpath_common+0x4c/0x68)
[<c01272cc>] (warn_slowpath_common+0x4c/0x68) from [<c012737c>] (warn_slowpath_fmt+0x30/0x40)
[<c012737c>] (warn_slowpath_fmt+0x30/0x40) from [<c034c640>] (debug_print_object+0x94/0xbc)
[<c034c640>] (debug_print_object+0x94/0xbc) from [<c034c7f8>] (__debug_object_init+0xc8/0x3c0)
[<c034c7f8>] (__debug_object_init+0xc8/0x3c0) from [<c01360e0>] (init_timer_key+0x20/0x104)
[<c01360e0>] (init_timer_key+0x20/0x104) from [<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c)
[<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c) from [<c04833a8>] (__cpufreq_governor+0x80/0x1b0)
[<c04833a8>] (__cpufreq_governor+0x80/0x1b0) from [<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380)
[<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380) from [<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c)
[<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c) from [<c014fb40>] (notifier_call_chain+0x44/0x84)
[<c014fb40>] (notifier_call_chain+0x44/0x84) from [<c012ae44>] (__cpu_notify+0x2c/0x48)
[<c012ae44>] (__cpu_notify+0x2c/0x48) from [<c068dd40>] (_cpu_down+0x80/0x258)
[<c068dd40>] (_cpu_down+0x80/0x258) from [<c068df40>] (cpu_down+0x28/0x3c)
[<c068df40>] (cpu_down+0x28/0x3c) from [<c068e4c0>] (store_online+0x30/0x74)
[<c068e4c0>] (store_online+0x30/0x74) from [<c03a7308>] (dev_attr_store+0x18/0x24)
[<c03a7308>] (dev_attr_store+0x18/0x24) from [<c0256fe0>] (sysfs_write_file+0x100/0x180)
[<c0256fe0>] (sysfs_write_file+0x100/0x180) from [<c01fec9c>] (vfs_write+0xbc/0x184)
[<c01fec9c>] (vfs_write+0xbc/0x184) from [<c01ff034>] (SyS_write+0x40/0x68)
[<c01ff034>] (SyS_write+0x40/0x68) from [<c010e200>] (ret_fast_syscall+0x0/0x48)

In gov_queue_work(), lock cpufreq_governor_lock before gov_queue_work,
and unlock it after __gov_queue_work(). In this way, governor_enabled
is guaranteed not changed in gov_queue_work().

Signed-off-by: Jane Li <jiel@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 08fd8c1c 23-Dec-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: preserve user_policy across suspend/resume

Prevent __cpufreq_add_dev() from overwriting the existing values of
user_policy.{min|max|policy|governor} with defaults during resume
from system suspend.

Fixes: 5302c3fb2e62 ("cpufreq: Perform light-weight init/teardown during suspend/resume")
Reported-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 72368d12 26-Dec-2013 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Clean up after a failing light-weight initialization

If cpufreq_policy_restore() returns NULL during system resume,
__cpufreq_add_dev() should just fall back to the full initialization
instead of returning an error, because that may actually make things
work. Moreover, it should not leave stale fallback data behind after
it has failed to restore a previously existing policy.

This change is based on Viresh Kumar's work.

Fixes: 5302c3fb2e62 ("cpufreq: Perform light-weight init/teardown during suspend/resume")
Reported-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+


# a27a9ab7 19-Dec-2013 Jason Baron <jbaron@akamai.com>

cpufreq: Use CONFIG_CPU_FREQ_DEFAULT_* to set initial policy for setpolicy drivers

When configuring a default governor (via CONFIG_CPU_FREQ_DEFAULT_*) with the
intel_pstate driver, the desired default policy is not properly set. For
example, setting 'CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE' ends up with the
'powersave' policy being set.

Fix by configuring the correct default policy, if either 'powersave' or
'performance' are requested. Otherwise, fallback to what the driver originally
set via its 'init' routine.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 42f921a6 20-Dec-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove sysfs files for CPUs which failed to come back after resume

There are cases where cpufreq_add_dev() may fail for some CPUs
during system resume. With the current code we will still have
sysfs cpufreq files for those CPUs and struct cpufreq_policy
would be already freed for them. Hence any operation on those
sysfs files would result in kernel warnings.

Example of problems resulting from resume errors (from Bjørn Mork):

WARNING: CPU: 0 PID: 6055 at fs/sysfs/file.c:343 sysfs_open_file+0x77/0x212()
missing sysfs attribute operations for kobject: (null)
Modules linked in: [stripped as irrelevant]
CPU: 0 PID: 6055 Comm: grep Tainted: G D 3.13.0-rc2 #153
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
0000000000000009 ffff8802327ebb78 ffffffff81380b0e 0000000000000006
ffff8802327ebbc8 ffff8802327ebbb8 ffffffff81038635 0000000000000000
ffffffff811823c7 ffff88021a19e688 ffff88021a19e688 ffff8802302f9310
Call Trace:
[<ffffffff81380b0e>] dump_stack+0x55/0x76
[<ffffffff81038635>] warn_slowpath_common+0x7c/0x96
[<ffffffff811823c7>] ? sysfs_open_file+0x77/0x212
[<ffffffff810386e3>] warn_slowpath_fmt+0x41/0x43
[<ffffffff81182dec>] ? sysfs_get_active+0x6b/0x82
[<ffffffff81182382>] ? sysfs_open_file+0x32/0x212
[<ffffffff811823c7>] sysfs_open_file+0x77/0x212
[<ffffffff81182350>] ? sysfs_schedule_callback+0x1ac/0x1ac
[<ffffffff81122562>] do_dentry_open+0x17c/0x257
[<ffffffff8112267e>] finish_open+0x41/0x4f
[<ffffffff81130225>] do_last+0x80c/0x9ba
[<ffffffff8112dbbd>] ? inode_permission+0x40/0x42
[<ffffffff81130606>] path_openat+0x233/0x4a1
[<ffffffff81130b7e>] do_filp_open+0x35/0x85
[<ffffffff8113b787>] ? __alloc_fd+0x172/0x184
[<ffffffff811232ea>] do_sys_open+0x6b/0xfa
[<ffffffff811233a7>] SyS_openat+0xf/0x11
[<ffffffff8138c812>] system_call_fastpath+0x16/0x1b

To fix this, remove those sysfs files or put the associated kobject
in case of such errors. Also, to make it simple, remove the cpufreq
sysfs links from all the CPUs (except for the policy->cpu) during
suspend, as that operation won't result in a loss of sysfs file
permissions and we can create those links during resume just fine.

Fixes: 5302c3fb2e62 ("cpufreq: Perform light-weight init/teardown during suspend/resume")
Reported-and-tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d4faadd5 07-Dec-2013 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Revert "cpufreq: fix garbage kobjects on errors during suspend/resume"

Commit 2167e2399dc5 (cpufreq: fix garbage kobjects on errors during
suspend/resume) breaks suspend/resume on Martin Ziegler's system
(hard lockup during resume), so revert it.

Fixes: 2167e2399dc5 (cpufreq: fix garbage kobjects on errors during suspend/resume)
References: https://bugzilla.kernel.org/show_bug.cgi?id=66751
Reported-by: Martin Ziegler <ziegler@uni-freiburg.de>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 12205a4b 07-Dec-2013 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Revert "cpufreq: suspend governors on system suspend/hibernate"

Commit 5a87182aa21d (cpufreq: suspend governors on system
suspend/hibernate) causes hibernation problems to happen on
Bjørn Mork's and Paul Bolle's systems, so revert it.

Fixes: 5a87182aa21d (cpufreq: suspend governors on system suspend/hibernate)
Reported-by: Bjørn Mork <bjorn@mork.no>
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2167e239 02-Dec-2013 Bjørn Mork <bjorn@mork.no>

cpufreq: fix garbage kobjects on errors during suspend/resume

This is effectively a revert of commit 5302c3fb2e62 ("cpufreq: Perform
light-weight init/teardown during suspend/resume"), which enabled
suspend/resume optimizations leaving the sysfs files in place.

Errors during suspend/resume are not handled properly, leaving
dead sysfs attributes in case of failures. There are are number of
functions with special code for the "frozen" case, and all these
need to also have special error handling.

The problem is easy to demonstrate by making cpufreq_driver->init()
or cpufreq_driver->get() fail during resume.

The code is too complex for a simple fix, with split code paths
in multiple blocks within a number of functions. It is therefore
best to revert the patch enabling this code until the error handling
is in place.

Examples of problems resulting from resume errors:

WARNING: CPU: 0 PID: 6055 at fs/sysfs/file.c:343 sysfs_open_file+0x77/0x212()
missing sysfs attribute operations for kobject: (null)
Modules linked in: [stripped as irrelevant]
CPU: 0 PID: 6055 Comm: grep Tainted: G D 3.13.0-rc2 #153
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
0000000000000009 ffff8802327ebb78 ffffffff81380b0e 0000000000000006
ffff8802327ebbc8 ffff8802327ebbb8 ffffffff81038635 0000000000000000
ffffffff811823c7 ffff88021a19e688 ffff88021a19e688 ffff8802302f9310
Call Trace:
[<ffffffff81380b0e>] dump_stack+0x55/0x76
[<ffffffff81038635>] warn_slowpath_common+0x7c/0x96
[<ffffffff811823c7>] ? sysfs_open_file+0x77/0x212
[<ffffffff810386e3>] warn_slowpath_fmt+0x41/0x43
[<ffffffff81182dec>] ? sysfs_get_active+0x6b/0x82
[<ffffffff81182382>] ? sysfs_open_file+0x32/0x212
[<ffffffff811823c7>] sysfs_open_file+0x77/0x212
[<ffffffff81182350>] ? sysfs_schedule_callback+0x1ac/0x1ac
[<ffffffff81122562>] do_dentry_open+0x17c/0x257
[<ffffffff8112267e>] finish_open+0x41/0x4f
[<ffffffff81130225>] do_last+0x80c/0x9ba
[<ffffffff8112dbbd>] ? inode_permission+0x40/0x42
[<ffffffff81130606>] path_openat+0x233/0x4a1
[<ffffffff81130b7e>] do_filp_open+0x35/0x85
[<ffffffff8113b787>] ? __alloc_fd+0x172/0x184
[<ffffffff811232ea>] do_sys_open+0x6b/0xfa
[<ffffffff811233a7>] SyS_openat+0xf/0x11
[<ffffffff8138c812>] system_call_fastpath+0x16/0x1b

The failure to restore cpufreq devices on cancelled hibernation is
not a new bug. It is caused by the ACPI _PPC call failing unless the
hibernate is completed. This makes the acpi_cpufreq driver fail its
init.

Previously, the cpufreq device could be restored by offlining the
cpu temporarily. And as a complete hibernation cycle would do this,
it would be automatically restored most of the time. But after
commit 5302c3fb2e62 the leftover sysfs attributes will block any
device add action. Therefore offlining and onlining CPU 1 will no
longer restore the cpufreq object, and a complete suspend/resume
cycle will replace it with garbage.

Fixes: 5302c3fb2e62 ("cpufreq: Perform light-weight init/teardown during suspend/resume")
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5a87182a 26-Nov-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: suspend governors on system suspend/hibernate

This patch adds cpufreq suspend/resume calls to dpm_{suspend|resume}_noirq()
for handling suspend/resume of cpufreq governors.

Lan Tianyu (Intel) & Jinhyuk Choi (Broadcom) found anr issue where
tunables configuration for clusters/sockets with non-boot CPUs was
getting lost after suspend/resume, as we were notifying governors
with CPUFREQ_GOV_POLICY_EXIT on removal of the last cpu for that
policy and so deallocating memory for tunables. This is fixed by
this patch as we don't allow any operation on governors after
device suspend and before device resume now.

Reported-and-tested-by: Lan Tianyu <tianyu.lan@intel.com>
Reported-by: Jinhyuk Choi <jinchoi@broadcom.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog, minor cleanups]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d4019f0a 14-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: move freq change notifications to cpufreq core

Most of the drivers do following in their ->target_index() routines:

struct cpufreq_freqs freqs;
freqs.old = old freq...
freqs.new = new freq...

cpufreq_notify_transition(policy, &freqs, CPUFREQ_PRECHANGE);

/* Change rate here */

cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);

This is replicated over all cpufreq drivers today and there doesn't exists a
good enough reason why this shouldn't be moved to cpufreq core instead.

There are few special cases though, like exynos5440, which doesn't do everything
on the call to ->target_index() routine and call some kind of bottom halves for
doing this work, work/tasklet/etc..

They may continue doing notification from their own code as flag:
CPUFREQ_ASYNC_NOTIFICATION is already set for them.

All drivers are also modified in this patch to avoid breaking 'git bisect', as
double notification would happen otherwise.

Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Russell King <linux@arm.linux.org.uk>
Acked-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Reviewed-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ad7722da 18-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: create per policy rwsem instead of per CPU cpu_policy_rwsem

We have per-CPU cpu_policy_rwsem for cpufreq core, but we never use
all of them. We always use rwsem of policy->cpu and so we can
actually make this rwsem per policy instead.

This patch does this change. With this change other tricky situations
are also avoided now, like which lock to take while we are changing
policy->cpu, etc.

Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9c0ebcf7 25-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Implement light weight ->target_index() routine

Currently, the prototype of cpufreq_drivers target routines is:

int target(struct cpufreq_policy *policy, unsigned int target_freq,
unsigned int relation);

And most of the drivers call cpufreq_frequency_table_target() to get a valid
index of their frequency table which is closest to the target_freq. And they
don't use target_freq and relation after that.

So, it makes sense to just do this work in cpufreq core before calling
cpufreq_frequency_table_target() and simply pass index instead. But this can be
done only with drivers which expose their frequency table with cpufreq core. For
others we need to stick with the old prototype of target() until those drivers
are converted to expose frequency tables.

This patch implements the new light weight prototype for target_index() routine.
It looks like this:

int target_index(struct cpufreq_policy *policy, unsigned int index);

CPUFreq core will call cpufreq_frequency_table_target() before calling this
routine and pass index to it. Because CPUFreq core now requires to call routines
present in freq_table.c CONFIG_CPU_FREQ_TABLE must be enabled all the time.

This also marks target() interface as deprecated. So, that new drivers avoid
using it. And Documentation is updated accordingly.

It also converts existing .target() to newly defined light weight
.target_index() routine for many driver.

Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Russell King <linux@arm.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rjw@rjwysocki.net>


# 99ec899e 12-Sep-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Detect spurious invocations of update_policy_cpu()

The function update_policy_cpu() is expected to be called when the policy->cpu
of a cpufreq policy is to be changed: ie., the new CPU nominated to become the
policy->cpu is different from the old one.

Print a warning if it is invoked with new_cpu == old_cpu, since such an
invocation might hint at a faulty logic in the caller.

Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3bc28ab6 03-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove CONFIG_CPU_FREQ_TABLE

CONFIG_CPU_FREQ_TABLE will be always enabled when cpufreq framework is used, as
cpufreq core depends on it. So, we don't need this CONFIG option anymore as it
is not configurable. Remove CONFIG_CPU_FREQ_TABLE and update its users.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 70e9e778 03-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: create cpufreq_generic_init() routine

Many CPUFreq drivers for SMP system (where all cores share same clock lines), do
similar stuff in their ->init() part.

This patch creates a generic routine in cpufreq core which can be used by these
so that we can remove some redundant code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# da60ce9f 03-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: call cpufreq_driver->get() after calling ->init()

Almost all drivers set policy->cur with current CPU frequency in their ->init()
part. This can be done for all of them at core level and so they wouldn't need
to do it.

This patch adds supporting code in cpufreq core for calling get() after we have
called init() for a policy.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 0b981e70 02-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: use cpufreq_driver->flags to mark CPUFREQ_HAVE_GOVERNOR_PER_POLICY

Use cpufreq_driver->flags to mark CPUFREQ_HAVE_GOVERNOR_PER_POLICY instead
of a separate field within cpufreq_driver. This will save some bytes of
memory.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 037ce839 02-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: rename __cpufreq_set_policy() as cpufreq_set_policy()

Earlier there used to be two functions named __cpufreq_set_policy() and
cpufreq_set_policy(), but now we only have a single routine lets name it
cpufreq_set_policy() instead of __cpufreq_set_policy().

This also removes some invalid comments or fixes some incorrect comments.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 27a862e9 02-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove __cpufreq_remove_dev()

Nobody except cpufreq_remove_dev() calls __cpufreq_remove_dev() and
so we don't need two separate routines here. Merge code from
__cpufreq_remove_dev() into cpufreq_remove_dev() and get rid of
__cpufreq_remove_dev().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 75949c9a 02-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: don't break string in print statements

As a rule its better not to break string (quoted inside "") in a
print statement even if it crosses 80 column boundary as that may
introduce bugs and so this patch rewrites one of the print statements..

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# bbdd04ab 02-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove extra blank line

We don't need a blank line just at start of a block, lets remove it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 67a29e55 02-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove invalid comment from __cpufreq_remove_dev()

Some section of kerneldoc comment for __cpufreq_remove_dev() is invalid now.
Remove it.

Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1b750e3b 02-Oct-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: make return type of lock_policy_rwsem_{read|write}() as void

lock_policy_rwsem_{read|write}() currently has return type of int,
but it always returns zero and hence its return type should be void
instead. This patch makes that change and modifies all of the users
accordingly.

Reported-by: Jon Medhurst<tixy@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 26ca8694 20-Sep-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: check cpufreq driver is valid and cpufreq isn't disabled in cpufreq_get()

cpufreq_get() can be called from external drivers which might not be aware if
cpufreq driver is registered or not. And so we should actually check if cpufreq
driver is registered or not and also if cpufreq is active or disabled, at the
beginning of cpufreq_get().

Otherwise call to lock_policy_rwsem_read() might hit BUG_ON(!policy).

Reported-and-tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4dea5806 18-Sep-2013 Yinghai Lu <yinghai@kernel.org>

cpufreq: return EEXIST instead of EBUSY for second registering

On systems that support intel_pstate, acpi_cpufreq fails to load, and
udev keeps trying until trace gets filled up and kernel crashes.

The root cause is driver return ret from cpufreq_register_driver(),
because when some other driver takes over before, it will return
EBUSY and then udev will keep trying ...

cpufreq_register_driver() should return EEXIST instead so that the
system can boot without appending intel_pstate=disable and still use
intel_pstate.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8efd5765 16-Sep-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: unlock correct rwsem while updating policy->cpu

Current code looks like this:

WARN_ON(lock_policy_rwsem_write(cpu));
update_policy_cpu(policy, new_cpu);
unlock_policy_rwsem_write(cpu);

{lock|unlock}_policy_rwsem_write(cpu) takes/releases policy->cpu's rwsem.
Because cpu is changing with the call to update_policy_cpu(), the
unlock_policy_rwsem_write() will release the incorrect lock.

The right solution would be to release the same lock as was taken earlier. Also
update_policy_cpu() was also called from cpufreq_add_dev() without any locks and
so its better if we move this locking to inside update_policy_cpu().

This patch fixes a regression introduced in 3.12 by commit f9ba680d23
(cpufreq: Extract the handover of policy cpu to a helper function).

Reported-and-tested-by: Jon Medhurst<tixy@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9c8f1ee4 12-Sep-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Clear policy->cpus bits in __cpufreq_remove_dev_finish()

This broke after a recent change "cedb70a cpufreq: Split __cpufreq_remove_dev()
into two parts" from Srivatsa.

Consider a scenario where we have two CPUs in a policy (0 & 1) and we are
removing CPU 1. On the call to __cpufreq_remove_dev_prepare() we have cleared 1
from policy->cpus and now on a call to __cpufreq_remove_dev_finish() we read
cpumask_weight of policy->cpus, which will come as 1 and this code will behave
as if we are removing the last CPU from policy :)

Fix it by clearing the CPU mask in __cpufreq_remove_dev_finish() instead of
__cpufreq_remove_dev_prepare().

Tested-by: Stephen Warren <swarren@wwwdotorg.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 44871c9c 11-Sep-2013 Lan Tianyu <tianyu.lan@intel.com>

cpufreq: Acquire the lock in cpufreq_policy_restore() for reading

In cpufreq_policy_restore() before system suspend policy is read from
percpu's cpufreq_cpu_data_fallback. It's a read operation rather
than a write one, so take the lock for reading in there.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# cb38ed5c 11-Sep-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu

If update_policy_cpu() is invoked with the existing policy->cpu itself
as the new-cpu parameter, then a lot of things can go terribly wrong.

In its present form, update_policy_cpu() always assumes that the new-cpu
is different from policy->cpu and invokes other functions to perform their
respective updates. And those functions implement the actual update like
this:

per_cpu(..., new_cpu) = per_cpu(..., last_cpu);
per_cpu(..., last_cpu) = NULL;

Thus, when new_cpu == last_cpu, the final NULL assignment makes the per-cpu
references vanish into thin air! (memory leak). From there, it leads to more
problems: cpufreq_stats_create_table() now doesn't find the per-cpu reference
and hence tries to create a new sysfs-group; but sysfs already had created
the group earlier, so it complains that it cannot create a duplicate filename.
In short, the repercussions of a rather innocuous invocation of
update_policy_cpu() can turn out to be pretty nasty.

Ideally update_policy_cpu() should handle this situation (new == last)
gracefully, and not lead to such severe problems. So fix it by adding an
appropriate check.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 61173f25 11-Sep-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Restructure if/else block to avoid unintended behavior

In __cpufreq_remove_dev_prepare(), the code which decides whether to remove
the sysfs link or nominate a new policy cpu, is governed by an if/else block
with a rather complex set of conditionals. Worse, they harbor a subtlety
which leads to certain unintended behavior.

The code looks like this:

if (cpu != policy->cpu && !frozen) {
sysfs_remove_link(&dev->kobj, "cpufreq");
} else if (cpus > 1) {
new_cpu = cpufreq_nominate_new_policy_cpu(...);
...
update_policy_cpu(..., new_cpu);
}

The original intention was:
If the CPU going offline is not policy->cpu, just remove the link.
On the other hand, if the CPU going offline is the policy->cpu itself,
handover the policy->cpu job to some other surviving CPU in that policy.

But because the 'if' condition also includes the 'frozen' check, now there
are *two* possibilities by which we can enter the 'else' block:

1. cpu == policy->cpu (intended)
2. cpu != policy->cpu && frozen (unintended)

Due to the second (unintended) scenario, we end up spuriously nominating
a CPU as the policy->cpu, even when the existing policy->cpu is alive and
well. This can cause problems further down the line, especially when we end
up nominating the same policy->cpu as the new one (ie., old == new),
because it totally confuses update_policy_cpu().

To avoid this mess, restructure the if/else block to only do what was
originally intended, and thus prevent any unwelcome surprises.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 0d66b91e 11-Sep-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Fix crash in cpufreq-stats during suspend/resume

Stephen Warren reported that the cpufreq-stats code hits a NULL pointer
dereference during the second attempt to suspend a system. He also
pin-pointed the problem to commit 5302c3f "cpufreq: Perform light-weight
init/teardown during suspend/resume".

That commit actually ensured that the cpufreq-stats table and the
cpufreq-stats sysfs entries are *not* torn down (ie., not freed) during
suspend/resume, which makes it all the more surprising. However, it turns
out that the root-cause is not that we access an already freed memory, but
that the reference to the allocated memory gets moved around and we lose
track of that during resume, leading to the reported crash in a subsequent
suspend attempt.

In the suspend path, during CPU offline, the value of policy->cpu is updated
by choosing one of the surviving CPUs in that policy, as long as there is
atleast one CPU in that policy. And cpufreq_stats_update_policy_cpu() is
invoked to update the reference to the stats structure by assigning it to
the new CPU. However, in the resume path, during CPU online, we end up
assigning a fresh CPU as the policy->cpu, without letting cpufreq-stats
know about this. Thus the reference to the stats structure remains
(incorrectly) associated with the old CPU. So, in a subsequent suspend attempt,
during CPU offline, we end up accessing an incorrect location to get the
stats structure, which eventually leads to the NULL pointer dereference.

Fix this by letting cpufreq-stats know about the update of the policy->cpu
during CPU online in the resume path. (Also, move the update_policy_cpu()
function higher up in the file, so that __cpufreq_add_dev() can invoke
it).

Reported-and-tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 798282a8 09-Sep-2013 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Revert "cpufreq: make sure frequency transitions are serialized"

Commit 7c30ed5 (cpufreq: make sure frequency transitions are
serialized) attempted to serialize frequency transitions by
adding checks to the CPUFREQ_PRECHANGE and CPUFREQ_POSTCHANGE
notifications. However, it assumed that the notifications will
always originate from the driver's .target() callback, but they
also can be triggered by cpufreq_out_of_sync() and that leads to
warnings like this on some systems:

WARNING: CPU: 0 PID: 14543 at drivers/cpufreq/cpufreq.c:317
__cpufreq_notify_transition+0x238/0x260()
In middle of another frequency transition

accompanied by a call trace similar to this one:

[<ffffffff81720daa>] dump_stack+0x46/0x58
[<ffffffff8106534c>] warn_slowpath_common+0x8c/0xc0
[<ffffffff815b8560>] ? acpi_cpufreq_target+0x320/0x320
[<ffffffff81065436>] warn_slowpath_fmt+0x46/0x50
[<ffffffff815b1ec8>] __cpufreq_notify_transition+0x238/0x260
[<ffffffff815b33be>] cpufreq_notify_transition+0x3e/0x70
[<ffffffff815b345d>] cpufreq_out_of_sync+0x6d/0xb0
[<ffffffff815b370c>] cpufreq_update_policy+0x10c/0x160
[<ffffffff815b3760>] ? cpufreq_update_policy+0x160/0x160
[<ffffffff81413813>] cpufreq_set_cur_state+0x8c/0xb5
[<ffffffff814138df>] processor_set_cur_state+0xa3/0xcf
[<ffffffff8158e13c>] thermal_cdev_update+0x9c/0xb0
[<ffffffff8159046a>] step_wise_throttle+0x5a/0x90
[<ffffffff8158e21f>] handle_thermal_trip+0x4f/0x140
[<ffffffff8158e377>] thermal_zone_device_update+0x57/0xa0
[<ffffffff81415b36>] acpi_thermal_check+0x2e/0x30
[<ffffffff81415ca0>] acpi_thermal_notify+0x40/0xdc
[<ffffffff813e7dbd>] acpi_device_notify+0x19/0x1b
[<ffffffff813f8241>] acpi_ev_notify_dispatch+0x41/0x5c
[<ffffffff813e3fbe>] acpi_os_execute_deferred+0x25/0x32
[<ffffffff81081060>] process_one_work+0x170/0x4a0
[<ffffffff81082121>] worker_thread+0x121/0x390
[<ffffffff81082000>] ? manage_workers.isra.20+0x170/0x170
[<ffffffff81088fe0>] kthread+0xc0/0xd0
[<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
[<ffffffff8173582c>] ret_from_fork+0x7c/0xb0
[<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0

For this reason, revert commit 7c30ed5 along with the fix 266c13d
(cpufreq: Fix serialization of frequency transitions) on top of it
and we will revisit the serialization problem later.

Reported-by: Alessandro Bono <alessandro.bono@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5136fa56 06-Sep-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Use signed type for 'ret' variable, to store negative error values

There are places where the variable 'ret' is declared as unsigned int
and then used to store negative return values such as -EINVAL. Fix them
by declaring the variable as a signed quantity.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 56d07db2 06-Sep-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes

Commit "cpufreq: serialize calls to __cpufreq_governor()" had been a temporary
and partial solution to the race condition between writing to a cpufreq sysfs
file and taking a CPU offline. Now that we have a proper and complete solution
to that problem, remove the temporary fix.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4f750c93 06-Sep-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug

The functions that are used to write to cpufreq sysfs files (such as
store_scaling_max_freq()) are not hotplug safe. They can race with CPU
hotplug tasks and lead to problems such as trying to acquire an already
destroyed timer-mutex etc.

Eg:

__cpufreq_remove_dev()
__cpufreq_governor(policy, CPUFREQ_GOV_STOP);
policy->governor->governor(policy, CPUFREQ_GOV_STOP);
cpufreq_governor_dbs()
case CPUFREQ_GOV_STOP:
mutex_destroy(&cpu_cdbs->timer_mutex)
cpu_cdbs->cur_policy = NULL;
<PREEMPT>
store()
__cpufreq_set_policy()
__cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
case CPUFREQ_GOV_LIMITS:
mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL

So use get_online_cpus()/put_online_cpus() in the store_*() functions, to
synchronize with CPU hotplug. However, there is an additional point to note
here: some parts of the CPU teardown in the cpufreq subsystem are done in
the CPU_POST_DEAD stage, with cpu_hotplug.lock *released*. So, using the
get/put_online_cpus() functions alone is insufficient; we should also ensure
that we don't race with those latter steps in the hotplug sequence. We can
easily achieve this by checking if the CPU is online before proceeding with
the store, since the CPU would have been marked offline by the time the
CPU_POST_DEAD notifiers are executed.

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1aee40ac 06-Sep-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock

__cpufreq_remove_dev_finish() handles the kobject cleanup for a CPU going
offline. But because we destroy the kobject towards the end of the CPU offline
phase, there are certain race windows where a task can try to write to a
cpufreq sysfs file (eg: using store_scaling_max_freq()) while we are taking
that CPU offline, and this can bump up the kobject refcount, which in turn might
hinder the CPU offline task from running to completion. (It can also cause
other more serious problems such as trying to acquire a destroyed timer-mutex
etc., depending on the exact stage of the cleanup at which the task managed to
take a new refcount).

To fix the race window, we will need to synchronize those store_*() call-sites
with CPU hotplug, using get_online_cpus()/put_online_cpus(). However, that
in turn can cause a total deadlock because it can end up waiting for the
CPU offline task to complete, with incremented refcount!

Write to sysfs CPU offline task
-------------- ----------------
kobj_refcnt++

Acquire cpu_hotplug.lock

get_online_cpus();

Wait for kobj_refcnt to drop to zero

**DEADLOCK**

A simple way to avoid this problem is to perform the kobject cleanup in the
CPU offline path, with the cpu_hotplug.lock *released*. That is, we can
perform the wait-for-kobj-refcnt-to-drop as well as the subsequent cleanup
in the CPU_POST_DEAD stage of CPU offline, which is run with cpu_hotplug.lock
released. Doing this helps us avoid deadlocks due to holding kobject refcounts
and waiting on each other on the cpu_hotplug.lock.

(Note: We can't move all of the cpufreq CPU offline steps to the
CPU_POST_DEAD stage, because certain things such as stopping the governors
have to be done before the outgoing CPU is marked offline. So retain those
parts in the CPU_DOWN_PREPARE stage itself).

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# cedb70af 06-Sep-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Split __cpufreq_remove_dev() into two parts

During CPU offline, the cpufreq core invokes __cpufreq_remove_dev()
to perform work such as stopping the cpufreq governor, clearing the
CPU from the policy structure etc, and finally cleaning up the
kobject.

There are certain subtle issues related to the kobject cleanup, and
it would be much easier to deal with them if we separate that part
from the rest of the cleanup-work in the CPU offline phase. So split
the __cpufreq_remove_dev() function into 2 parts: one that handles
the kobject cleanup, and the other that handles the rest of the work.

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 19c76303 31-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: serialize calls to __cpufreq_governor()

We can't take a big lock around __cpufreq_governor() as this causes
recursive locking for some cases. But calls to this routine must be
serialized for every policy. Otherwise we can see some unpredictable
events.

For example, consider following scenario:

__cpufreq_remove_dev()
__cpufreq_governor(policy, CPUFREQ_GOV_STOP);
policy->governor->governor(policy, CPUFREQ_GOV_STOP);
cpufreq_governor_dbs()
case CPUFREQ_GOV_STOP:
mutex_destroy(&cpu_cdbs->timer_mutex)
cpu_cdbs->cur_policy = NULL;
<PREEMPT>
store()
__cpufreq_set_policy()
__cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
case CPUFREQ_GOV_LIMITS:
mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL

And so store() will eventually result in a crash if cur_policy is
NULL at this point.

Introduce an additional variable which would guarantee serialization
here.

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f73d3933 31-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: don't allow governor limits to be changed when it is disabled

__cpufreq_governor() returns with -EBUSY when governor is already
stopped and we try to stop it again, but when it is stopped we must
not allow calls to CPUFREQ_GOV_LIMITS event as well.

This patch adds this check in __cpufreq_governor().

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5025d628 20-Aug-2013 Li Zhong <zhong@linux.vnet.ibm.com>

cpufreq: fix bad unlock balance on !CONFIG_SMP

This patch tries to fix lockdep complaint attached below.

It seems that we should always read acquire the cpufreq_rwsem,
whether CONFIG_SMP is enabled or not. And CONFIG_HOTPLUG_CPU
depends on CONFIG_SMP, so it seems we don't need CONFIG_SMP for the
code enabled by CONFIG_HOTPLUG_CPU.

[ 0.504191] =====================================
[ 0.504627] [ BUG: bad unlock balance detected! ]
[ 0.504627] 3.11.0-rc6-next-20130819 #1 Not tainted
[ 0.504627] -------------------------------------
[ 0.504627] swapper/1 is trying to release lock (cpufreq_rwsem) at:
[ 0.504627] [<ffffffff813d927a>] cpufreq_add_dev+0x13a/0x3e0
[ 0.504627] but there are no more locks to release!
[ 0.504627]
[ 0.504627] other info that might help us debug this:
[ 0.504627] 1 lock held by swapper/1:
[ 0.504627] #0: (subsys mutex#4){+.+.+.}, at: [<ffffffff8134a7bf>] subsys_interface_register+0x4f/0xe0
[ 0.504627]
[ 0.504627] stack backtrace:
[ 0.504627] CPU: 0 PID: 1 Comm: swapper Not tainted 3.11.0-rc6-next-20130819 #1
[ 0.504627] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 0.504627] ffffffff813d927a ffff88007f847c98 ffffffff814c062b ffff88007f847cc8
[ 0.504627] ffffffff81098bce ffff88007f847cf8 ffffffff81aadc30 ffffffff813d927a
[ 0.504627] 00000000ffffffff ffff88007f847d68 ffffffff8109d0be 0000000000000006
[ 0.504627] Call Trace:
[ 0.504627] [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0
[ 0.504627] [<ffffffff814c062b>] dump_stack+0x19/0x1b
[ 0.504627] [<ffffffff81098bce>] print_unlock_imbalance_bug+0xfe/0x110
[ 0.504627] [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0
[ 0.504627] [<ffffffff8109d0be>] lock_release_non_nested+0x1ee/0x310
[ 0.504627] [<ffffffff81099d0e>] ? mark_held_locks+0xae/0x120
[ 0.504627] [<ffffffff811510cb>] ? kfree+0xcb/0x1d0
[ 0.504627] [<ffffffff813d77ea>] ? cpufreq_policy_free+0x4a/0x60
[ 0.504627] [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0
[ 0.504627] [<ffffffff8109d2a4>] lock_release+0xc4/0x250
[ 0.504627] [<ffffffff8106c9f3>] up_read+0x23/0x40
[ 0.504627] [<ffffffff813d927a>] cpufreq_add_dev+0x13a/0x3e0
[ 0.504627] [<ffffffff8134a809>] subsys_interface_register+0x99/0xe0
[ 0.504627] [<ffffffff81b19f3b>] ? cpufreq_gov_dbs_init+0x12/0x12
[ 0.504627] [<ffffffff813d7f0d>] cpufreq_register_driver+0x9d/0x1d0
[ 0.504627] [<ffffffff81b19f3b>] ? cpufreq_gov_dbs_init+0x12/0x12
[ 0.504627] [<ffffffff81b1a039>] acpi_cpufreq_init+0xfe/0x1f8
[ 0.504627] [<ffffffff810002ba>] do_one_initcall+0xda/0x180
[ 0.504627] [<ffffffff81ae301e>] kernel_init_freeable+0x12c/0x1bb
[ 0.504627] [<ffffffff81ae2841>] ? do_early_param+0x8c/0x8c
[ 0.504627] [<ffffffff814b4dd0>] ? rest_init+0x140/0x140
[ 0.504627] [<ffffffff814b4dde>] kernel_init+0xe/0xf0
[ 0.504627] [<ffffffff814d029a>] ret_from_fork+0x7a/0xb0
[ 0.504627] [<ffffffff814b4dd0>] ? rest_init+0x140/0x140

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-and-tested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1b274294 19-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Use cpufreq_policy_list for iterating over policies

To iterate over all policies we currently iterate over all online
CPUs and then get the policy for each of them which is suboptimal.
Use the newly created cpufreq_policy_list for this purpose instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 474deff7 19-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove cpufreq_policy_cpu per-cpu variable

cpufreq_policy_cpu per-cpu variables are used for storing the ID of
the CPU that manages the given CPU's policy. However, we also store
a policy pointer for each cpu in cpufreq_cpu_data, so the
cpufreq_policy_cpu information is simply redundant.

It is better to use cpufreq_cpu_data to retrieve a policy and get
policy->cpu from there, so make that happen everywhere and drop the
cpufreq_policy_cpu per-cpu variables which aren't necessary any more.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9e9fd801 19-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove unnecessary check in __cpufreq_governor()

We don't need to check if event is CPUFREQ_GOV_POLICY_INIT and put
governor module as we are sure event can only be START/STOP here.

Remove the useless check.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9515f4d6 19-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove policy from cpufreq_policy_list during suspend

cpufreq_policy_list is a list of active policies. We do remove
policies from this list when all CPUs belonging to that policy are
removed. But during system suspend we don't really free a policy
struct as it will be used again during resume, so we didn't remove
it from cpufreq_policy_list as well..

However, this is incorrect. We are saying this policy isn't valid
anymore and must not be referenced (though we haven't freed it), but
it can still be used by code that iterates over cpufreq_policy_list.

Remove policy from this list during system suspend as well.
Of course, we must add it back whenever the first CPU belonging to
that policy shows up.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# edab2fbc 19-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Fix white space in __cpufreq_remove_dev()

Align closing brace '}' of an if block.

[rjw: Subject and changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 878f6e07 18-Aug-2013 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Revert "cpufreq: Use cpufreq_policy_list for iterating over policies"

Revert commit eb60852 (cpufreq: Use cpufreq_policy_list for iterating
over policies), because it breaks system suspend/resume on multiple
machines.

It either causes resume to block indefinitely or causes the BUG_ON()
in lock_policy_rwsem_##mode() to trigger on sysfs accesses to cpufreq
attributes.

Conflicts:
drivers/cpufreq/cpufreq.c


# 3de9bdeb 06-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: improve error checking on return values of __cpufreq_governor()

The __cpufreq_governor() function can fail in rare cases especially
if there are bugs in cpufreq drivers. Thus we must stop processing
as soon as this routine fails, otherwise it may result in undefined
behavior.

This patch adds error checking code whenever this routine is called
from any place.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6eed9404 06-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Use rwsem for protecting critical sections

Critical sections of the cpufreq core are protected with the help of
the driver module owner's refcount, which isn't the correct approach,
because it causes rmmod to return an error when some routine has
updated that refcount.

Let's use rwsem for this purpose instead. Only
cpufreq_unregister_driver() will use write sem
and everybody else will use read sem.

[rjw: Subject & changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# fe492f3f 06-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Fix broken usage of governor->owner's refcount

The cpufreq governor owner refcount usage is broken. We should only
increment that refcount when a CPUFREQ_GOV_POLICY_INIT event has come
and it should only be decremented if CPUFREQ_GOV_POLICY_EXIT has come.

Currently, there can be situations where the governor is in use, but
we have allowed it to be unloaded which may result in undefined
behavior. Let's fix it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# eb608521 06-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Use cpufreq_policy_list for iterating over policies

To iterate over all policies we currently iterate over all CPUs and
then get the policy for each of them. Let's use the newly created
cpufreq_policy_list for this purpose.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# c88a1f8b 06-Aug-2013 Lukasz Majewski <l.majewski@samsung.com>

cpufreq: Store cpufreq policies in a list

Policies available in the cpufreq framework are now linked together.
They are accessible via cpufreq_policy_list defined in the cpufreq
core.

[rjw: Fix from Yinghai Lu folded in]
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d5b73cd8 06-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Use sizeof(*ptr) convetion for computing sizes

Chapter 14 of Documentation/CodingStyle says:

The preferred form for passing a size of a struct is the following:

p = kmalloc(sizeof(*p), ...);

The alternative form where struct name is spelled out hurts
readability and introduces an opportunity for a bug when the pointer
variable type is changed but the corresponding sizeof that is passed
to a memory allocator is not.

This wasn't followed consistently in drivers/cpufreq, let's make it
more consistent by always following this rule.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3a3e9e06 06-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Give consistent names to cpufreq_policy objects

They are called policy, cur_policy, new_policy, data, etc. Just call
them policy wherever possible.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5ff0a268 06-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Clean up header files included in the core

This patch addresses the following issues in the header files in the
cpufreq core:
- Include headers in ascending order, so that we don't add same
many times by mistake.
- <asm/> must be included after <linux/>, so that they override
whatever they need to.
- Remove unnecessary includes.
- Don't include files already included by cpufreq.h or
cpufreq_governor.h.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d8d3b471 03-Aug-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Pass policy to cpufreq_add_policy_cpu()

The caller of cpufreq_add_policy_cpu() already has a pointer to the
policy structure and there is no need to look it up again in
cpufreq_add_policy_cpu(). Let's pass it directly.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 10659ab7 03-Aug-2013 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Avoid double kobject_put() for the same kobject in error code path

The only case triggering a jump to the err_out_unregister label in
__cpufreq_add_dev() is when cpufreq_add_dev_interface() fails.
However, if cpufreq_add_dev_interface() fails, it calls kobject_put()
for the policy kobject in its error code path and since that causes
the kobject's refcount to become 0, the additional kobject_put() for
the same kobject under err_out_unregister and the
wait_for_completion() following it are pointless, so drop them.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 71c3461e 03-Aug-2013 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Do not hold driver module references for additional policy CPUs

The cpufreq core is a little inconsistent in the way it uses the
driver module refcount.

Namely, if __cpufreq_add_dev() is called for a CPU that doesn't
share the policy object with any other CPUs, the driver module
refcount it grabs to start with will be dropped by it before
returning and will be equal to whatever it had been before that
function was invoked.

However, if the given CPU does share the policy object with other
CPUs, either cpufreq_add_policy_cpu() is called to link the new CPU
to the existing policy, or cpufreq_add_dev_symlink() is used to link
the other CPUs sharing the policy with it to the just created policy
object. In that case, because both cpufreq_add_policy_cpu() and
cpufreq_add_dev_symlink() call cpufreq_cpu_get() for the given
policy (the latter possibly many times) without the balancing
cpufreq_cpu_put() (unless there is an error), the driver module
refcount will be left by __cpufreq_add_dev() with a nonzero value
(different from the initial one).

To remove that inconsistency make cpufreq_add_policy_cpu() execute
cpufreq_cpu_put() for the given policy before returning, which
decrements the driver module refcount so that it will be equal to its
initial value after __cpufreq_add_dev() returns. Also remove the
cpufreq_cpu_get() call from cpufreq_add_dev_symlink(), since both the
policy refcount and the driver module refcount are nonzero when it is
called and they don't need to be bumped up by it.

Accordingly, drop the cpufreq_cpu_put() from __cpufreq_remove_dev(),
since it is only necessary to balance the cpufreq_cpu_get() called
by cpufreq_add_policy_cpu() or cpufreq_add_dev_symlink().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 308b60e7 31-Jul-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Don't pass CPU to cpufreq_add_dev_{symlink|interface}()

Pointer to struct cpufreq_policy is already passed to these routines
and we don't need to send policy->cpu to them as well. So, get rid
of this extra argument and use policy->cpu everywhere.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e8fdde10 31-Jul-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove extra variables from cpufreq_add_dev_symlink()

We call cpufreq_cpu_get() in cpufreq_add_dev_symlink() to increase usage
refcount of policy, but not to get a policy for the given CPU. So, we
don't really need to capture the return value of this routine. We can
simply use policy passed as an argument to cpufreq_add_dev_symlink().

Moreover debug print is rewritten to make it more clear.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5302c3fb 29-Jul-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Perform light-weight init/teardown during suspend/resume

Now that we have the infrastructure to perform a light-weight init/tear-down,
use that in the cpufreq CPU hotplug notifier when invoked from the
suspend/resume path.

This also ensures that the file permissions of the cpufreq sysfs files are
preserved across suspend/resume, something which commit a66b2e (cpufreq:
Preserve sysfs files across suspend/resume) originally intended to do, but
had to be reverted due to other problems.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8414809c 29-Jul-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Preserve policy structure across suspend/resume

To perform light-weight cpu-init and teardown in the cpufreq subsystem
during suspend/resume, we need to separate out the 2 main functionalities
of the cpufreq CPU hotplug callbacks, as outlined below:

1. Init/tear-down of core cpufreq and CPU-specific components, which are
critical to the correct functioning of the cpufreq subsystem.

2. Init/tear-down of cpufreq sysfs files during suspend/resume.

The first part requires accurate updates to the policy structure such as
its ->cpus and ->related_cpus masks, whereas the second part requires that
the policy->kobj structure is not released or re-initialized during
suspend/resume.

To handle both these requirements, we need to allow updates to the policy
structure throughout suspend/resume, but prevent the structure from getting
freed up. Also, we must have a mechanism by which the cpu-up callbacks can
restore the policy structure, without allocating things afresh. (That also
helps avoid memory leaks).

To achieve this, we use 2 schemes:
a. Use a fallback per-cpu storage area for preserving the policy structures
during suspend, so that they can be restored during resume appropriately.

b. Use the 'frozen' flag to determine when to free or allocate the policy
structure vs when to restore the policy from the saved fallback storage.
Thus we can successfully preserve the structure across suspend/resume.

Effectively, this helps us complete the separation of the 'light-weight'
and the 'full' init/tear-down sequences in the cpufreq subsystem, so that
this can be made use of in the suspend/resume scenario.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# a82fab29 29-Jul-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Introduce a flag ('frozen') to separate full vs temporary init/teardown

During suspend/resume we would like to do a light-weight init/teardown of
CPUs in the cpufreq subsystem and preserve certain things such as sysfs files
etc across suspend/resume transitions. Add a flag called 'frozen' to help
distinguish the full init/teardown sequence from the light-weight one.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f9ba680d 29-Jul-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Extract the handover of policy cpu to a helper function

During cpu offline, when the policy->cpu is going down, some other CPU
present in the policy->cpus mask is nominated as the new policy->cpu.
Extract this functionality from __cpufreq_remove_dev() and implement
it in a helper function. This helps in upcoming code reorganization.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e18f1682 29-Jul-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Extract non-interface related stuff from cpufreq_add_dev_interface

cpufreq_add_dev_interface() includes the work of exposing the interface
to the device, as well as a lot of unrelated stuff. Move the latter to
cpufreq_add_dev(), where it is more appropriate.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e9698cc5 29-Jul-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Add helper to perform alloc/free of policy structure

Separate out the allocation of the cpufreq policy structure (along with
its error handling) to a helper function. This makes the code easier to
read and also helps with some upcoming code reorganization.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 23d32899 29-Jul-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Fix misplaced call to cpufreq_update_policy()

The call to cpufreq_update_policy() is placed in the CPU hotplug callback
of cpufreq_stats, which has a higher priority than the CPU hotplug callback
of cpufreq-core. As a result, during CPU_ONLINE/CPU_ONLINE_FROZEN, we end up
calling cpufreq_update_policy() *before* calling cpufreq_add_dev() !
And for uninitialized CPUs, it just returns silently, not doing anything.

To add to that, cpufreq_stats is not even the right place to call
cpufreq_update_policy() to begin with. The cpufreq core ought to handle
this in its own callback, from an elegance/relevance perspective.

So move the invocation of cpufreq_update_policy() to cpufreq_cpu_callback,
and place it *after* cpufreq_add_dev().

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2a998599 29-Jul-2013 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Fix cpufreq driver module refcount balance after suspend/resume

Since cpufreq_cpu_put() called by __cpufreq_remove_dev() drops the
driver module refcount, __cpufreq_remove_dev() causes that refcount
to become negative for the cpufreq driver after a suspend/resume
cycle.

This is not the only bad thing that happens there, however, because
kobject_put() should only be called for the policy kobject at this
point if the CPU is not the last one for that policy.

Namely, if the given CPU is the last one for that policy, the
policy kobject's refcount should be 1 at this point, as set by
cpufreq_add_dev_interface(), and only needs to be dropped once for
the kobject to go away. This actually happens under the cpu == 1
check, so it need not be done before by cpufreq_cpu_put().

On the other hand, if the given CPU is not the last one for that
policy, this means that cpufreq_add_policy_cpu() has been called
at least once for that policy and cpufreq_cpu_get() has been
called for it too. To balance that cpufreq_cpu_get(), we need to
call cpufreq_cpu_put() in that case.

Thus, to fix the described problem and keep the reference
counters balanced in both cases, move the cpufreq_cpu_get() call
in __cpufreq_remove_dev() to the code path executed only for
CPUs that share the policy with other CPUs.

Reported-and-tested-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: 3.10+ <stable@vger.kernel.org>


# cffe4e0e 05-Jun-2013 Stratos Karafotis <stratosk@semaphore.gr>

cpufreq: Remove unused function __cpufreq_driver_getavg()

The target frequency calculation method in the ondemand governor has
changed and it is now independent of the measured average frequency.
Consequently, the __cpufreq_driver_getavg() function and getavg
member of struct cpufreq_driver are not used any more, so drop them.

[rjw: Changelog]
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2760984f 19-Jun-2013 Paul Gortmaker <paul.gortmaker@windriver.com>

cpufreq: delete __cpuinit usage from all cpufreq files

The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

This removes all the drivers/cpufreq uses of the __cpuinit macros
from all C files.

[1] https://lkml.org/lkml/2013/5/20/589

[v2: leave 2nd lines of args misaligned as requested by Viresh]
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: cpufreq@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# aae760ed 11-Jul-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Revert commit a66b2e to fix suspend/resume regression

commit a66b2e (cpufreq: Preserve sysfs files across suspend/resume)
has unfortunately caused several things in the cpufreq subsystem to
break subtly after a suspend/resume cycle.

The intention of that patch was to retain the file permissions of the
cpufreq related sysfs files across suspend/resume. To achieve that,
the commit completely removed the calls to cpufreq_add_dev() and
__cpufreq_remove_dev() during suspend/resume transitions. But the
problem is that those functions do 2 kinds of things:
1. Low-level initialization/tear-down that are critical to the
correct functioning of cpufreq-core.
2. Kobject and sysfs related initialization/teardown.

Ideally we should have reorganized the code to cleanly separate these
two responsibilities, and skipped only the sysfs related parts during
suspend/resume. Since we skipped the entire callbacks instead (which
also included some CPU and cpufreq-specific critical components),
cpufreq subsystem started behaving erratically after suspend/resume.

So revert the commit to fix the regression. We'll revisit and address
the original goal of that commit separately, since it involves quite a
bit of careful code reorganization and appears to be non-trivial.

(While reverting the commit, note that another commit f51e1eb
(cpufreq: Fix cpufreq regression after suspend/resume) already
reverted part of the original set of changes. So revert only the
remaining ones).

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Paul Bolle <pebolle@tiscali.nl>
Cc: 3.10+ <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 266c13d7 02-Jul-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Fix serialization of frequency transitions

Commit 7c30ed ("cpufreq: make sure frequency transitions are serialized")
interacts poorly with systems that have a single core freqency for all
cores. On such systems we have a single policy for all cores with
several CPUs. When we do a frequency transition the governor calls the
pre and post change notifiers which causes cpufreq_notify_transition()
per CPU. Since the policy is the same for all of them all CPUs after
the first and the warnings added are generated by checking a per-policy
flag the warnings will be triggered for all cores after the first.

Fix this by allowing notifier to be called for n times. Where n is the number of
cpus in policy->cpus.

Reported-and-tested-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f4fd3797 27-Jun-2013 Lan Tianyu <tianyu.lan@intel.com>

acpi-cpufreq: Add new sysfs attribute freqdomain_cpus

Commits fcf8058 (cpufreq: Simplify cpufreq_add_dev()) and aa77a52
(cpufreq: acpi-cpufreq: Don't set policy->related_cpus from .init())
changed the contents of the "related_cpus" sysfs attribute on systems
where acpi-cpufreq is used and user space can't get the list of CPUs
which are in the same hardware coordination CPU domain (provided by
the ACPI AML method _PSD) via "related_cpus" any more.

To make up for that loss add a new sysfs attribute "freqdomian_cpus"
for the acpi-cpufreq driver which exposes the list of CPUs in the
same domain regardless of whether it is coordinated by hardware or
software.

[rjw: Changelog, documentation]
References: https://bugzilla.kernel.org/show_bug.cgi?id=58761
Reported-by: Jean-Philippe Halimi <jean-philippe.halimi@exascale-computing.eu>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7c30ed53 18-Jun-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: make sure frequency transitions are serialized

Whenever we are changing frequency of a cpu, we are calling PRECHANGE and
POSTCHANGE notifiers. They must be serialized. i.e. PRECHANGE or POSTCHANGE
shouldn't be called twice contiguously.

This can happen due to bugs in users of __cpufreq_driver_target() or actual
cpufreq drivers who are sending these notifiers.

This patch adds some protection against this. Now, we keep track of the last
transaction and see if something went wrong.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 0956df9c 19-Jun-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: make __cpufreq_notify_transition() static

__cpufreq_notify_transition() is used only in cpufreq.c,
make it static.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# bb176f7d 19-Jun-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Fix minor formatting issues

There were a few noticeable formatting issues in core cpufreq code.
This cleans them up to make code look better. The changes include:
- Whitespace cleanup.
- Rearrangements of code.
- Multiline comments fixes.
- Formatting changes to fit 80 columns.

Copyright information in cpufreq.c is also updated to include my name
for 2013.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 95731ebb 19-Jun-2013 Xiaoguang Chen <chenxg@marvell.com>

cpufreq: Fix governor start/stop race condition

Cpufreq governors' stop and start operations should be carried out
in sequence. Otherwise, there will be unexpected behavior, like in
the example below.

Suppose there are 4 CPUs and policy->cpu=CPU0, CPU1/2/3 are linked
to CPU0. The normal sequence is:

1) Current governor is userspace. An application tries to set the
governor to ondemand. It will call __cpufreq_set_policy() in
which it will stop the userspace governor and then start the
ondemand governor.

2) Current governor is userspace. The online of CPU3 runs on CPU0.
It will call cpufreq_add_policy_cpu() in which it will first
stop the userspace governor, and then start it again.

If the sequence of the above two cases interleaves, it becomes:

1) Application stops userspace governor
2) Hotplug stops userspace governor

which is a problem, because the governor shouldn't be stopped twice
in a row. What happens next is:

3) Application starts ondemand governor
4) Hotplug starts a governor

In step 4, the hotplug is supposed to start the userspace governor,
but now the governor has been changed by the application to ondemand,
so the ondemand governor is started once again, which is incorrect.

The solution is to prevent policy governors from being stopped
multiple times in a row. A governor should only be stopped once for
one policy. After it has been stopped, no more governor stop
operations should be executed.

Also add a mutex to serialize governor operations.

[rjw: Changelog. And you owe me a beverage of my choice.]
Signed-off-by: Xiaoguang Chen <chenxg@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# a262e94c 31-May-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: remove unnecessary cpufreq_cpu_{get|put}() calls

struct cpufreq_policy is already passed as argument to some routines
like: __cpufreq_driver_getavg() and so we don't really need to do
cpufreq_cpu_get() before and cpufreq_cpu_put() in them to get a
policy structure.

Remove them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2361be23 17-May-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Don't create empty /sys/devices/system/cpu/cpufreq directory

When we don't have any file in cpu/cpufreq directory we shouldn't
create it. Specially with the introduction of per-policy governor
instance patchset, even governors are moved to
cpu/cpu*/cpufreq/governor-name directory and so this directory is
just not required.

Lets have it only when required.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 72a4ce34 17-May-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Move get_cpu_idle_time() to cpufreq.c

Governors other than ondemand and conservative can also use
get_cpu_idle_time() and they aren't required to compile
cpufreq_governor.c. So, move these independent routines to
cpufreq.c instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 944e9a03 15-May-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: governors: Move get_governor_parent_kobj() to cpufreq.c

get_governor_parent_kobj() can be used by any governor, generic
cpufreq governors or platform specific ones and so must be present in
cpufreq.c instead of cpufreq_governor.c.

This patch moves it to cpufreq.c. This also adds
EXPORT_SYMBOL_GPL(get_governor_parent_kobj) so that modules can use
this function too.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3f869d6d 15-May-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Add EXPORT_SYMBOL_GPL for have_governor_per_policy

This patch adds: EXPORT_SYMBOL_GPL(have_governor_per_policy), so that
this routine can be used by modules too.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 955ef483 15-May-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT

With the rwsem lock around
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT), we
get circular dependency when we call sysfs_remove_group().

======================================================
[ INFO: possible circular locking dependency detected ]
3.9.0-rc7+ #15 Not tainted
-------------------------------------------------------
cat/2387 is trying to acquire lock:
(&per_cpu(cpu_policy_rwsem, cpu)){+++++.}, at: [<c02f6179>] lock_policy_rwsem_read+0x25/0x34

but task is already holding lock:
(s_active#41){++++.+}, at: [<c00f9bf7>] sysfs_read_file+0x4f/0xcc

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (s_active#41){++++.+}:
[<c0055a79>] lock_acquire+0x61/0xbc
[<c00fabf1>] sysfs_addrm_finish+0xc1/0x128
[<c00f9819>] sysfs_hash_and_remove+0x35/0x64
[<c00fbe6f>] remove_files.isra.0+0x1b/0x24
[<c00fbea5>] sysfs_remove_group+0x2d/0xa8
[<c02f9a0b>] cpufreq_governor_interactive+0x13b/0x35c
[<c02f61df>] __cpufreq_governor+0x2b/0x8c
[<c02f6579>] __cpufreq_set_policy+0xa9/0xf8
[<c02f6b75>] store_scaling_governor+0x61/0x100
[<c02f6f4d>] store+0x39/0x60
[<c00f9b81>] sysfs_write_file+0xed/0x114
[<c00b3fd1>] vfs_write+0x65/0xd8
[<c00b424b>] sys_write+0x2f/0x50
[<c000cdc1>] ret_fast_syscall+0x1/0x52

-> #0 (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}:
[<c0055253>] __lock_acquire+0xef3/0x13dc
[<c0055a79>] lock_acquire+0x61/0xbc
[<c03ee1f5>] down_read+0x25/0x30
[<c02f6179>] lock_policy_rwsem_read+0x25/0x34
[<c02f6edd>] show+0x21/0x58
[<c00f9c0f>] sysfs_read_file+0x67/0xcc
[<c00b40a7>] vfs_read+0x63/0xd8
[<c00b41fb>] sys_read+0x2f/0x50
[<c000cdc1>] ret_fast_syscall+0x1/0x52

other info that might help us debug this:

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(s_active#41);
lock(&per_cpu(cpu_policy_rwsem, cpu));
lock(s_active#41);
lock(&per_cpu(cpu_policy_rwsem, cpu));

*** DEADLOCK ***

2 locks held by cat/2387:
#0: (&buffer->mutex){+.+.+.}, at: [<c00f9bcd>] sysfs_read_file+0x25/0xcc
#1: (s_active#41){++++.+}, at: [<c00f9bf7>] sysfs_read_file+0x4f/0xcc

stack backtrace:
[<c0011d55>] (unwind_backtrace+0x1/0x9c) from [<c03e9a09>] (print_circular_bug+0x19d/0x1e8)
[<c03e9a09>] (print_circular_bug+0x19d/0x1e8) from [<c0055253>] (__lock_acquire+0xef3/0x13dc)
[<c0055253>] (__lock_acquire+0xef3/0x13dc) from [<c0055a79>] (lock_acquire+0x61/0xbc)
[<c0055a79>] (lock_acquire+0x61/0xbc) from [<c03ee1f5>] (down_read+0x25/0x30)
[<c03ee1f5>] (down_read+0x25/0x30) from [<c02f6179>] (lock_policy_rwsem_read+0x25/0x34)
[<c02f6179>] (lock_policy_rwsem_read+0x25/0x34) from [<c02f6edd>] (show+0x21/0x58)
[<c02f6edd>] (show+0x21/0x58) from [<c00f9c0f>] (sysfs_read_file+0x67/0xcc)
[<c00f9c0f>] (sysfs_read_file+0x67/0xcc) from [<c00b40a7>] (vfs_read+0x63/0xd8)
[<c00b40a7>] (vfs_read+0x63/0xd8) from [<c00b41fb>] (sys_read+0x2f/0x50)
[<c00b41fb>] (sys_read+0x2f/0x50) from [<c000cdc1>] (ret_fast_syscall+0x1/0x52)

This lock isn't required while calling __cpufreq_governor(policy,
CPUFREQ_GOV_POLICY_EXIT). Remove it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# a66b2e50 15-May-2013 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

cpufreq: Preserve sysfs files across suspend/resume

The file permissions of cpufreq per-cpu sysfs files are not preserved
across suspend/resume because we internally go through the CPU
Hotplug path which reinitializes the file permissions on CPU online.

But the user is not supposed to know that we are using CPU hotplug
internally within suspend/resume (IOW, the kernel should not silently
wreck the user-set file permissions across a suspend cycle).
Therefore, we need to preserve the file permissions as they are
across suspend/resume.

The simplest way to achieve that is to just not touch the sysfs files
at all - ie., just ignore the CPU hotplug notifications in the
suspend/resume path (_FROZEN) in the cpufreq hotplug callback.

Reported-by: Robert Jarzmik <robert.jarzmik@intel.com>
Reported-by: Durgadoss R <durgadoss.r@intel.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d96038e0 30-Apr-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Issue CPUFREQ_GOV_POLICY_EXIT notifier before dropping policy refcount

We must call __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT) before
calling cpufreq_cpu_put(data), so that policy kobject have valid
fields. Otherwise, removing last online cpu of policy->cpus causes
this crash for ondemand/conservative governor.

[<c00fb076>] (sysfs_find_dirent+0xe/0xa8) from [<c00fb1bd>] (sysfs_get_dirent+0x21/0x58)
[<c00fb1bd>] (sysfs_get_dirent+0x21/0x58) from [<c00fc259>] (sysfs_remove_group+0x85/0xbc)
[<c00fc259>] (sysfs_remove_group+0x85/0xbc) from [<c02faad9>] (cpufreq_governor_dbs+0x369/0x4a0)
[<c02faad9>] (cpufreq_governor_dbs+0x369/0x4a0) from [<c02f66d7>] (__cpufreq_governor+0x2b/0x8c)
[<c02f66d7>] (__cpufreq_governor+0x2b/0x8c) from [<c02f6893>] (__cpufreq_remove_dev.isra.12+0x15b/0x250)
[<c02f6893>] (__cpufreq_remove_dev.isra.12+0x15b/0x250) from [<c03e91c7>] (cpufreq_cpu_callback+0x2f/0x3c)
[<c03e91c7>] (cpufreq_cpu_callback+0x2f/0x3c) from [<c0036fe1>] (notifier_call_chain+0x45/0x54)
[<c0036fe1>] (notifier_call_chain+0x45/0x54) from [<c001e611>] (__cpu_notify+0x1d/0x34)
[<c001e611>] (__cpu_notify+0x1d/0x34) from [<c03e5833>] (_cpu_down+0x63/0x1ac)
[<c03e5833>] (_cpu_down+0x63/0x1ac) from [<c03e5997>] (cpu_down+0x1b/0x30)
[<c03e5997>] (cpu_down+0x1b/0x30) from [<c03e60eb>] (store_online+0x27/0x54)
[<c03e60eb>] (store_online+0x27/0x54) from [<c0295629>] (dev_attr_store+0x11/0x18)
[<c0295629>] (dev_attr_store+0x11/0x18) from [<c00f9edd>] (sysfs_write_file+0xed/0x114)
[<c00f9edd>] (sysfs_write_file+0xed/0x114) from [<c00b42a9>] (vfs_write+0x65/0xd8)
[<c00b42a9>] (vfs_write+0x65/0xd8) from [<c00b4523>] (sys_write+0x2f/0x50)
[<c00b4523>] (sys_write+0x2f/0x50) from [<c000cdc1>] (ret_fast_syscall+0x1/0x52)

Of course this only impacted drivers which have
have_governor_per_policy set to true. i.e. big LITTLE cpufreq driver.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1c3d85dd 28-Apr-2013 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Revert incorrect commit 5800043

Commit 5800043 (cpufreq: convert cpufreq_driver to using RCU) causes
the following call trace to be spit on boot:

BUG: sleeping function called from invalid context at /scratch/rafael/work/linux-pm/mm/slab.c:3179
in_atomic(): 0, irqs_disabled(): 0, pid: 292, name: systemd-udevd
2 locks held by systemd-udevd/292:
#0: (subsys mutex){+.+.+.}, at: [<ffffffff8146851a>] subsys_interface_register+0x4a/0xe0
#1: (rcu_read_lock){.+.+.+}, at: [<ffffffff81538210>] cpufreq_add_dev_interface+0x60/0x5e0
Pid: 292, comm: systemd-udevd Not tainted 3.9.0-rc8+ #323
Call Trace:
[<ffffffff81072c90>] __might_sleep+0x140/0x1f0
[<ffffffff811581c2>] kmem_cache_alloc+0x42/0x2b0
[<ffffffff811e7179>] sysfs_new_dirent+0x59/0x130
[<ffffffff811e63cb>] sysfs_add_file_mode+0x6b/0x110
[<ffffffff81538210>] ? cpufreq_add_dev_interface+0x60/0x5e0
[<ffffffff810a3254>] ? __lock_is_held+0x54/0x80
[<ffffffff811e647d>] sysfs_add_file+0xd/0x10
[<ffffffff811e6541>] sysfs_create_file+0x21/0x30
[<ffffffff81538280>] cpufreq_add_dev_interface+0xd0/0x5e0
[<ffffffff81538210>] ? cpufreq_add_dev_interface+0x60/0x5e0
[<ffffffffa000337f>] ? acpi_processor_get_platform_limit+0x32/0xbb [processor]
[<ffffffffa022f540>] ? do_drv_write+0x70/0x70 [acpi_cpufreq]
[<ffffffff810a3254>] ? __lock_is_held+0x54/0x80
[<ffffffff8106c97e>] ? up_read+0x1e/0x40
[<ffffffff8106e632>] ? __blocking_notifier_call_chain+0x72/0xc0
[<ffffffff81538dbd>] cpufreq_add_dev+0x62d/0xae0
[<ffffffff815389b8>] ? cpufreq_add_dev+0x228/0xae0
[<ffffffff81468569>] subsys_interface_register+0x99/0xe0
[<ffffffffa014d000>] ? 0xffffffffa014cfff
[<ffffffff81535d5d>] cpufreq_register_driver+0x9d/0x200
[<ffffffffa014d000>] ? 0xffffffffa014cfff
[<ffffffffa014d0e9>] acpi_cpufreq_init+0xe9/0x1000 [acpi_cpufreq]
[<ffffffff810002fa>] do_one_initcall+0x11a/0x170
[<ffffffff810b4b87>] load_module+0x1cf7/0x2920
[<ffffffff81322580>] ? ddebug_proc_open+0xb0/0xb0
[<ffffffff816baee0>] ? retint_restore_args+0xe/0xe
[<ffffffff810b5887>] sys_init_module+0xd7/0x120
[<ffffffff816bb6d2>] system_call_fastpath+0x16/0x1b

which is quite obvious, because that commit put (multiple instances
of) sysfs_create_file() under rcu_read_lock()/rcu_read_unlock(),
although sysfs_create_file() may cause memory to be allocated with
GFP_KERNEL and that may sleep, which is not permitted in RCU read
critical section.

Revert the buggy commit altogether along with some changes on top
of it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 820c6ca2 21-Apr-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Don't call __cpufreq_governor() for drivers without target()

Some cpufreq drivers implement their own governor and so don't need
us to call generic governors interface via __cpufreq_governor(). Few
recent commits haven't obeyed this law well and we saw some
regressions.

This patch is an attempt to fix the above issue.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Dirk Brandewie <dirk.brandewie@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e4969eba 11-Apr-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Call __cpufreq_governor() with correct policy->cpus mask

__cpufreq_governor() must be called with a correct policy->cpus mask.
In __cpufreq_remove_dev() we initially clear policy->cpus with
cpumask_clear_cpu() and then call
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT). If the governor
is doing some per-cpu stuff in EXIT callback, this can create
uncertain behavior.

Generic governors in drivers/cpufreq/ doesn't do any per-cpu stuff
in EXIT callback and so we don't face any issues currently. But its
better to keep the code clean, so we don't face any issues in future.

Now, we call cpumask_clear_cpu() only when multiple cpus are managed
by policy.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5800043b 04-Apr-2013 Nathan Zimmer <nzimmer@sgi.com>

cpufreq: convert cpufreq_driver to using RCU

We eventually would like to remove the rwlock cpufreq_driver_lock or
convert it back to a spinlock and protect the read sections with RCU.
The first step in that direction is to make cpufreq_driver use RCU.
I don't see an easy wasy to protect the cpufreq_cpu_data structure
with RCU, so I am leaving it with the rwlock for now since under
certain configs __cpufreq_cpu_get is a hot spot with 256+ cores.

[rjw: Subject, changelog, white space]
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b43a7ffb 24-Mar-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Notify all policy->cpus in cpufreq_notify_transition()

policy->cpus contains all online cpus that have single shared clock line. And
their frequencies are always updated together.

Many SMP system's cpufreq drivers take care of this in individual drivers but
the best place for this code is in cpufreq core.

This patch modifies cpufreq_notify_transition() to notify frequency change for
all cpus in policy->cpus and hence updates all users of this API.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4d5dcc42 27-Mar-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: governor: Implement per policy instances of governors

Currently, there can't be multiple instances of single governor_type.
If we have a multi-package system, where we have multiple instances
of struct policy (per package), we can't have multiple instances of
same governor. i.e. We can't have multiple instances of ondemand
governor for multiple packages.

Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/. Which again reflects that there can be only one
instance of a governor_type in the system.

This is a bottleneck for multicluster system, where we want different
packages to use same governor type, but with different tunables.

This patch uses the infrastructure provided by earlier patch and
implements init/exit routines for ondemand and conservative
governors.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7bd353a9 27-Mar-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Add per policy governor-init/exit infrastructure

Currently, there can't be multiple instances of single governor_type.
If we have a multi-package system, where we have multiple instances
of struct policy (per package), we can't have multiple instances of
same governor. i.e. We can't have multiple instances of ondemand
governor for multiple packages.

Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/. Which again reflects that there can be only one
instance of a governor_type in the system.

This is a bottleneck for multicluster system, where we want different
packages to use same governor type, but with different tunables.

This patch is inclined towards providing this infrastructure. Because
we are required to allocate governor's resources dynamically now, we
must do it at policy creation and end. And so got
CPUFREQ_GOV_POLICY_INIT/EXIT.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 0d1857a1 22-Feb-2013 Nathan Zimmer <nzimmer@sgi.com>

cpufreq: Convert the cpufreq_driver_lock to a rwlock

This eliminates the contention I am seeing in __cpufreq_cpu_get.
It also nicely stages the lock to be replaced by the rcu.

Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# fa69e33f 06-Feb-2013 Dirk Brandewie <dirk.brandewie@gmail.com>

cpufreq: Do not track governor name for scaling drivers with internal governors.

Scaling drivers that implement internal governors do not have governor
structures assocaited with them. Only track the name of the governor
associated with the CPU if the driver does not implement
cpufreq_driver.setpolicy()

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f6b0515b 06-Feb-2013 Dirk Brandewie <dirk.brandewie@gmail.com>

cpufreq: Only call cpufreq_out_of_sync() for driver that implement cpufreq_driver.target()

Scaling drivers that implement cpufreq_driver.setpolicy() have
internal governors that do not signal changes via
cpufreq_notify_transition() so the frequncy in the policy will almost
certainly be different than the current frequncy. Only call
cpufreq_out_of_sync() when the underlying driver implements
cpufreq_driver.target()

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9e21ba8b 06-Feb-2013 Dirk Brandewie <dirk.brandewie@gmail.com>

cpufreq: Retrieve current frequency from scaling drivers with internal governors

Scaling drivers that implement the cpufreq_driver.setpolicy() versus
the cpufreq_driver.target() interface do not set policy->cur.

Normally policy->cur is set during the call to cpufreq_driver.target()
when the frequnecy request is made by the governor.

If the scaling driver implements cpufreq_driver.setpolicy() and
cpufreq_driver.get() interfaces use cpufreq_driver.get() to retrieve
the current frequency.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2eaa3e2d 06-Feb-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Fix locking issues

cpufreq core uses two locks:
- cpufreq_driver_lock: General lock for driver and cpufreq_cpu_data array.
- cpu_policy_rwsemfix locking: per CPU reader-writer semaphore designed to cure
all cpufreq/hotplug/workqueue/etc related lock issues.

These locks were not used properly and are placed against their principle
(present before their definition) at various places. This patch is an attempt to
fix their use.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# fa1d8af4 07-Feb-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Create a macro for unlock_policy_rwsem{read,write}

On the lines of macro: lock_policy_rwsem, we can create another macro for
unlock_policy_rwsem. Lets do it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 65922465 06-Feb-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Remove unused HOTPLUG_CPU code

Because the sibling cpu of any online cpu is identified very early in
cpufreq_add_dev(), below code is never executed. And so can be removed.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8e53695f 06-Feb-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: governors: Fix WARN_ON() for multi-policy platforms

On multi-policy systems there is a single instance of governor for both the
policies (if same governor is chosen for both policies). With the code update
from following patches:

8eeed09 cpufreq: governors: Get rid of dbs_data->enable field
b394058 cpufreq: governors: Reset tunables only for cpufreq_unregister_governor()

We are creating/removing sysfs directory of governor for for every call to
GOV_START and STOP. This would fail for multi-policy system as there is a
per-policy call to START/STOP.

This patch reuses the governor->initialized variable to detect total users of
governor.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3361b7b1 04-Feb-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Don't check cpu_online(policy->cpu)

policy->cpu or cpus in policy->cpus can't be offline anymore. And so we don't
need to check if they are online or not.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 73bf0fc2 05-Feb-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Don't remove sysfs link for policy->cpu

"cpufreq" directory in policy->cpu is never created using
sysfs_create_link(), but using kobject_init_and_add(). And so we
shouldn't call sysfs_remove_link() for policy->cpu(). sysfs stuff
for policy->cpu is automatically removed when we call kobject_put()
for dying policy.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dirk Brandewie <dirk.brandewie@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b394058f 31-Jan-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: governors: Reset tunables only for cpufreq_unregister_governor()

Currently, whenever governor->governor() is called for CPUFRREQ_GOV_START event
we reset few tunables of governor. Which isn't correct, as this routine is
called for every cpu hot-[un]plugging event. We should actually be resetting
these only when the governor module is removed and re-installed.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# fcf80582 29-Jan-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Simplify cpufreq_add_dev()

Currently cpufreq_add_dev() firsts allocates policy, calls
driver->init() and then checks if this CPU is already managed or not.
And if it is already managed, its policy is freed.

We can save all this if we somehow know that CPU is managed or not in
advance. policy->related_cpus contains the list of all valid sibling
CPUs of policy->cpu. We can check this to see if the current CPU is
already managed.

From now on, platforms don't really need to set related_cpus from
their init() routines, as the same work is done by core too.

If a platform driver needs to set the related_cpus mask with some
additional CPUs, other than CPUs present in policy->cpus, they are
free to do it, though, as we don't override anything.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b26f7204 28-Jan-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Revert "cpufreq: Don't use cpu removed during cpufreq_driver_unregister"

This reverts commit 956f339 "cpufreq: Don't use cpu removed during
cpufreq_driver_unregister".

With the addition of the following commit, this change/variable is not
required any more:

commit b9ba2725343ae57add3f324dfa5074167f48de96
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date: Mon Jan 14 13:23:03 2013 +0000

cpufreq: Simplify __cpufreq_remove_dev()

[rjw: Subject and changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9d95046e 20-Jan-2013 Borislav Petkov <bp@suse.de>

cpufreq: Add a get_current_driver helper

Add a helper function to return cpufreq_driver->name.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d5aaffa9 17-Jan-2013 Dirk Brandewie <dirk.j.brandewie@intel.com>

cpufreq: handle cpufreq being disabled for all exported function.

When disable_cpufreq() is called some exported functions are still
being used that do not have a check for cpufreq being disabled.

Add a disabled check into cpufreq_cpu_get() to return NULL if
cpufreq is disabled this covers most of the exported functions. For
the exported functions that do not call cpufreq_cpu_get() add an
explicit check.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b8eed8af 14-Jan-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Simplify __cpufreq_remove_dev()

__cpufreq_remove_dev() is called on multiple occasions: cpufreq_driver
unregister and cpu removals.

Current implementation of this routine is overly complex without much need. If
the cpu to be removed is the policy->cpu, we remove the policy first and add all
other cpus again from policy->cpus and then finally call __cpufreq_remove_dev()
again to remove the cpu to be deleted. Haahhhh..

There exist a simple solution to removal of a cpu:
- Simply use the old policy structure
- update its fields like: policy->cpu, etc.
- notify any users of cpufreq, which depend on changing policy->cpu

Hence this patch, which tries to implement the above theory. It is tested well
by myself on ARM big.LITTLE TC2 SoC, which has 5 cores (2 A15 and 3 A7). Both
A15's share same struct policy and all A7's share same policy structure.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6954ca9c 11-Jan-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Don't use cpu removed during cpufreq_driver_unregister

This is how the core works:
cpufreq_driver_unregister()
- subsys_interface_unregister()
- for_each_cpu() call cpufreq_remove_dev(), i.e. 0,1,2,3,4 when we
unregister.

cpufreq_remove_dev():
- Remove policy node
- Call cpufreq_add_dev() for next cpu, sharing mask with removed cpu.
i.e. When cpu 0 is removed, we call it for cpu 1. And when called for cpu 2,
we call it for cpu 3.
- cpufreq_add_dev() would call cpufreq_driver->init()
- init would return mask as AND of 2, 3 and 4 for cluster A7.
- cpufreq core would do online_cpu && policy->cpus
Here is the BUG(). Because cpu hasn't died but we have just unregistered
the cpufreq driver, online cpu would still have cpu 2 in it. And so thing
go bad again.

Solution: Keep cpumask of cpus that are registered with cpufreq core and clear
cpus when we get a call from subsys_interface_unregister() via
cpufreq_remove_dev().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f6a7409c 11-Jan-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Notify governors when cpus are hot-[un]plugged

Because cpufreq core and governors worry only about the online cpus, if a cpu is
hot [un]plugged, we must notify governors about it, otherwise be ready to expect
something unexpected.

We already have notifiers in the form of CPUFREQ_GOV_START/CPUFREQ_GOV_STOP, we
just need to call them now.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 643ae6e8 11-Jan-2013 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Manage only online cpus

cpufreq core doesn't manage offline cpus and if driver->init() has returned
mask including offline cpus, it may result in unwanted behavior by cpufreq core
or governors.

We need to get only online cpus in this mask. There are two places to fix this
mask, cpufreq core and cpufreq driver. It makes sense to do this at common place
and hence is done in core.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 43720bd6 11-Jan-2013 Paul Gortmaker <paul.gortmaker@windriver.com>

PM / tracing: remove deprecated power trace API

The text in Documentation said it would be removed in 2.6.41;
the text in the Kconfig said removal in the 3.1 release. Either
way you look at it, we are well past both, so push it off a cliff.

Note that the POWER_CSTATE and the POWER_PSTATE are part of the
legacy tracing API. Remove all tracepoints which use these flags.
As can be seen from context, most already have a trace entry via
trace_cpu_idle anyways.

Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as
compared to the CSTATE ones which all have a clear start/stop.
As part of this, the trace_power_frequency also becomes orphaned,
so it too is deleted.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f55c9c26 30-Oct-2012 Jingoo Han <jg1.han@samsung.com>

cpufreq: Remove unnecessary initialization of a local variable

Remove an unnecessary initializer for the 'ret' variable in
__cpufreq_set_policy().

[rjw: Modified the subject and changelog slightly.]
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7249924e 30-Oct-2012 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Make sure target freq is within limits

__cpufreq_driver_target() must not pass target frequency beyond the
limits of current policy.

Today most of cpufreq platform drivers are doing this check in their
target routines. Why not move it to __cpufreq_driver_target()?

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5a1c0228 30-Oct-2012 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Avoid calling cpufreq driver's target() routine if target_freq == policy->cur

Avoid calling cpufreq driver's target() routine if new frequency is same as
policies current frequency.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# da584455 25-Oct-2012 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Fix sparse warning by making local function static

cpufreq_disabled() is a local function, so should be marked static.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 0676f7f2 24-Oct-2012 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: return early from __cpufreq_driver_getavg()

There is no need to do cpufreq_get_cpu() and cpufreq_put_cpu() for drivers that
don't support getavg() routine.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# db701151 22-Oct-2012 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Improve debug prints

With debug options on, it is difficult to locate cpufreq core's debug prints.
Fix this by prefixing debug prints with KBUILD_MODNAME.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4b972f0b 22-Oct-2012 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq / core: Fix printing of governor and driver name

Arrays for governer and driver name are of size CPUFREQ_NAME_LEN or 16.
i.e. 15 bytes for name and 1 for trailing '\0'.

When cpufreq driver print these names (for sysfs), it includes '\n' or ' ' in
the fmt string and still passes length as CPUFREQ_NAME_LEN. If the driver or
governor names are using all 15 fields allocated to them, then the trailing '\n'
or ' ' will never be printed. And so commands like:

root@linaro-developer# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver

will print something like:

cpufreq_foodrvroot@linaro-developer#

Fix this by increasing print length by one character.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8bf1ac72 22-Oct-2012 Viresh Kumar <viresh.kumar@linaro.org>

cpufreq / core: Fix typo in comment describing show_bios_limit()

show_bios_limit is mistakenly written as show_scaling_driver in a comment
describing purpose of show_bios_limit() routine.

Fix it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# a9144436 20-Jul-2012 Stephen Boyd <sboyd@codeaurora.org>

cpufreq: Fix sysfs deadlock with concurrent hotplug/frequency switch

Running one program that continuously hotplugs and replugs a cpu
concurrently with another program that continuously writes to the
scaling_setspeed node eventually deadlocks with:

=============================================
[ INFO: possible recursive locking detected ]
3.4.0 #37 Tainted: G W
---------------------------------------------
filemonkey/122 is trying to acquire lock:
(s_active#13){++++.+}, at: [<c01a3d28>] sysfs_remove_dir+0x9c/0xb4

but task is already holding lock:
(s_active#13){++++.+}, at: [<c01a22f0>] sysfs_write_file+0xe8/0x140

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(s_active#13);
lock(s_active#13);

*** DEADLOCK ***

May be due to missing lock nesting notation

2 locks held by filemonkey/122:
#0: (&buffer->mutex){+.+.+.}, at: [<c01a2230>] sysfs_write_file+0x28/0x140
#1: (s_active#13){++++.+}, at: [<c01a22f0>] sysfs_write_file+0xe8/0x140

stack backtrace:
[<c0014fcc>] (unwind_backtrace+0x0/0x120) from [<c00ca600>] (validate_chain+0x6f8/0x1054)
[<c00ca600>] (validate_chain+0x6f8/0x1054) from [<c00cb778>] (__lock_acquire+0x81c/0x8d8)
[<c00cb778>] (__lock_acquire+0x81c/0x8d8) from [<c00cb9c0>] (lock_acquire+0x18c/0x1e8)
[<c00cb9c0>] (lock_acquire+0x18c/0x1e8) from [<c01a3ba8>] (sysfs_addrm_finish+0xd0/0x180)
[<c01a3ba8>] (sysfs_addrm_finish+0xd0/0x180) from [<c01a3d28>] (sysfs_remove_dir+0x9c/0xb4)
[<c01a3d28>] (sysfs_remove_dir+0x9c/0xb4) from [<c02d0e5c>] (kobject_del+0x10/0x38)
[<c02d0e5c>] (kobject_del+0x10/0x38) from [<c02d0f74>] (kobject_release+0xf0/0x194)
[<c02d0f74>] (kobject_release+0xf0/0x194) from [<c0565a98>] (cpufreq_cpu_put+0xc/0x24)
[<c0565a98>] (cpufreq_cpu_put+0xc/0x24) from [<c05683f0>] (store+0x6c/0x74)
[<c05683f0>] (store+0x6c/0x74) from [<c01a2314>] (sysfs_write_file+0x10c/0x140)
[<c01a2314>] (sysfs_write_file+0x10c/0x140) from [<c014af44>] (vfs_write+0xb0/0x128)
[<c014af44>] (vfs_write+0xb0/0x128) from [<c014b06c>] (sys_write+0x3c/0x68)
[<c014b06c>] (sys_write+0x3c/0x68) from [<c000e0e0>] (ret_fast_syscall+0x0/0x3c)

This is because store() in cpufreq.c indirectly calls
kobject_get() via cpufreq_cpu_get() and is the last one to call
kobject_put() via cpufreq_cpu_put(). Sysfs code should not call
kobject_get() or kobject_put() directly (see the comment around
sysfs_schedule_callback() for more information).

Fix this deadlock by introducing two new functions:

struct cpufreq_policy *cpufreq_cpu_get_sysfs(unsigned int cpu)
void cpufreq_cpu_put_sysfs(struct cpufreq_policy *data)

which do the same thing as cpufreq_cpu_{get,put}() but don't call
kobject functions.

To easily trigger this deadlock you can insert an msleep() with a
reasonably large value right after the fail label at the bottom
of the store() function in cpufreq.c and then write
scaling_setspeed in one task and offline the cpu in another. The
first task will hang and be detected by the hung task detector.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>


# 448c8b1d 13-Mar-2012 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

provide disable_cpufreq() function to disable the API.

useful for disabling cpufreq altogether. The cpu frequency
scaling drivers and cpu frequency governors will fail to register.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# a7b422cd 13-Mar-2012 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

provide disable_cpufreq() function to disable the API.

useful for disabling cpufreq altogether. The cpu frequency
scaling drivers and cpu frequency governors will fail to register.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# d08de0c1 03-Jan-2012 Afzal Mohammed <afzal@ti.com>

[CPUFREQ] update lpj only if frequency has changed

During scaling up of cpu frequency, loops_per_jiffy
is updated upon invoking PRECHANGE notifier.
If setting to new frequency fails in cpufreq driver,
lpj is left at incorrect value.

Hence update lpj only if cpu frequency is changed,
i.e. upon invoking POSTCHANGE notifier.

Penalty would be that during time period between
changing cpu frequency & invocation of POSTCHANGE
notifier, udelay(x) may not gurantee minimal delay
of 'x' us for frequency scaling up operation.

Perhaps a better solution would be to define
CPUFREQ_ABORTCHANGE & handle accordingly, but then
it would be more intrusive (using ABORTCHANGE may
help drivers also; if any has registered notifier
and expect POST for a PRECHANGE, their needs can
be taken care using ABORT)

Signed-off-by: Afzal Mohammed <afzal@ti.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 8a25a2fd 21-Dec-2011 Kay Sievers <kay.sievers@vrfy.org>

cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem

This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.

After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Userspace relies on events and generic sysfs subsystem infrastructure
from sysdev devices, which are made available with this conversion.

Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Len Brown <lenb@kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 3d737108 28-Jun-2011 Jesse Barnes <jbarnes@virtuousgeek.org>

cpufreq: expose a cpufreq_quick_get_max routine

This allows drivers and other code to get the max reported CPU frequency.
Initial use is for scaling ring frequency with GPU frequency in the i915
driver.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>


# 1a8e1463 04-May-2011 Kees Cook <keescook@chromium.org>

[CPUFREQ] remove redundant sprintf from request_module call.

Since format string handling is part of request_module, there is no
need to construct the module name. As such, drop the redundant sprintf
and heap usage.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 2d06d8c4 27-Mar-2011 Dominik Brodowski <linux@dominikbrodowski.net>

[CPUFREQ] use dynamic debug instead of custom infrastructure

With dynamic debug having gained the capability to report debug messages
also during the boot process, it offers a far superior interface for
debug messages than the custom cpufreq infrastructure. As a first step,
remove the old cpufreq_debug_printk() function and replace it with a call
to the generic pr_debug() function.

How can dynamic debug be used on cpufreq? You need a kernel which has
CONFIG_DYNAMIC_DEBUG enabled.

To enabled debugging during runtime, mount debugfs and

$ echo -n 'module cpufreq +p' > /sys/kernel/debug/dynamic_debug/control

for debugging the complete "cpufreq" module. To achieve the same goal during
boot, append

ddebug_query="module cpufreq +p"

as a boot parameter to the kernel of your choice.

For more detailled instructions, please see
Documentation/dynamic-debug-howto.txt

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>


# 27ecddc2 27-Apr-2011 Jacob Shin <jacob.shin@amd.com>

[CPUFREQ] CPU hotplug, re-create sysfs directory and symlinks

When we discover CPUs that are affected by each other's
frequency/voltage transitions, the first CPU gets a sysfs directory
created, and rest of the siblings get symlinks. Currently, when we
hotplug off only the first CPU, all of the symlinks and the sysfs
directory gets removed. Even though rest of the siblings are still
online and functional, they are orphaned, and no longer governed by
cpufreq.

This patch, given the above scenario, creates a sysfs directory for
the first sibling and symlinks for the rest of the siblings.

Please note the recursive call, it was rather too ugly to roll it
out. And the removal of redundant NULL setting (it is already taken
care of near the top of the function).

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: stable@kernel.org


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# e00e56df 23-Mar-2011 Rafael J. Wysocki <rjw@rjwysocki.net>

cpufreq: Use syscore_ops for boot CPU suspend/resume (v2)

The cpufreq subsystem uses sysdev suspend and resume for
executing cpufreq_suspend() and cpufreq_resume(), respectively,
during system suspend, after interrupts have been switched off on the
boot CPU, and during system resume, while interrupts are still off on
the boot CPU. In both cases the other CPUs are off-line at the
relevant point (either they have been switched off via CPU hotplug
during suspend, or they haven't been switched on yet during resume).
For this reason, although it may seem that cpufreq_suspend() and
cpufreq_resume() are executed for all CPUs in the system, they are
only called for the boot CPU in fact, which is quite confusing.

To remove the confusion and to prepare for elimiating sysdev
suspend and resume operations from the kernel enirely, convernt
cpufreq to using a struct syscore_ops object for the boot CPU
suspend and resume and rename the callbacks so that their names
reflect their purpose. In addition, put some explanatory remarks
into their kerneldoc comments.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>


# 7ca64e2d 10-Mar-2011 Rafael J. Wysocki <rjw@rjwysocki.net>

[CPUFREQ] Remove the pm_message_t argument from driver suspend

None of the existing cpufreq drivers uses the second argument of
its .suspend() callback (which isn't useful anyway), so remove it.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Dave Jones <davej@redhat.com>


# 8f5bc2ab 01-Mar-2011 Jiri Slaby <jirislaby@kernel.org>

[CPUFREQ] fix BUG on cpufreq policy init failure

cpufreq_register_driver sets cpufreq_driver to a structure owned (and
placed) in the caller's memory. If cpufreq policy fails in its ->init
function, sysdev_driver_register returns nonzero in
cpufreq_register_driver. Now, cpufreq_register_driver returns an error
without setting cpufreq_driver back to NULL.

Usually cpufreq policy modules are unloaded because they propagate the
error to the module init function and return that.

So a later access to any member of cpufreq_driver causes bugs like:
BUG: unable to handle kernel paging request at ffffffffa00270a0
IP: [<ffffffff8145eca3>] cpufreq_cpu_get+0x53/0xe0
PGD 1805067 PUD 1809063 PMD 1c3f90067 PTE 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/net/tun0/statistics/collisions
CPU 0
Modules linked in: ...
Pid: 5677, comm: thunderbird-bin Tainted: G W 2.6.38-rc4-mm1_64+ #1389 To be filled by O.E.M./To Be Filled By O.E.M.
RIP: 0010:[<ffffffff8145eca3>] [<ffffffff8145eca3>] cpufreq_cpu_get+0x53/0xe0
RSP: 0018:ffff8801aec37d98 EFLAGS: 00010086
RAX: 0000000000000202 RBX: 0000000000000000 RCX: 0000000000000001
RDX: ffffffffa00270a0 RSI: 0000000000001000 RDI: ffffffff8199ece8
...
Call Trace:
[<ffffffff8145f490>] cpufreq_quick_get+0x10/0x30
[<ffffffff8103f12b>] show_cpuinfo+0x2ab/0x300
[<ffffffff81136292>] seq_read+0xf2/0x3f0
[<ffffffff8126c5d3>] ? __strncpy_from_user+0x33/0x60
[<ffffffff8116850d>] proc_reg_read+0x6d/0xa0
[<ffffffff81116e53>] vfs_read+0xc3/0x180
[<ffffffff81116f5c>] sys_read+0x4c/0x90
[<ffffffff81030dbb>] system_call_fastpath+0x16/0x1b
...

It's all cause by weird fail path handling in cpufreq_register_driver.
To fix that, shuffle the code to do proper handling with gotos.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Dave Jones <davej@redhat.com>


# 25e41933 03-Jan-2011 Thomas Renninger <trenn@suse.de>

perf: Clean up power events by introducing new, more generic ones

Add these new power trace events:

power:cpu_idle
power:cpu_frequency
power:machine_suspend

The old C-state/idle accounting events:
power:power_start
power:power_end

Have now a replacement (but we are still keeping the old
tracepoints for compatibility):

power:cpu_idle

and
power:power_frequency

is replaced with:
power:cpu_frequency

power:machine_suspend is newly introduced.

Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.

the type= field got removed from both, it was never
used and the type is differed by the event type itself.

perf timechart userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: rjw@sisk.pl
LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>


# bec037aa 05-Aug-2010 Julia Lawall <julia@diku.dk>

[CPUFREQ] drivers/cpufreq: Adjust confusing if indentation

Indent the body of for_each_cpu.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r disable braces4@
position p1,p2;
statement S1,S2;
@@

(
if (...) { ... }
|
if (...) S1@p1 S2@p2
)

@script:python@
p1 << r.p1;
p2 << r.p2;
@@

if (p1[0].column == p2[0].column):
cocci.print_main("branch",p1)
cocci.print_secs("after",p2)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Dave Jones <davej@redhat.com>


# 9c36f746 22-Jun-2010 Neal Buckendahl <nealb001@gmail.com>

[CPUFREQ] fix brace coding style issue.

This patch fixes up a brace warning found by the checkpatch.pl tool

Signed-off-by: Neal Buckendahl <nealb001@tbcnet.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 6f4f2723 20-Apr-2010 Thomas Renninger <trenn@suse.de>

[CPUFREQ] x86 cpufreq: Make trace_power_frequency cpufreq driver independent

and fix the broken case if a core's frequency depends on others.

trace_power_frequency was only implemented in a rather ungeneric way
in acpi-cpufreq driver's target() function only.
-> Move the call to trace_power_frequency to
cpufreq.c:cpufreq_notify_transition() where CPUFREQ_POSTCHANGE
notifier is triggered.
This will support power frequency tracing by all cpufreq drivers

trace_power_frequency did not trace frequency changes correctly when
the userspace governor was used or when CPU cores' frequency depend
on each other.
-> Moving this into the CPUFREQ_POSTCHANGE notifier and pass the cpu
which gets switched automatically fixes this.

Robert Schoene provided some important fixes on top of my initial
quick shot version which are integrated in this patch:
- Forgot some changes in power_end trace (TP_printk/variable names)
- Variable dummy in power_end must now be cpu_id
- Use static 64 bit variable instead of unsigned int for cpu_id

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: davej@redhat.com
CC: arjan@infradead.org
CC: linux-kernel@vger.kernel.org
CC: robert.schoene@tu-dresden.de
Tested-by: robert.schoene@tu-dresden.de
Signed-off-by: Dave Jones <davej@redhat.com>


# 226528c6 04-Mar-2010 Amerigo Wang <amwang@redhat.com>

[CPUFREQ] unexport (un)lock_policy_rwsem* functions

lock_policy_rwsem_* and unlock_policy_rwsem_* functions are scheduled
to be unexported when 2.6.33. Now there are no other callers of them
out of cpufreq.c, unexport them and make them static.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# cad70a6a 20-Jul-2010 Xiaotian Feng <dfeng@redhat.com>

[CPUFREQ] fix memory leak in cpufreq_add_dev

We didn't free policy->related_cpus in error path err_unlock_policy.
This is catched by following kmemleak report:

unreferenced object 0xffff88022a0b96d0 (size 512):
comm "modprobe", pid 886, jiffies 4294689177 (age 780.694s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8111ebe5>] create_object+0x186/0x281
[<ffffffff814fad4f>] kmemleak_alloc+0x60/0xa7
[<ffffffff8111127a>] kmem_cache_alloc_node_notrace+0x120/0x142
[<ffffffff81262e4f>] alloc_cpumask_var_node+0x2c/0xd7
[<ffffffff81262f0b>] alloc_cpumask_var+0x11/0x13
[<ffffffff81262f1c>] zalloc_cpumask_var+0xf/0x11
[<ffffffff8140fac0>] cpufreq_add_dev+0x11f/0x547
[<ffffffff81334bda>] sysdev_driver_register+0xc2/0x11d
[<ffffffff8140e334>] cpufreq_register_driver+0xcb/0x1b8
[<ffffffffa032e040>] 0xffffffffa032e040
[<ffffffff810021ba>] do_one_initcall+0x5e/0x15c
[<ffffffff81087f94>] sys_init_module+0xa6/0x1e6
[<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# ffe6275f 14-May-2010 Andrej Gelenberg <andrej.gelenberg@udo.edu>

[CPUFREQ] revert "[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)"

395913d0b1db37092ea3d9d69b832183b1dd84c5 ("[CPUFREQ] remove rwsem lock
from CPUFREQ_GOV_STOP call (second call site)") is not needed, because
there is no rwsem lock in cpufreq_ondemand and cpufreq_conservative
anymore. Lock should not be released until the work done.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=1594

Signed-off-by: Andrej Gelenberg <andrej.gelenberg@udo.edu>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 6f90388a 20-Jul-2010 Xiaotian Feng <dfeng@redhat.com>

[CPUFREQ] fix memory leak in cpufreq_add_dev

We didn't free policy->related_cpus in error path err_unlock_policy.
This is catched by following kmemleak report:

unreferenced object 0xffff88022a0b96d0 (size 512):
comm "modprobe", pid 886, jiffies 4294689177 (age 780.694s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8111ebe5>] create_object+0x186/0x281
[<ffffffff814fad4f>] kmemleak_alloc+0x60/0xa7
[<ffffffff8111127a>] kmem_cache_alloc_node_notrace+0x120/0x142
[<ffffffff81262e4f>] alloc_cpumask_var_node+0x2c/0xd7
[<ffffffff81262f0b>] alloc_cpumask_var+0x11/0x13
[<ffffffff81262f1c>] zalloc_cpumask_var+0xf/0x11
[<ffffffff8140fac0>] cpufreq_add_dev+0x11f/0x547
[<ffffffff81334bda>] sysdev_driver_register+0xc2/0x11d
[<ffffffff8140e334>] cpufreq_register_driver+0xcb/0x1b8
[<ffffffffa032e040>] 0xffffffffa032e040
[<ffffffff810021ba>] do_one_initcall+0x5e/0x15c
[<ffffffff81087f94>] sys_init_module+0xa6/0x1e6
[<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# accd8466 14-May-2010 Andrej Gelenberg <andrej.gelenberg@udo.edu>

[CPUFREQ] revert "[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)"

395913d0b1db37092ea3d9d69b832183b1dd84c5 ("[CPUFREQ] remove rwsem lock
from CPUFREQ_GOV_STOP call (second call site)") is not needed, because
there is no rwsem lock in cpufreq_ondemand and cpufreq_conservative
anymore. Lock should not be released until the work done.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=1594

Signed-off-by: Andrej Gelenberg <andrej.gelenberg@udo.edu>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 4c21adf2 20-Jul-2010 Thomas Renninger <trenn@suse.de>

x86 cpufreq, perf: Make trace_power_frequency cpufreq driver independent

and fix the broken case if a core's frequency depends on others.

trace_power_frequency was only implemented in a rather ungeneric
way in acpi-cpufreq driver's target() function only.

-> Move the call to trace_power_frequency to
cpufreq.c:cpufreq_notify_transition() where CPUFREQ_POSTCHANGE
notifier is triggered.
This will support power frequency tracing by all cpufreq
drivers.

trace_power_frequency did not trace frequency changes correctly
when the userspace governor was used or when CPU cores'
frequency depend on each other.

-> Moving this into the CPUFREQ_POSTCHANGE notifier and pass the cpu
which gets switched automatically fixes this.

Robert Schoene provided some important fixes on top of my
initial quick shot version which are integrated in this patch:
- Forgot some changes in power_end trace (TP_printk/variable names)
- Variable dummy in power_end must now be cpu_id
- Use static 64 bit variable instead of unsigned int for cpu_id

[akpm@linux-foundation.org: build fix]
Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: davej@codemonkey.org.uk
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Dave Jones <davej@codemonkey.org.uk>
Acked-by: Arjan van de Ven <arjan@infradead.org>
Cc: Robert Schoene <robert.schoene@tu-dresden.de>
Tested-by: Robert Schoene <robert.schoene@tu-dresden.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 6dad2a29 31-Mar-2010 Borislav Petkov <borislav.petkov@amd.com>

cpufreq: Unify sysfs attribute definition macros

Multiple modules used to define those which are with identical
functionality and were needlessly replicated among the different cpufreq
drivers. Push them into the header and remove duplication.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-7-git-send-email-bp@amd64.org>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 499bca9b 04-Mar-2010 Amerigo Wang <amwang@redhat.com>

[CPUFREQ] fix a lockdep warning

There is no need to do sysfs_remove_link() or kobject_put() etc.
when policy_rwsem_write is held, move them after releasing the lock.

This fixes the lockdep warning:

halt/4071 is trying to acquire lock:
(s_active){++++.+}, at: [<c0000000001ef868>] .sysfs_addrm_finish+0x58/0xc0

but task is already holding lock:
(&per_cpu(cpu_policy_rwsem, cpu)){+.+.+.}, at: [<c0000000004cd6ac>] .lock_policy_rwsem_write+0x84/0xf4

Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 52cf25d0 18-Jan-2010 Emese Revfy <re.emese@gmail.com>

Driver core: Constify struct sysfs_ops in struct kobj_type

Constify struct sysfs_ops.

This is part of the ops structure constification
effort started by Arjan van de Ven et al.

Benefits of this constification:

* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime

* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime

* potentially better optimized code as the compiler
can assume that the const data cannot be changed

* the compiler/linker move const data into .rodata
and therefore exclude them from false sharing

Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# e2f74f35 18-Nov-2009 Thomas Renninger <trenn@suse.de>

[ACPI/CPUFREQ] Introduce bios_limit per cpu cpufreq sysfs interface

This interface is mainly intended (and implemented) for ACPI _PPC BIOS
frequency limitations, but other cpufreq drivers can also use it for
similar use-cases.

Why is this needed:

Currently it's not obvious why cpufreq got limited.
People see cpufreq/scaling_max_freq reduced, but this could have
happened by:
- any userspace prog writing to scaling_max_freq
- thermal limitations
- hardware (_PPC in ACPI case) limitiations

Therefore export bios_limit (in kHz) to:
- Point the user that it's the BIOS (broken or intended) which limits
frequency
- Export it as a sysfs interface for userspace progs.
While this was a rarely used feature on laptops, there will appear
more and more server implemenations providing "Green IT" features like
allowing the service processor to limit the frequency. People want
to know about HW/BIOS frequency limitations.

All ACPI P-state driven cpufreq drivers are covered with this patch:
- powernow-k8
- powernow-k7
- acpi-cpufreq

Tested with a patched DSDT which limits the first two cores (_PPC returns 1)
via _PPC, exposed by bios_limit:
# echo 2200000 >cpu2/cpufreq/scaling_max_freq
# cat cpu*/cpufreq/scaling_max_freq
2600000
2600000
2200000
2200000
# #scaling_max_freq shows general user/thermal/BIOS limitations

# cat cpu*/cpufreq/bios_limit
2600000
2600000
2800000
2800000
# #bios_limit only shows the HW/BIOS limitation

CC: Pallipadi Venkatesh <venkatesh.pallipadi@intel.com>
CC: Len Brown <lenb@kernel.org>
CC: davej@codemonkey.org.uk
CC: linux@dominikbrodowski.net

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>


# cf3289d0 17-Nov-2009 Alex Chiang <achiang@hp.com>

[CPUFREQ] make internal cpufreq_add_dev_* static

No need to export these symbols; make them static.

cpufreq_add_dev_policy
cpufreq_add_dev_symlink
cpufreq_add_dev_interface

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 90e41bac 12-Nov-2009 Prarit Bhargava <prarit@redhat.com>

[CPUFREQ] Fix stale cpufreq_cpu_governor pointer

Dave,

Attached is an update of my patch against the cpufreq fixes branch.

Before applying the patch I compiled and booted the tree to see if the panic
was still there -- to my surprise it was not. This is because of the conversion
of cpufreq_cpu_governor to a char[].

While the panic is kaput, the problem of stale data continues and my patch is
still valid. It is possible to end up with the wrong governor after hotplug
events because CPUFREQ_DEFAULT_GOVERNOR is statically linked to a default,
while the cpu siblings may have had a different governor assigned by a user.

ie) the patch is still needed in order to keep the governors assigned
properly when hotplugging devices

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# e77b89f1 04-Oct-2009 Dmitry Monakhov <dmonakhov@openvz.org>

[CPUFREQ] Fix use after free on governor restore

Currently on governer backup/restore path we storing governor's pointer.
This is wrong because one may unload governor's module after cpu goes
offline. As result use-after-free will take place on restored cpu.
It is not easy to exploit this bug, but still we have to close this
issue ASAP. Issue was introduced by following commit
084f34939424161669467c19280dbcf637730314

##TESTCASE##
#!/bin/sh -x
modprobe acpi_cpufreq
# Any non default governor, in may case it is "ondemand"
modprobe cpufreq_ondemand
echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
rmmod acpi_cpufreq
rmmod cpufreq_ondemand
modprobe acpi_cpufreq # << use-after-free here.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# f1625066 29-Oct-2009 Tejun Heo <tj@kernel.org>

percpu: make percpu symbols in cpufreq unique

This patch updates percpu related symbols in cpufreq such that percpu
symbols are unique and don't clash with local symbols. This serves
two purposes of decreasing the possibility of global percpu symbol
collision and allowing dropping per_cpu__ prefix from percpu symbols.

* drivers/cpufreq/cpufreq.c: s/policy_cpu/cpufreq_policy_cpu/
* drivers/cpufreq/freq_table.c: s/show_table/cpufreq_show_table/
* arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c: s/drv_data/acfreq_data/
s/old_perf/acfreq_old_perf/

Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
which cause name clashes" patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>


# 395913d0 08-Jun-2009 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)

remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)

commit 42a06f2166f2f6f7bf04f32b4e823eacdceafdc9

Missed a call site for CPUFREQ_GOV_STOP to remove the rwlock taken around the
teardown. To make a long story short, the rwlock write-lock causes a circular
dependency with cancel_delayed_work_sync(), because the timer handler takes the
read lock.

Note that all callers to __cpufreq_set_policy are taking the rwsem. All sysfs
callers (writers) hold the write rwsem at the earliest sysfs calling stage.

However, the rwlock write-lock is not needed upon governor stop.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
CC: rjw@sisk.pl
CC: mingo@elte.hu
CC: Shaohua Li <shaohua.li@intel.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: Dave Young <hidave.darkstar@gmail.com>
CC: "Rafael J. Wysocki" <rjw@sisk.pl>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: trenn@suse.de
CC: sven.wegener@stealer.net
CC: cpufreq@vger.kernel.org
Signed-off-by: Dave Jones <davej@redhat.com>


# 8aa84ad8 24-Jul-2009 Thomas Renninger <trenn@suse.de>

[CPUFREQ] Introduce global, not per core: /sys/devices/system/cpu/cpufreq

Currently everything in the cpufreq layer is per core based.
This does not reflect reality, for example ondemand on conservative
governors have global sysfs variables.

Introduce a global cpufreq directory and add the kobject to the governor
struct, so that governors can easily access it.
The directory is initialized in the cpufreq_core_init initcall and thus will
always be created if cpufreq is compiled in, even if no cpufreq driver is
active later.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>


# 4bfa042c 24-Jul-2009 Thomas Renninger <trenn@suse.de>

[CPUFREQ] Bail out of cpufreq_add_dev if the link for a managed CPU got created

Doing:
echo 0 >cpu1/online
echo 1 >cpu1/online

on a managed CPU will result in:
Jul 22 15:15:37 linux kernel: [ 80.013864] WARNING: at fs/sysfs/dir.c:487 sysfs_add_one+0xcf/0xe6()
Jul 22 15:15:37 linux kernel: [ 80.013866] Hardware name: To Be Filled By O.E.M.
Jul 22 15:15:37 linux kernel: [ 80.013868] sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq'
Jul 22 15:15:37 linux kernel: [ 80.013870] Modules linked in: powernow_k8
Jul 22 15:15:37 linux kernel: [ 80.013874] Pid: 5750, comm: bash Not tainted 2.6.31-rc2 #40
Jul 22 15:15:37 linux kernel: [ 80.013876] Call Trace:
Jul 22 15:15:37 linux kernel: [ 80.013879] [<ffffffff8112ebda>] ? sysfs_add_one+0xcf/0xe6
Jul 22 15:15:37 linux kernel: [ 80.013884] [<ffffffff81041926>] warn_slowpath_common+0x77/0xa4
Jul 22 15:15:37 linux kernel: [ 80.013888] [<ffffffff810419a0>] warn_slowpath_fmt+0x3c/0x3e
Jul 22 15:15:37 linux kernel: [ 80.013891] [<ffffffff8112ebda>] sysfs_add_one+0xcf/0xe6
Jul 22 15:15:37 linux kernel: [ 80.013894] [<ffffffff8112f213>] create_dir+0x58/0x87
Jul 22 15:15:37 linux kernel: [ 80.013898] [<ffffffff8112f27a>] sysfs_create_dir+0x38/0x4f
Jul 22 15:15:37 linux kernel: [ 80.013902] [<ffffffff811ffb8a>] kobject_add_internal+0x11f/0x1de
Jul 22 15:15:37 linux kernel: [ 80.013905] [<ffffffff811ffd21>] kobject_add_varg+0x41/0x4e
Jul 22 15:15:37 linux kernel: [ 80.013908] [<ffffffff811ffd7a>] kobject_init_and_add+0x4c/0x57
Jul 22 15:15:37 linux kernel: [ 80.013913] [<ffffffff810667bc>] ? mark_lock+0x22/0x228
Jul 22 15:15:37 linux kernel: [ 80.013918] [<ffffffff813e8a3b>] cpufreq_add_dev_interface+0x40/0x1e4
...

This bug slipped in by git commit:
150b06f7f223cfd0f808737a5243cceca8ea47fa

When splitting up cpufreq_add_dev, the whole cpufreq_add_dev function
is not left anymore, only cpufreq_add_dev_policy.
This patch should reconstruct the identical functionality again as it
was before the split.

CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>


# ecf7e461 08-Jul-2009 Dave Jones <davej@redhat.com>

[CPUFREQ] Factor out policy setting from cpufreq_add_dev

Signed-off-by: Dave Jones <davej@redhat.com>


# 909a694e 08-Jul-2009 Dave Jones <davej@redhat.com>

[CPUFREQ] Factor out interface creation from cpufreq_add_dev

Signed-off-by: Dave Jones <davej@redhat.com>


# 19d6f7ec 08-Jul-2009 Dave Jones <davej@redhat.com>

[CPUFREQ] Factor out symlink creation from cpufreq_add_dev

Signed-off-by: Dave Jones <davej@redhat.com>


# 059019a3 08-Jul-2009 Dave Jones <davej@redhat.com>

[CPUFREQ] cleanup up -ENOMEM handling in cpufreq_add_dev

Signed-off-by: Dave Jones <davej@redhat.com>


# 54e6fe16 08-Jul-2009 Dave Jones <davej@redhat.com>

[CPUFREQ] Reduce scope of cpu_sys_dev in cpufreq_add_dev

Signed-off-by: Dave Jones <davej@redhat.com>


# ce6c3997 07-Aug-2009 Dominik Brodowski <linux@dominikbrodowski.net>

[CPUFREQ] Re-enable cpufreq suspend and resume code

Commit 4bc5d3413503 is broken and causes regressions:

(1) cpufreq_driver->resume() and ->suspend() were only called on
__powerpc__, but you could set them on all architectures. In fact,
->resume() was defined and used before the PPC-related commit
42d4dc3f4e1e complained about in 4bc5d3413503.

(2) Therfore, the resume functions in acpi_cpufreq and speedstep-smi
would never be called.

(3) This means speedstep-smi would be unusuable after suspend or resume.

The _real_ problem was calling cpufreq_driver->get() with interrupts
off, but it re-enabling interrupts on some platforms. Why is ->get()
necessary?

Some systems like to change the CPU frequency behind our
back, especially during BIOS-intensive operations like suspend or
resume. If such systems also use a CPU frequency-dependant timing loop,
delays might be off by large factors. Therefore, we need to ascertain
as soon as possible that the CPU frequency is indeed at the speed we
think it is. You can do this two ways: either setting it anew, or trying
to get it. The latter is what was done, the former also has the same IRQ
issue.

So, let's try something different: defer the checking to after interrupts
are re-enabled, by calling cpufreq_update_policy() (via schedule_work()).
Timings may be off until this later stage, so let's watch out for
resume regressions caused by the deferred handling of frequency changes
behind the kernel's back.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>


# 4bc5d341 04-Aug-2009 Dave Jones <davej@redhat.com>

[CPUFREQ] Make cpufreq suspend code conditional on powerpc.

The suspend code runs with interrupts disabled, and the powerpc workaround we
do in the cpufreq suspend hook calls the drivers ->get method.

powernow-k8's ->get does an smp_call_function_single
which needs interrupts enabled

cpufreq's suspend/resume code was added in 42d4dc3f4e1e to work around
a hardware problem on ppc powerbooks. If we make all this code
conditional on powerpc, we avoid the issue above.

Signed-off-by: Dave Jones <davej@redhat.com>


# d5194dec 29-Jul-2009 Thomas Renninger <trenn@suse.de>

[CPUFREQ] Fix a kobject reference bug related to managed CPUs

The first offline/online cycle is successful, the second not.
Doing:
echo 0 >cpu1/online
echo 1 >cpu1/online
echo 0 >cpu1/online

The last command will trigger:
Jul 22 14:39:50 linux kernel: [ 593.210125] ------------[ cut here ]------------
Jul 22 14:39:50 linux kernel: [ 593.210139] WARNING: at lib/kref.c:43 kref_get+0x23/0x2b()
Jul 22 14:39:50 linux kernel: [ 593.210144] Hardware name: To Be Filled By O.E.M.
Jul 22 14:39:50 linux kernel: [ 593.210148] Modules linked in: powernow_k8
Jul 22 14:39:50 linux kernel: [ 593.210158] Pid: 378, comm: kondemand/2 Tainted: G W 2.6.31-rc2 #38
Jul 22 14:39:50 linux kernel: [ 593.210163] Call Trace:
Jul 22 14:39:50 linux kernel: [ 593.210171] [<ffffffff812008e8>] ? kref_get+0x23/0x2b
Jul 22 14:39:50 linux kernel: [ 593.210181] [<ffffffff81041926>] warn_slowpath_common+0x77/0xa4
Jul 22 14:39:50 linux kernel: [ 593.210190] [<ffffffff81041962>] warn_slowpath_null+0xf/0x11
Jul 22 14:39:50 linux kernel: [ 593.210198] [<ffffffff812008e8>] kref_get+0x23/0x2b
Jul 22 14:39:50 linux kernel: [ 593.210206] [<ffffffff811ffa19>] kobject_get+0x1a/0x22
Jul 22 14:39:50 linux kernel: [ 593.210214] [<ffffffff813e815d>] cpufreq_cpu_get+0x8a/0xcb
Jul 22 14:39:50 linux kernel: [ 593.210222] [<ffffffff813e87d1>] __cpufreq_driver_getavg+0x1d/0x67
Jul 22 14:39:50 linux kernel: [ 593.210231] [<ffffffff813ea18f>] do_dbs_timer+0x158/0x27f
Jul 22 14:39:50 linux kernel: [ 593.210240] [<ffffffff810529ea>] worker_thread+0x200/0x313
...

The output continues on every do_dbs_timer ondemand freq checking poll.
This regression was introduced by git commit:
3f4a782b5ce2698b1870b5a7b573cd721d4fce33

The policy is released when the cpufreq device is removed in:
__cpufreq_remove_dev():
/* if this isn't the CPU which is the parent of the kobj, we
* only need to unlink, put and exit
*/

Not creating the symlink is not sever at all.
As long as:
sysfs_remove_link(&sys_dev->kobj, "cpufreq");
handles it gracefully that the symlink did not exist.
Possibly no error should be returned at all, because ondemand
governor would still provide the same functionality.
Userspace in userspace gov case might be confused if the link
is missing.

Resolves http://bugzilla.kernel.org/show_bug.cgi?id=13903

CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>


# 42c74b84 03-Aug-2009 Prarit Bhargava <prarit@redhat.com>

[CPUFREQ] Do not set policy for offline cpus

Suspend/Resume fails on multi socket, multi core systems because the cpufreq
code erroneously sets the per_cpu policy_cpu value when a logical cpu is
offline.

This most notably results in missing sysfs files that are used to set the
cpu frequencies of the various cpus.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 5e1596f7 08-Jul-2009 Dave Jones <davej@redhat.com>

[CPUFREQ] Fix compile failure in cpufreq.c

managed_policy is out of scope for the non-smp case.
Declare it locally where used (twice)

Signed-off-by: Dave Jones <davej@redhat.com>


# 3f4a782b 03-Jul-2009 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

[CPUFREQ] fix (utter) cpufreq_add_dev mess

OK, I've tried to clean it up the best I could, but please test this with
concurrent cpu hotplug and cpufreq add/remove in loops. I'm sure we will make
other interesting findings.

This is step one of fixing the overall locking dependency mess in cpufreq.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
CC: rjw@sisk.pl
CC: mingo@elte.hu
CC: Shaohua Li <shaohua.li@intel.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: Dave Young <hidave.darkstar@gmail.com>
CC: "Rafael J. Wysocki" <rjw@sisk.pl>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: sven.wegener@stealer.net
CC: cpufreq@vger.kernel.org
CC: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>


# 7d26e2d5 02-Jul-2009 venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>

[CPUFREQ] Eliminate the recent lockdep warnings in cpufreq

Commit b14893a62c73af0eca414cfed505b8c09efc613c although it was very
much needed to properly cleanup ondemand timer, opened-up a can of worms
related to locking dependencies in cpufreq.

Patch here defines the need for dbs_mutex and cleans up its usage in
ondemand governor. This also resolves the lockdep warnings reported here

http://lkml.indiana.edu/hypermail/linux/kernel/0906.1/01925.html
http://lkml.indiana.edu/hypermail/linux/kernel/0907.0/00820.html

and few others..

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# eaa95840 06-Jun-2009 Yinghai Lu <yinghai@kernel.org>

cpumask: alloc zeroed cpumask for static cpumask_var_ts

These are defined as static cpumask_var_t so if MAXSMP is not used,
they are cleared already. Avoid surprises when MAXSMP is enabled.

Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 42a06f21 17-May-2009 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call

* Rafael J. Wysocki (rjw@sisk.pl) wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.28 and 2.6.29.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.28 and 2.6.29. Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13186
> Subject : cpufreq timer teardown problem
> Submitter : Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Date : 2009-04-23 14:00 (24 days old)
> References : http://marc.info/?l=linux-kernel&m=124049523515036&w=4
> Handled-By : Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Patch : http://patchwork.kernel.org/patch/19754/
> http://patchwork.kernel.org/patch/19753/

The patches linked above depend on the following patch to remove
circular locking dependency :

cpufreq: remove rwsem lock from CPUFREQ_GOV_STOP call

(the following issue was faced when using cancel_delayed_work_sync() in the
timer teardown (which fixes a race).

* KOSAKI Motohiro (kosaki.motohiro@jp.fujitsu.com) wrote:
> Hi
>
> my box output following warnings.
> it seems regression by commit 7ccc7608b836e58fbacf65ee4f8eefa288e86fac.
>
> A: work -> do_dbs_timer() -> cpu_policy_rwsem
> B: store() -> cpu_policy_rwsem -> cpufreq_governor_dbs() -> work
>
>

Hrm, I think it must be due to my attempt to fix the timer teardown race
in ondemand governor mixed with new locking behavior in 2.6.30-rc.

The rwlock seems to be taken around the whole call to
cpufreq_governor_dbs(), when it should be only taken around accesses to
the locked data, and especially *not* around the call to
dbs_timer_exit().

Reverting my fix attempt would put the teardown race back in place
(replacing the cancel_delayed_work_sync by cancel_delayed_work).
Instead, a proper fix would imply modifying this critical section :

cpufreq.c: __cpufreq_remove_dev()
...
if (cpufreq_driver->target)
__cpufreq_governor(data, CPUFREQ_GOV_STOP);

unlock_policy_rwsem_write(cpu);

To make sure the __cpufreq_governor() callback is not called with rwsem
held. This would allow execution of cancel_delayed_work_sync() without
being nested within the rwsem.

Applies on top of the 2.6.30-rc5 tree.

Required to remove circular dep in teardown of both conservative and
ondemande governors so they can use cancel_delayed_work_sync().
CPUFREQ_GOV_STOP does not modify the policy, therefore this locking seemed
unneeded.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Greg KH <greg@kroah.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: "Rafael J. Wysocki" <rjw@sisk.pl>
CC: Ben Slusky <sluskyb@paranoiacs.org>
CC: Chris Wright <chrisw@sous-sol.org>
CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 129f8ae9 09-Mar-2009 Dave Jones <davej@redhat.com>

Revert "[CPUFREQ] Disable sysfs ui for p4-clockmod."

This reverts commit e088e4c9cdb618675874becb91b2fd581ee707e6.

Removing the sysfs interface for p4-clockmod was flagged as a
regression in bug 12826.

Course of action:
- Find out the remaining causes of overheating, and fix them
if possible. ACPI should be doing the right thing automatically.
If it isn't, we need to fix that.
- mark p4-clockmod ui as deprecated
- try again with the removal in six months.

It's not really feasible to printk about the deprecation, because
it needs to happen at all the sysfs entry points, which means adding
a lot of strcmp("p4-clockmod".. calls to the core, which.. bleuch.

Signed-off-by: Dave Jones <davej@redhat.com>


# ed129784 03-Feb-2009 Thomas Renninger <trenn@suse.de>

[CPUFREQ] Introduce /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_transition_latency

It's not only useful for the ondemand and conservative governors, but
also for userspace daemons to know about the HW transition latency of
the CPU.
It is especially useful for userspace to know about this value when
the ondemand or conservative governors are run. The sampling rate
control value depends on it and for userspace being able to set sane
tuning values there it has to know about the transition latency.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>


# 29464f28 17-Jan-2009 Dave Jones <davej@redhat.com>

[CPUFREQ] checkpatch cleanups for cpufreq core

Signed-off-by: Dave Jones <davej@redhat.com>


# 835481d9 04-Jan-2009 Rusty Russell <rusty@rustcorp.com.au>

cpumask: convert struct cpufreq_policy to cpumask_var_t

Impact: use new cpumask API to reduce memory usage

This is part of an effort to reduce structure sizes for machines
configured with large NR_CPUS. cpumask_t gets replaced by
cpumask_var_t, which is either struct cpumask[1] (small NR_CPUS) or
struct cpumask * (large NR_CPUS).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 187d9f4e 04-Dec-2008 Mike Chan <mike@android.com>

[CPUFREQ] Fix on resume, now preserves user policy min/max.

Previously driver resume would always set the current policy min/max with
the cpuinfo min/max, defined by user_policy.min/max. Resulting in a reset
of policy settings when policy.min/max != cpuinfo.min/max when coming out
of suspend. Now user_policy is saved as the policy instead of cpuinfo to
preserve what the user actually set.

Signed-off-by: Mike Chan <mike@android.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# e088e4c9 25-Nov-2008 Matthew Garrett <mjg@redhat.com>

[CPUFREQ] Disable sysfs ui for p4-clockmod.

p4-clockmod has a long history of abuse. It pretends to be a CPU
frequency scaling driver, even though it doesn't actually change
the CPU frequency, but instead just modulates the frequency with
wait-states.
The biggest misconception is that when running at the lower 'frequency'
p4-clockmod is saving power. This isn't the case, as workloads running
slower take longer to complete, preventing the CPU from entering deep C states.

However p4-clockmod does have a purpose. It can prevent overheating.
Having it hooked up to the cpufreq interfaces is the wrong way to achieve
cooling however. It should instead be hooked up to ACPI.

This diff introduces a means for a cpufreq driver to register with the
cpufreq core, but not present a sysfs interface.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# bf0b90e3 04-Aug-2008 venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>

[CPUFREQ][1/6] cpufreq: Add cpu number parameter to __cpufreq_driver_getavg()

Add a cpu parameter to __cpufreq_driver_getavg(). This is needed for software
cpufreq coordination where policy->cpu may not be same as the CPU on which we
want to getavg frequency.

A follow-on patch will use this parameter to getavg freq from all cpus
in policy->cpus.

Change since last patch. Fix the offline/online and suspend/resume
oops reported by Youquan Song <youquan.song@intel.com>

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# f1829e4a 25-Jul-2008 Julia Lawall <julia@diku.dk>

[CPUFREQ] drivers/cpufreq/cpufreq.c: Adjust error handling code involving cpufreq_cpu_put

After calling cpufreq_cpu_get, error handling code should call
cpufreq_cpu_put.

The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@r@
expression x,E;
statement S;
position p1,p2,p3;
@@

(
if ((x = cpufreq_cpu_get@p1(...)) == NULL || ...) S
|
x = cpufreq_cpu_get@p1(...)
... when != x
if (x == NULL || ...) S
)
<...
if@p3 (...) { ... when != cpufreq_cpu_put(x)
when != if (x) { ... cpufreq_cpu_put(x); ...}
return@p2 ...;
}
...>
(
return x;
|
return 0;
|
x = E
|
E = x
|
cpufreq_cpu_put(x)
)

@exists@
position r.p1,r.p2,r.p3;
expression x;
int ret != 0;
statement S;
@@

* x = cpufreq_cpu_get@p1(...)
<...
* if@p3 (...)
S
...>
* return@p2 \(NULL\|ret\);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Dave Jones <davej@redhat.com>


# a1531acd 29-Jul-2008 Thomas Renninger <trenn@suse.de>

cpufreq acpi: only call _PPC after cpufreq ACPI init funcs got called already

Ingo Molnar provided a fix to not call _PPC at processor driver
initialization time in "[PATCH] ACPI: fix cpufreq regression" (git
commit e4233dec749a3519069d9390561b5636a75c7579)

But it can still happen that _PPC is called at processor driver
initialization time.

This patch should make sure that this is not possible anymore.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 326f6a5c 06-Jun-2008 Chris Wright <chrisw@sous-sol.org>

[CPUFREQ] Fix format string bug.

Format string bug. Not exploitable, as this is only writable by root,
but worth fixing all the same.

Spotted-by: Ilja van Sprundel <ilja@netric.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 879000f9 05-Jun-2008 CHIKAMA masaki <masaki.chikama@gmail.com>

cpufreq: fix null object access on Transmeta CPU

If cpu specific cpufreq driver(i.e. longrun) has "setpolicy" function,
governor object isn't set into cpufreq_policy object at "__cpufreq_set_policy"
function in driver/cpufreq/cpufreq.c .

This causes a null object access at "store_scaling_setspeed" and
"show_scaling_setspeed" function in driver/cpufreq/cpufreq.c when reading or
writing through /sys interface (ex. cat
/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed)

Addresses:
http://bugzilla.kernel.org/show_bug.cgi?id=10654
https://bugzilla.redhat.com/show_bug.cgi?id=443354

Signed-off-by: CHIKAMA Masaki <masaki.chikama@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Chuck Ebbert <cebbert@redhat.com>
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# dca02613 29-May-2008 Lothar Waßmann <LW@KARO-electronics.de>

[CPUFREQ] fix double unlock of cpu_policy_rwsem in drivers/cpufreq/cpufreq.c

In drivers/cpufreq/cpufreq.c the function cpufreq_add_dev() takes the
error exit 'err_out_unregister' from different places once with the
'cpu_policy_rwsem' lock held, once with the lock released:
| if (ret)
| goto err_out_unregister;
| }
|
| policy->governor = NULL; /* to assure that the starting sequence is
| * run in cpufreq_set_policy */
|
| /* set default policy */
| ret = __cpufreq_set_policy(policy, &new_policy);
| policy->user_policy.policy = policy->policy;
| policy->user_policy.governor = policy->governor;
|
| unlock_policy_rwsem_write(cpu);
|
| if (ret) {
| dprintk("setting policy failed\n");
| goto err_out_unregister;
| }

This leads to the following error message in case of a failing
__cpufreq_set_policy() call:
=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
swapper/1 is trying to release lock (&per_cpu(cpu_policy_rwsem, cpu)) at:
[<c01b4564>] unlock_policy_rwsem_write+0x30/0x40
but there are no more locks to release!

other info that might help us debug this:
1 lock held by swapper/1:
#0: (sysdev_drivers_lock){--..}, at: [<c018fd18>] sysdev_driver_register+0x74/0x130

stack backtrace:
[<c002f588>] (dump_stack+0x0/0x14) from [<c00692fc>] (print_unlock_inbalance_bug+0xc8/0x104)
[<c0069234>] (print_unlock_inbalance_bug+0x0/0x104) from [<c006b7ac>] (lock_release_non_nested+0xc4/0x19c)
r6:00000028 r5:c3c1ab80 r4:c01b4564
[<c006b6e8>] (lock_release_non_nested+0x0/0x19c) from [<c006b9e0>] (lock_release+0x15c/0x18c)
r8:60000013 r7:00000001 r6:c01b4564 r5:c0541bb4 r4:c3c1ab80
[<c006b884>] (lock_release+0x0/0x18c) from [<c0061ba0>] (up_write+0x24/0x30)
r8:c0541b80 r7:00000000 r6:ffffffea r5:c3c34828 r4:c0541b8c
[<c0061b7c>] (up_write+0x0/0x30) from [<c01b4564>] (unlock_policy_rwsem_write+0x30/0x40)
r4:c3c34884
[<c01b4534>] (unlock_policy_rwsem_write+0x0/0x40) from [<c01b4c40>] (cpufreq_add_dev+0x324/0x398)
[<c01b491c>] (cpufreq_add_dev+0x0/0x398) from [<c018fd64>] (sysdev_driver_register+0xc0/0x130)
[<c018fca4>] (sysdev_driver_register+0x0/0x130) from [<c01b3574>] (cpufreq_register_driver+0xbc/0x174)

Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: Dave Jones <davej@redhat.com>


# 068b1277 12-May-2008 Mike Travis <travis@sgi.com>

cpufreq: use performance variant for_each_cpu_mask_nr

Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate

Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 7a6aedfa 25-Mar-2008 Mike Travis <travis@sgi.com>

[CPUFREQ] change cpu freq arrays to per_cpu variables

Change cpufreq_policy and cpufreq_governor pointer tables
from arrays to per_cpu variables in the cpufreq subsystem.

Also some minor complaints from checkpatch.pl fixed.

Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# e8628dd0 18-Apr-2008 Darrick J. Wong <djwong@us.ibm.com>

[CPUFREQ] expose cpufreq coordination requirements regardless of coordination mechanism

Currently, affected_cpus shows which CPUs need to have their frequency
coordinated in software. When hardware coordination is in use, the contents
of this file appear the same as when no coordination is required. This can
lead to some confusion among user-space programs, for example, that do not
know that extra coordination is required to force a CPU core to a particular
speed to control power consumption.

To fix this, create a "related_cpus" attribute that always displays the
coordination map regardless of whatever coordination strategy the cpufreq
driver uses (sw or hw). If the cpufreq driver does not provide a value, fall
back to policy->cpus.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 74212ca4 16-Feb-2008 Cesar Eduardo Barros <cesarb@cesarb.net>

[CPUFREQ] Warn when cpufreq_register_notifier called before pure initcalls

If cpufreq_register_notifier is called before pure initcalls,
init_cpufreq_transition_notifier_list will overwrite whatever it did,
causing notifiers to be ignored.

Print some noise to the kernel log if that happens.

Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>


# 45709118 05-Mar-2008 Dave Jones <davej@redhat.com>

[CPUFREQ] Refactor locking in cpufreq_add_dev

Simplify this by moving the unlocking out of the error
paths into the exit path.

Signed-off-by: Dave Jones <davej@codemonkey.org.uk>


# 905d77cd 05-Mar-2008 Dave Jones <davej@redhat.com>

[CPUFREQ] more CodingStyle

void * p -> void *p
no space between function parameters
removed excess whitespace

Signed-off-by: Dave Jones <davej@codemonkey.org.uk>


# 4d34a67d 07-Feb-2008 Dave Jones <davej@redhat.com>

[CPUFREQ] CodingStyle

return is not a function.

Signed-off-by: Dave Jones <davej@redhat.com>


# c9060494 07-Feb-2008 Dave Jones <davej@redhat.com>

[CPUFREQ] Slightly shorten the error paths of cpufreq_suspend/cpufreq_resume

Signed-off-by: Dave Jones <davej@redhat.com>


# f6ebef30 17-Feb-2008 Sam Ravnborg <sam@ravnborg.org>

[CPUFREQ] fix section mismatch warnings

Fix the following warnings:
WARNING: vmlinux.o(.text+0xfe6711): Section mismatch in reference from the function cpufreq_unregister_driver() to the variable .cpuinit.data:cpufreq_cpu_notifier
WARNING: vmlinux.o(.text+0xfe68af): Section mismatch in reference from the function cpufreq_register_driver() to the variable .cpuinit.data:cpufreq_cpu_notifier
WARNING: vmlinux.o(.exit.text+0xc4fa): Section mismatch in reference from the function cpufreq_stats_exit() to the variable .cpuinit.data:cpufreq_stat_cpu_notifier

The warnings were casued by references to unregister_hotcpu_notifier()
from normal functions or exit functions.
This is flagged by modpost as a potential error because
it does not know that for the non HOTPLUG_CPU
scenario the unregister_hotcpu_notifier() is a nop.
Silence the warning by replacing the __initdata
annotation with a __refdata annotation.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>


# a07530b4 05-Mar-2008 Dave Jones <davej@redhat.com>

[CPUFREQ] Fix missing cpufreq_cpu_put() call in ->store

refactor to use gotos instead of explicit exit paths

Signed-off-by: Dave Jones <davej@redhat.com>


# 0db4a8a9 05-Mar-2008 Dave Jones <davej@redhat.com>

[CPUFREQ] Fix missing cpufreq_cpu_put() call in ->show

refactor to use gotos instead of explicit exit paths

Signed-off-by: Dave Jones <davej@redhat.com>


# 7ab47050 08-Feb-2008 Balaji Rao <balajirrao@gmail.com>

cpufreq: fix kobject reference count handling

The cpufreq core should not take an extra kobject reference count for no
reason, and then refuse to release it. This has been reported as
keeping machines from properly powering down all the way.


Signed-off-by: Balaji Rao <balajirrao@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Yi Yang <yi.y.yang@intel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Frans Pop <elendil@planet.nl>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 9e76988e 26-Oct-2007 Venki Pallipadi <venkatesh.pallipadi@intel.com>

[CPUFREQ] Eliminate cpufreq_userspace scaling_setspeed deadlock

Eliminate cpufreq_userspace scaling_setspeed deadlock.

Luming Yu recently uncovered yet another cpufreq related deadlock.
One thread that continuously switches the governors and the other thread that
repeatedly cats the contents of cpufreq directory causes both these threads to
go into a deadlock.

Detailed examination of the deadlock showed the exact flow before the deadlock
as:

Thread 1 Thread 2
________ ________
cats files under /sys/devices/.../cpufreq/
Set governor to userspace
Adds a new sysfs entry for
scaling_setspeed
cats files under /sys/devices/.../cpufreq/

Set governor to performance
Holds cpufreq_rw_sem in write
mode
Sends a STOP notify to
userspace governor
cat /sys/devices/.../cpufreq/scaling_setspeed
Gets a handle on the above sysfs entry with
sysfs_get_active
Blocks while trying to get cpufreq_rw_sem
in read mode
Remove a sysfs entry for
scaling_setspeed
Blocks on sysfs_deactivate
while waiting for earlier
get_active (on other thread)
to drain

At this point both threads go into deadlock and any other thread that tries to
do anything with sysfs cpufreq will also block.

There seems to be no easy way to avoid this deadlock as long as
cpufreq_userspace adds/removes the sysfs entry under same kobject as cpufreq.
Below patch moves scaling_setspeed to cpufreq.c, keeping it always and calling
back the governor on read/write. This is the cleanest fix I could think of,
even though adding two callbacks in governor structure just for this seems
unnecessary.

Note that the change makes scaling_setspeed under /sys/.../cpufreq permanent
and returns <unsupported> when governor is not userspace.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# a4a9df58 19-Nov-2007 Joe Perches <joe@perches.com>

[CPUFREQ] drivers/cpufreq: Add missing "space"

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 53391fa2 30-Jan-2008 Yi Yang <yi.y.yang@intel.com>

cpufreq: fix obvious condition statement error

The function __cpufreq_set_policy in file drivers/cpufreq/cpufreq.c
has a very obvious error:

if (policy->min > data->min && policy->min > policy->max) {
ret = -EINVAL;
goto error_out;
}

This condtion statement is wrong because it returns -EINVAL only if
policy->min is greater than policy->max (in this case,
"policy->min > data->min" is true for ever.). In fact, it should
return -EINVAL as well if policy->max is less than data->min.

The correct condition should be:

if (policy->min > data->max || policy->max < data->min) {

The following test result testifies the above conclusion:

Before applying this patch:

[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
2394000 1596000
[root@yangyi-dev /]# echo 1596000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
1596000
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
1596000
[root@yangyi-dev /]# echo "2000000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
1596000
[root@yangyi-dev /]# echo "0" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
1596000
[root@yangyi-dev /]# echo "1595000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
1596000
[root@yangyi-dev /]#

After applying this patch:

[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
2394000 1596000
[root@yangyi-dev /]# echo 1596000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
1596000
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
1596000
[root@localhost /]# echo "2000000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
-bash: echo: write error: Invalid argument
[root@localhost /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
1596000
[root@localhost /]# echo "0" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
-bash: echo: write error: Invalid argument
[root@localhost /]# echo "1595000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
-bash: echo: write error: Invalid argument
[root@localhost /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
1596000
[root@localhost /]# echo "1596000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
[root@localhost /]# echo "2394000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
[root@localhost /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
2394000
[root@localhost /]

Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# c10997f6 20-Dec-2007 Greg Kroah-Hartman <gregkh@suse.de>

Kobject: convert drivers/* from kobject_unregister() to kobject_put()

There is no need for kobject_unregister() anymore, thanks to Kay's
kobject cleanup changes, so replace all instances of it with
kobject_put().


Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 038c5b3e 17-Dec-2007 Greg Kroah-Hartman <gregkh@suse.de>

Kobject: change drivers/cpufreq/cpufreq.c to use kobject_init_and_add

Stop using kobject_register, as this way we can control the sending of
the uevent properly, after everything is properly initialized.

Cc: Dominik Brodowski <linux@brodo.de>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# bd6cba53 17-Dec-2007 Dave Jones <davej@redhat.com>

cpufreq: fix missing unlocks in cpufreq_add_dev error paths.

Ingo hit some BUG_ONs that were probably caused by these missing unlocks
causing an unbalance. He couldn't reproduce the bug reliably, so it's
unknown that it's definitly fixing the problem he hit, but it's a fairly
good chance, and this fixes an obvious bug.

[ Dave: "Ingo followed up that he hit some lockdep related output with
this applied, so it may not be right. I'll look at it after
xmas if no-one has it figured out before then."
Akpm: "It looks pretty correct to me though." ]

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 19c38de8 12-Sep-2007 Greg Kroah-Hartman <gregkh@suse.de>

kobjects: fix up improper use of the kobject name field

A number of different drivers incorrect access the kobject name field
directly. This is not correct as the name might not be in the array.
Use the proper accessor function instead.


# 9eb59573 09-Oct-2007 Andi Kleen <ak@linux.intel.com>

[CPUFREQ] Don't take semaphore in cpufreq_quick_get()

I don't see any reason to take an expensive lock in cpufreq_quick_get()
Reading policy->cur is a single atomic operation and after
the lock is dropped again the state could change any time anyways.

So don't take the lock in the first place.

This also makes this function interrupt safe which is useful
for some code of mine.

Signed-off-by: Andi Kleen <ak@suse.de>
Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# dd184a01 02-Oct-2007 Satyam Sharma <satyam@infradead.org>

[CPUFREQ] mark hotplug notifier callback as __cpuinit

The notifier_block is already __cpuinitdata, thereby allowing us to safely
mark the callback function as __cpuinit also, thereby saving space when
HOTPLUG_CPU=n.

Signed-off-by: Satyam Sharma <satyam@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 6afde10c 02-Oct-2007 Thomas Renninger <trenn@suse.de>

[CPUFREQ] Only check for transition latency on problematic governors (kconfig fix)

Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 1c256245 02-Oct-2007 Thomas Renninger <trenn@suse.de>

[CPUFREQ] allow ondemand and conservative cpufreq governors to be used as default

Depending on the transition latency of the HW for cpufreq switches, the
ondemand or conservative governor cannot be used with certain cpufreq
drivers. Still the ondemand should be the default governor on a wide range
of systems. This patch allows this and lets the governor fallback to the
performance governor at cpufreq driver load time, if the driver does not
support fast enough frequency switching.

Main benefit is that on e.g. installation or other systems without
userspace support a working dynamic cpufreq support can be achieved on most
systems by simply loading the cpufreq driver. This is especially essential
for recent x86(_64) laptop hardware which may rely on working dynamic
cpufreq OS support.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 8122c6ce 02-Oct-2007 Thomas Renninger <trenn@suse.de>

[CPUFREQ] move policy's governor initialisation out of low-level drivers into cpufreq core

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 084f3493 09-Jul-2007 Thomas Renninger <trenn@suse.de>

[CPUFREQ] Restore previously used governor on a hot-replugged CPU

Negative side effect: needs NR_CPUs pointer array of memory in
CONFIG_HOTPLUG_CPU case.

Still needs userspace track keeping and rewriting of governors if governors
change while a CPU is not active (always the governor at CPU remove time is
restored).

Move of policy->user_policy.governor assignment is just a minor cleanup.
http://bugzilla.kernel.org/show_bug.cgi?id=8671

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Mattia Dongili <malattia@linux.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 91973de7 09-Jul-2007 Peter Oruba <peter.oruba@amd.com>

[CPUFREQ] bugfix cpufreq in combination with performance governor

There is a frequency scaling issue that I encountered with the performance
governor in combination with CPU hotplug.

In cpufreq.c CPU frequency is reduced to its minimum before the CPU gets
unregistered and set offline. Does that have a particular reason?

Since the (k8-)governor does not monitor CPU frequency that setting also
applies then to the remaining CPU as well and lets the system run on the
lowest frequency although performance is chose as the policy.

Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 58a7295b 13-Jun-2007 Tobias Klauser <tklauser@distanz.ch>

[CPUFREQ] Fix sysfs_create_file return value handling

Commit 0a4b2ccc555fa2ca6873d60219047104e4805d45 in cpufreq.git
eliminates the build warnings but does not pass on the error code of
sysfs_create_file to the function calling cpufreq_add_dev. Instead some
previous value of ret would be returned.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Dave Jones <davej@redhat.com>


# 0a4b2ccc 21-May-2007 Thomas Renninger <trenn@suse.de>

[CPUFREQ] check return value of sysfs_create_file

Eliminate build warning (sysfs_create_file return value must be checked)

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>


# 8bb78442 09-May-2007 Rafael J. Wysocki <rjw@rjwysocki.net>

Add suspend-related notifications for CPU hotplug

Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress. This
patch introduces such notifications and causes them to be used during
suspend and resume transitions. It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).

[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 632786ce 19-Apr-2007 Thomas Renninger <trenn@suse.de>

[CPUFREQ] Remove deprecated /proc/acpi/processor/performance write support

Remove deprecated /proc/acpi/processor/performance write support

Writing to /proc/acpi/processor/xy/performance interferes with sysfs
cpufreq interface. Also removes buggy cpufreq_set_policy exported symbol.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>


# 22c970f3 19-Apr-2007 Thomas Renninger <trenn@suse.de>

[CPUFREQ] Fix limited cpufreq when booted on battery

References:
https://bugzilla.novell.com/show_bug.cgi?id=231107
https://bugzilla.novell.com/show_bug.cgi?id=264077

Fix limited cpufreq when booted on battery

If booted on battery:
cpufreq_set_policy (evil) is invoked which calls verify_within_limits.
max_freq gets lowered and therefore users_policy.max, which
is used to restore higher freqs via update_policy later is set to the
already limited frequency -> you can never go up again, even BIOS
allows higher freqs later.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>


# ec28297a 26-Mar-2007 Venki Pallipadi <venkatesh.pallipadi@intel.com>

[PATCH] Fix maxcpus=1 trigerring BUG() in cpufreq

Ingo reported it on lkml in the thread
"2.6.21-rc5: maxcpus=1 crash in cpufreq: kernel BUG at drivers/cpufreq/cpufreq.c:82!"

This check added to remove_dev is symmetric to one in add_dev and handles
callbacks for offline cpus cleanly.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 221dee28 26-Feb-2007 Linus Torvalds <torvalds@woody.linux-foundation.org>

Revert "[CPUFREQ] constify cpufreq_driver where possible."

This reverts commit aeeddc1435c37fa3fc844f31d39c185b08de4158, which was
half-baked and broken. It just resulted in compile errors, since
cpufreq_register_driver() still changes the 'driver_data' by setting
bits in the flags field. So claiming it is 'const' _really_ doesn't
work.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# aeeddc14 22-Feb-2007 Dave Jones <davej@redhat.com>

[CPUFREQ] constify cpufreq_driver where possible.

Not all cases are possible due to ->flags being set at runtime
on some drivers.

Signed-off-by: Dave Jones <davej@redhat.com>


# 5a01f2e8 05-Feb-2007 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

[CPUFREQ] Rewrite lock in cpufreq to eliminate cpufreq/hotplug related issues

Yet another attempt to resolve cpufreq and hotplug locking issues.

Patchset has 3 patches:
* Rewrite the lock infrastructure of cpufreq using a per cpu rwsem.
* Minor restructuring of work callback in ondemand driver.
* Use the new cpufreq rwsem infrastructure in ondemand work.

This patch:

Convert policy->lock to rwsem and move it to per_cpu area.
This rwsem will protect against both changing/accessing policy
related parameters and CPU hot plug/unplug.

[malattia@linux.it: fix oops in kref_put()]
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Mattia Dongili <malattia@linux.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# c1200697 05-Feb-2007 Dave Jones <davej@redhat.com>

[CPUFREQ] Remove hotplug cpu crap

The hotplug CPU locking in cpufreq is horrendous. No-one seems to care
enough to fix it, so just remove it so that the 99.9% of the real world
users of this code can use cpufreq without being bothered by warnings.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 0142f9dc 04-Jan-2007 Ahmed S. Darwish <darwish.07@gmail.com>

[CPUFREQ] check sysfs_create_link return value

Trivial patch to check sysfs_create_link return values.
Fail gracefully if needed.

Signed-off-by: Ahmed Darwish <darwish.07@gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 4ab70df4 13-Dec-2006 Dhaval Giani <dhaval.giani@gmail.com>

[CPUFREQ] fixes typo in cpufreq.c

This patch fixes a typo in cpufreq.c

From: Dhaval Giani <dhaval.giani@gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 02316067 06-Dec-2006 Ingo Molnar <mingo@elte.hu>

[PATCH] hotplug CPU: clean up hotcpu_notifier() use

There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus
generating compiler warnings of unused symbols, hence forcing people to add
#ifdefs.

the compiler can skip truly unused functions just fine:

text data bss dec hex filename
1624412 728710 3674856 6027978 5bfaca vmlinux.before
1624412 728710 3674856 6027978 5bfaca vmlinux.after

[akpm@osdl.org: topology.c fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 65f27f38 22-Nov-2006 David Howells <dhowells@redhat.com>

WorkStruct: Pass the work_struct pointer instead of context data

Pass the work_struct pointer to the work function rather than context data.
The work function can use container_of() to work out the data.

For the cases where the container of the work_struct may go away the moment the
pending bit is cleared, it is made possible to defer the release of the
structure by deferring the clearing of the pending bit.

To make this work, an extra flag is introduced into the management side of the
work_struct. This governs auto-release of the structure upon execution.

Ordinarily, the work queue executor would release the work_struct for further
scheduling or deallocation by clearing the pending bit prior to jumping to the
work function. This means that, unless the driver makes some guarantee itself
that the work_struct won't go away, the work function may not access anything
else in the work_struct or its container lest they be deallocated.. This is a
problem if the auxiliary data is taken away (as done by the last patch).

However, if the pending bit is *not* cleared before jumping to the work
function, then the work function *may* access the work_struct and its container
with no problems. But then the work function must itself release the
work_struct by calling work_release().

In most cases, automatic release is fine, so this is the default. Special
initiators exist for the non-auto-release case (ending in _NAR).


Signed-Off-By: David Howells <dhowells@redhat.com>


# b3438f82 20-Nov-2006 Linus Torvalds <torvalds@woody.osdl.org>

Add "pure_initcall" for static variable initialization

This is a quick hack to overcome the fact that SRCU currently does not
allow static initializers, and we need to sometimes initialize those
things before any other initializers (even "core" ones) can do so.

Currently we don't allow this at all for modules, and the only user that
needs is right now is cpufreq. As reported by Thomas Gleixner:

"Commit b4dfdbb3c707474a2254c5b4d7e62be31a4b7da9 ("[PATCH] cpufreq:
make the transition_notifier chain use SRCU breaks cpu frequency
notification users, which register the callback > on core_init
level."

Cc: Thomas Gleixner <tglx@timesys.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Andrew Morton <akpm@osdl.org>,
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# e08f5f5b 26-Oct-2006 Gautham R Shenoy <ego@in.ibm.com>

[CPUFREQ] Fix coding style issues in cpufreq.

Clean up cpufreq subsystem to fix coding style issues and to improve
the readability.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# dfde5d62 03-Oct-2006 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

[CPUFREQ][8/8] acpi-cpufreq: Add support for freq feedback from hardware

Enable ondemand governor and acpi-cpufreq to use IA32_APERF and IA32_MPERF MSR
to get active frequency feedback for the last sampling interval. This will
make ondemand take right frequency decisions when hardware coordination of
frequency is going on.

Without APERF/MPERF, ondemand can take wrong decision at times due
to underlying hardware coordination or TM2.
Example:
* CPU 0 and CPU 1 are hardware cooridnated.
* CPU 1 running at highest frequency.
* CPU 0 was running at highest freq. Now ondemand reduces it to
some intermediate frequency based on utilization.
* Due to underlying hardware coordination with other CPU 1, CPU 0 continues to
run at highest frequency (as long as other CPU is at highest).
* When ondemand samples CPU 0 again next time, without actual frequency
feedback from APERF/MPERF, it will think that previous frequency change
was successful and can go to wrong target frequency. This is because it
thinks that utilization it has got this sampling interval is when running at
intermediate frequency, rather than actual highest frequency.

More information about IA32_APERF IA32_MPERF MSR:
Refer to IA-32 Intel® Architecture Software Developer's Manual at
http://developer.intel.com

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# b4dfdbb3 04-Oct-2006 Alan Stern <stern@rowland.harvard.edu>

[PATCH] cpufreq: make the transition_notifier chain use SRCU

This patch (as762) changes the cpufreq_transition_notifier_list from a
blocking_notifier_head to an srcu_notifier_head. This will prevent errors
caused attempting to call down_read() to access the notifier chain at a
time when interrupts must remain disabled, during system suspend.

It's not clear to me whether this is really necessary; perhaps the chain
could be made into an atomic_notifier. However a couple of the callout
routines do use blocking operations, so this approach seems safer.

The head of the notifier chain needs to be initialized before use; this is
done by an __init routine at core_initcall time. If this turns out not to
be a good choice, it can easily be changed.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Jesse Brandeburg <jesse.brandeburg@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 0e37b159 26-Sep-2006 Dave Jones <davej@redhat.com>

[CPUFREQ] Fix cut-n-paste bug in suspend printk

Signed-off-by: Dave Jones <davej@redhat.com>


# cd878479 11-Aug-2006 Dave Jones <davej@redhat.com>

[CPUFREQ] Fix typo.

Signed-off-by: Dave Jones <davej@redhat.com>


# ea714970 06-Jul-2006 Jeremy Fitzhardinge <jeremy@goop.org>

[CPUFREQ] [2/2] demand load governor modules.

Demand-load cpufreq governor modules if needed.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 3bcb09a3 06-Jul-2006 Jeremy Fitzhardinge <jeremy@goop.org>

[CPUFREQ] [1/2] add __find_governor helper and clean up some error handling.

Adds a __find_governor() helper function to look up a governor by
name. Also restructures some error handling to conform to the
"single-exit" model which is generally preferred for kernel code.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 9c9a43ed 05-Jul-2006 Mattia Dongili <malattia@linux.it>

[CPUFREQ] return error when failing to set minfreq

I just stumbled on this bug/feature, this is how to reproduce it:

# echo 450000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
# echo 450000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
# echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# cpufreq-info -p
450000 450000 powersave
# echo 1800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq ; echo $?
0
# cpufreq-info -p
450000 450000 powersave

Here it is. The kernel refuses to set a min_freq higher than the
max_freq but it allows a max_freq lower than min_freq (lowering min_freq
also).

This behaviour is pretty straightforward (but undocumented) and it
doesn't return an error altough failing to accomplish the requested
action (set min_freq).
The problem (IMO) is basically that userspace is not allowed to set a
full policy atomically while the kernel always does that thus it must
enforce an ordering on operations.

The attached patch returns -EINVAL if trying to increase frequencies
starting from scaling_min_freq and documents the correct ordering of writes.

Signed-off-by: Mattia Dongili <malattia@linux.it>
Signed-off-by: Dominik Brodowski <linux at dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>

--


# 153d7f3f 26-Jul-2006 Arjan van de Ven <arjan@linux.intel.com>

[PATCH] Reorganize the cpufreq cpu hotplug locking to not be totally bizare

The patch below moves the cpu hotplugging higher up in the cpufreq
layering; this is needed to avoid recursive taking of the cpu hotplug
lock and to otherwise detangle the mess.

The new rules are:
1. you must do lock_cpu_hotplug() around the following functions:
__cpufreq_driver_target
__cpufreq_governor (for CPUFREQ_GOV_LIMITS operation only)
__cpufreq_set_policy
2. governer methods (.governer) must NOT take the lock_cpu_hotplug()
lock in any way; they are called with the lock taken already
3. if your governer spawns a thread that does things, like calling
__cpufreq_driver_target, your thread must honor rule #1.
4. the policy lock and other cpufreq internal locks nest within
the lock_cpu_hotplug() lock.

I'm not entirely happy about how the __cpufreq_governor rule ended up
(conditional locking rule depending on the argument) but basically all
callers pass this as a constant so it's not too horrible.

The patch also removes the cpufreq_governor() function since during the
locking audit it turned out to be entirely unused (so no need to fix it)

The patch works on my testbox, but it could use more testing
(otoh... it can't be much worse than the current code)

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# a496e25d 06-Jul-2006 Dave Jones <davej@redhat.com>

[PATCH] Fix cpufreq vs hotplug lockdep recursion.

[ There's some not quite baked bits in cpufreq-git right now
so sending this on as a patch instead ]

On Thu, 2006-07-06 at 07:58 -0700, Tom London wrote:

> After installing .2356 I get this each time I boot:
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> -------------------------------------------------------
> S06cpuspeed/1620 is trying to acquire lock:
> (dbs_mutex){--..}, at: [<c060d6bb>] mutex_lock+0x21/0x24
>
> but task is already holding lock:
> (cpucontrol){--..}, at: [<c060d6bb>] mutex_lock+0x21/0x24
>
> which lock already depends on the new lock.
>

make sure the cpu hotplug recursive mutex (yuck) is taken early in the
cpufreq codepaths to avoid a AB-BA deadlock.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 6ab3d562 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de>

Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>


# 74b85f37 27-Jun-2006 Chandra Seetharaman <sekharan@us.ibm.com>

[PATCH] cpu hotplug: make cpu_notifier related notifier blocks __cpuinit only

Make notifier_blocks associated with cpu_notifier as __cpuinitdata.

__cpuinitdata makes sure that the data is init time only unless
CONFIG_HOTPLUG_CPU is defined.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 65edc68c 27-Jun-2006 Chandra Seetharaman <sekharan@us.ibm.com>

[PATCH] cpu hotplug: make [un]register_cpu_notifier init time only

CPUs come online only at init time (unless CONFIG_HOTPLUG_CPU is defined).
So, cpu_notifier functionality need to be available only at init time.

This patch makes register_cpu_notifier() available only at init time, unless
CONFIG_HOTPLUG_CPU is defined.

This patch exports register_cpu_notifier() and unregister_cpu_notifier() only
if CONFIG_HOTPLUG_CPU is defined.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 9c7b216d 27-Jun-2006 Chandra Seetharaman <sekharan@us.ibm.com>

[PATCH] cpu hotplug: revert init patch submitted for 2.6.17

In 2.6.17, there was a problem with cpu_notifiers and XFS. I provided a
band-aid solution to solve that problem. In the process, i undid all the
changes you both were making to ensure that these notifiers were available
only at init time (unless CONFIG_HOTPLUG_CPU is defined).

We deferred the real fix to 2.6.18. Here is a set of patches that fixes the
XFS problem cleanly and makes the cpu notifiers available only at init time
(unless CONFIG_HOTPLUG_CPU is defined).

If CONFIG_HOTPLUG_CPU is defined then cpu notifiers are available at run
time.

This patch reverts the notifier_call changes made in 2.6.17

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# b10eec22 28-Apr-2006 Jan Beulich <jbeulich@novell.com>

[CPUFREQ] cpufreq core {d,}printk adjustments

Remove KERN_* suffixes from some cpufreq driver's dprintk-s.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 83d722f7 24-Apr-2006 Chandra Seetharaman <sekharan@us.ibm.com>

[PATCH] Remove __devinit and __cpuinit from notifier_call definitions

Few of the notifier_chain_register() callers use __init in the definition
of notifier_call. It is incorrect as the function definition should be
available after the initializations (they do not unregister them during
initializations).

This patch fixes all such usages to _not_ have the notifier_call __init
section.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 7b14dedd 18-Apr-2006 Adrian Bunk <bunk@stusta.de>

[CPUFREQ] drivers/cpufreq/cpufreq.c: static functions mustn't be exported

This patch removes the EXPORT_SYMBOL_GPL of the static function cpufreq_parse_governor().

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Dave Jones <davej@redhat.com>


# 7970e08b 13-Apr-2006 Thomas Renninger <trenn@suse.de>

[CPUFREQ] If max_freq got reduced (e.g. by _PPC) a write to sysfs scaling_governor let cpufreq core stuck at low max_freq for ever

The previous patch had bugs (locking and refcount).

This one could also be related to the latest DELL reports.
But they only slip into this if a user prog (e.g. powersave daemon does when
AC got (un) plugged due to a scheme change) echos something to
/sys/../cpufreq/scaling_governor
while the frequencies got limited by BIOS.

This one works:

Subject: Max freq stucks at low freq if reduced by _PPC and sysfs gov access

The problem is reproducable by(if machine is limiting freqs via BIOS):
- Unplugging AC -> max freq gets limited
- echo ${governor} >/sys/.../cpufreq/scaling_governor (policy->user_data.max
gets overridden with policy->max and will never come up again.)

This patch exchanged the cpufreq_set_policy call to __cpufreq_set_policy and
duplicated it's functionality but did not override user_data.max.
The same happens with overridding min/max values. If freqs are limited and
you override the min freq value, the max freq global value will also get
stuck to the limited freq, even if BIOS allows all freqs again.
Last scenario does only happen if BIOS does not reduce the frequency
to the lowest value (should never happen, just for correctness...)

drivers/cpufreq/cpufreq.c | 17 +++++++++++++++--
1 files changed, 15 insertions(+), 2 deletions(-)

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 87c32271 28-Mar-2006 Dave Jones <davej@redhat.com>

[CPUFREQ] trailing whitespace removal de-jour.

Signed-off-by: Dave Jones <davej@redhat.com>


# 1f8b2c9d 28-Mar-2006 Dave Jones <davej@redhat.com>

[CPUFREQ] extra debugging in cpufreq_add_dev()

Snipped from an otherwise rejected patch by Jan Beulich <jbeulich@novell.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# e041c683 27-Mar-2006 Alan Stern <stern@rowland.harvard.edu>

[PATCH] Notifier chain update: API changes

The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:

http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2

We noticed that notifier chains in the kernel fall into two basic usage
classes:

"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;

"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.

We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.

With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)

There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)

Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.

Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.

ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain

BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain

It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)

The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.

[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 8ff69732 05-Mar-2006 Dave Jones <davej@redhat.com>

[CPUFREQ] Fix handling for CPU hotplug

This patch adds proper logic to cpufreq driver in order to handle
CPU Hotplug.

When CPUs go on/offline, the affected CPUs data, cpufreq_policy->cpus,
is not updated properly. This causes sysfs directories and symlinks to
be in an incorrect state after few CPU on/offlines.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# 32ee8c3e 27-Feb-2006 Dave Jones <davej@redhat.com>

[CPUFREQ] Lots of whitespace & CodingStyle cleanup.

Signed-off-by: Dave Jones <davej@redhat.com>


# 7d5e350f 02-Feb-2006 Dave Jones <davej@redhat.com>

[CPUFREQ] Whitespace/CodingStyle cleanups

Signed-off-by: Dave Jones <davej@redhat.com>


# a85f7bd3 01-Feb-2006 Thomas Renninger <trenn@suse.de>

[CPUFREQ] Check whether driver init did not initialize current freq

Check whether driver init did not initialize current freq

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>


# e4472cb3 31-Jan-2006 Dave Jones <davej@redhat.com>

[CPUFREQ] cpufreq_notify_transition cleanup.

Introduce caching of cpufreq_cpu_data[freqs->cpu], which allows us to
make the function a lot more readable, and as a nice side-effect, it
now fits in < 80 column displays again.

Signed-off-by: Dave Jones <davej@redhat.com>


# 0961dd0d 26-Jan-2006 Thomas Renninger <trenn@suse.de>

[CPUFREQ] _PPC frequency change issues

BIOS might change frequency behind our back when BIOS changes allowed
frequencies via _PPC. In this case cpufreq core got out of sync.
Ask driver for current freq and notify governors about a change

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# f3876c1b 18-Jan-2006 Andrew Morton <akpm@osdl.org>

[CPUFREQ] Don't free held mutex in cpufreq_add_dev()

Make the cpufreq code play nicely with the mutex debugging code: don't free a
held mutex.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 83933af4 14-Jan-2006 Arjan van de Ven <arjan@infradead.org>

[CPUFREQ] convert remaining cpufreq semaphore to a mutex

This one fell through the automation at first because it initializes the
semaphore to locked, but that's easily remedied

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Dave Jones <davej@redhat.com>

drivers/cpufreq/cpufreq.c | 37 +++++++++++++++++++------------------
include/linux/cpufreq.h | 3 ++-
2 files changed, 21 insertions(+), 19 deletions(-)


# 3fc54d37 13-Jan-2006 akpm@osdl.org <akpm@osdl.org>

[CPUFREQ] Convert drivers/cpufreq semaphores to mutexes.

Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# 858119e1 14-Jan-2006 Arjan van de Ven <arjan@infradead.org>

[PATCH] Unlinline a bunch of other functions

Remove the "inline" keyword from a bunch of big functions in the kernel with
the goal of shrinking it by 30kb to 40kb

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 95235ca2 02-Dec-2005 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

[CPUFREQ] CPU frequency display in /proc/cpuinfo

What is the value shown in "cpu MHz" of /proc/cpuinfo when CPUs are capable of
changing frequency?

Today the answer is: It depends.
On i386:
SMP kernel - It is always the boot frequency
UP kernel - Scales with the frequency change and shows that was last set.

On x86_64:
There is one single variable cpu_khz that gets written by all the CPUs. So,
the frequency set by last CPU will be seen on /proc/cpuinfo of all the
CPUs in the system. What you see also depends on whether you have constant_tsc
capable CPU or not.

On ia64:
It is always boot time frequency of a particular CPU that gets displayed.

The patch below changes this to:
Show the last known frequency of the particular CPU, when cpufreq is present. If
cpu doesnot support changing of frequency through cpufreq, then boot frequency
will be shown. The patch affects i386, x86_64 and ia64 architectures.

Signed-off-by: Venkatesh Pallipadi<venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# a9d9baa1 28-Nov-2005 Ashok Raj <ashok.raj@intel.com>

[PATCH] clean up lock_cpu_hotplug() in cpufreq

There are some callers in cpufreq hotplug notify path that the lowest
function calls lock_cpu_hotplug(). The lock is already held during
cpu_up() and cpu_down() calls when the notify calls are broadcast to
registered clients.

Ideally if possible, we could disable_preempt() at the highest caller and
make sure we dont sleep in the path down in cpufreq->driver_target() calls
but the calls are so intertwined and cumbersome to cleanup.

Hence we consistently use lock_cpu_hotplug() and unlock_cpu_hotplug() in
all places.

- Removed export of cpucontrol semaphore and made it static.
- removed explicit uses of up/down with lock_cpu_hotplug()
so we can keep track of the the callers in same thread context and
just keep refcounts without calling a down() that causes a deadlock.
- Removed current_in_hotplug() uses
- Removed PF_HOTPLUG_CPU in sched.h introduced for the current_in_hotplug()
temporary workaround.

Tested with insmod of cpufreq_stat.ko, and logical online/offline
to make sure we dont have any hang situations.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Cc: Zwane Mwaikambo <zwane@linuxpower.ca>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# e738cf6d 21-Nov-2005 Grant Coady <gcoady@gmail.com>

[PATCH] cpufreq: silence cpufreq for UP

drivers/cpufreq/cpufreq.c: In function `cpufreq_remove_dev':
drivers/cpufreq/cpufreq.c:696: warning: unused variable `cpu_sys_dev'

Signed-off-by: Grant Coady <gcoady@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 90d45d17 08-Nov-2005 Ashok Raj <ashok.raj@intel.com>

[PATCH] cpu hotplug: fix locking in cpufreq drivers

When calling target drivers to set frequency, we take cpucontrol lock.
When we modified the code to accomodate CPU hotplug, there was an attempt
to take a double lock of cpucontrol leading to a deadlock. Since the
current thread context is already holding the cpucontrol lock, we dont need
to make another attempt to acquire it.

Now we leave a trace in current->flags indicating current thread already is
under cpucontrol lock held, so we dont attempt to do this another time.

Thanks to Andrew Morton for the beating:-)

From: Brice Goglin <Brice.Goglin@ens-lyon.org>

Build fix

(akpm: this patch is still unpleasant. Ashok continues to look for a cleaner
solution, doesn't he? ;))

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# c32b6b8e 30-Oct-2005 Ashok Raj <ashok.raj@intel.com>

[PATCH] create and destroy cpufreq sysfs entries based on cpu notifiers

cpufreq entries in sysfs should only be populated when CPU is online state.
When we either boot with maxcpus=x and then boot the other cpus by echoing
to sysfs online file, these entries should be created and destroyed when
CPU_DEAD is notified. Same treatement as cache entries under sysfs.

We place the processor in the lowest frequency, so hw managed P-State
transitions can still work on the other threads to save power.

Primary goal was to just make these directories appear/disapper dynamically.

There is one in this patch i had to do, which i really dont like myself but
probably best if someone handling the cpufreq infrastructure could give
this code right treatment if this is not acceptable. I guess its probably
good for the first cut.

- Converting lock_cpu_hotplug()/unlock_cpu_hotplug() to disable/enable preempt.
The locking was smack in the middle of the notification path, when the
hotplug is already holding the lock. I tried another solution to avoid this
so avoid taking locks if we know we are from notification path. The solution
was getting very ugly and i decided this was probably good for this iteration
until someone who understands cpufreq could do a better job than me.

(akpm: export cpucontrol to GPL modules: drivers/cpufreq/cpufreq_stats.c now
does lock_cpu_hotplug())

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Zwane Mwaikambo <zwane@holomorphy.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# d434fca7 30-Oct-2005 Ashok Raj <ashok.raj@intel.com>

[PATCH] Remove cpu_sys_devices in cpufreq subsystem.

cpu_sys_devices is redundant with the new API get_cpu_sysdev(). So nuking
this usage since its not needed.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Zwane Mwaikambo <zwane@holomorphy.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# e98df50c 20-Oct-2005 Dave Jones <davej@redhat.com>

[CPUFREQ] kzalloc conversions for cpufreq core.

Signed-off-by: Dave Jones <davej@redhat.com>


# 8085e1f1 25-Aug-2005 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

[CPUFREQ] Bugfix: Call driver exit in cpufreq_add_dev error path

A minor fix for cpufreq_add_dev() error path. We need to call driver->exit()
if driver_init() call has succeeded.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>


# cc993cab 28-Jul-2005 Dave Jones <davej@redhat.com>

Here are two possible cleanups in cpufreq.c:
* ret has no need to be unsigned in cpufreq_driver_target()
* ret has no need to be initialized in __cpufreq_governor()

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# e00d9967 07-Jul-2005 Bernard Blackham <bernard@blackham.com.au>

[PATCH] pm: fix u32 vs. pm_message_t confusion in cpufreq

Fix u32 vs pm_message_t confusion in cpufreq.

Signed-off-by: Bernard Blackham <bernard@blackham.com.au>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 70f2817a 29-Apr-2005 Dmitry Torokhov <dtor_core@ameritech.net>

[PATCH] sysfs: (rest) if show/store is missing return -EIO

sysfs: fix the rest of the kernel so if an attribute doesn't
implement show or store method read/write will return
-EIO instead of 0 or -EINVAL or -EPERM.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 78ee998f 31-May-2005 Dave Jones <davej@redhat.com>

[CPUFREQ] cpufreq-core: reduce warning messages.

cpufreq core is printing out messages at KERN_WARNING level that the core
recovers from without intervention, and that the system administrator can
do nothing about. Patch below reduces the severity of these messages to
debug.

Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>


# ac09f698 02-May-2005 Benjamin Herrenschmidt <benh@kernel.crashing.org>

[PATCH] cpufreq annoying warning fix

The cpufreq core patch I sent earlier got only half-applied. I added a
flag to let the low level driver disable an annoying warning on
suspend/resume that is normal on ppc, but the "resume" part of it wasn't
applied.

This just adds back that missing bit. The original patch also reworked
the resume() function to avoid nesting too many if () statements along
the way I did the suspend() one, but I didn't include that in the patch
below.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 42d4dc3f 29-Apr-2005 Benjamin Herrenschmidt <benh@kernel.crashing.org>

[PATCH] Add suspend method to cpufreq core

In order to properly fix some issues with cpufreq vs. sleep on
PowerBooks, I had to add a suspend callback to the pmac_cpufreq driver.
I must force a switch to full speed before sleep and I switch back to
previous speed on resume.

I also added a driver flag to disable the warnings in suspend/resume
since it is expected in this case to have different speed (and I want it
to fixup the jiffies properly).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!