Cross Reference: /linux-master/drivers/cpufreq/cpufreq

History log of /linux-master/drivers/cpufreq/cpufreq_governor.c
Revision	Date	Author	Comments
# 916f1388	29-Aug-2023	Liao Chang <liaochang1@huawei.com>	cpufreq: governor: Free dbs_data directly when gov->init() fails Due to the kobject embedded in the dbs_data doest not has a release() method yet, it needs to use kfree() to free dbs_data directly when governor fails to allocate the tunner field of dbs_data. Signed-off-by: Liao Chang <liaochang1@huawei.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# a85ee640	23-Jan-2022	Kevin Hao <haokexin@gmail.com>	cpufreq: governor: Use kobject release() method to free dbs_data The struct dbs_data embeds a struct gov_attr_set and the struct gov_attr_set embeds a kobject. Since every kobject must have a release() method and we can't use kfree() to free it directly, so introduce cpufreq_dbs_data_release() to release the dbs_data via the kobject::release() method. This fixes the calltrace like below: ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x34 WARNING: CPU: 12 PID: 810 at lib/debugobjects.c:505 debug_print_object+0xb8/0x100 Modules linked in: CPU: 12 PID: 810 Comm: sh Not tainted 5.16.0-next-20220120-yocto-standard+ #536 Hardware name: Marvell OcteonTX CN96XX board (DT) pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : debug_print_object+0xb8/0x100 lr : debug_print_object+0xb8/0x100 sp : ffff80001dfcf9a0 x29: ffff80001dfcf9a0 x28: 0000000000000001 x27: ffff0001464f0000 x26: 0000000000000000 x25: ffff8000090e3f00 x24: ffff80000af60210 x23: ffff8000094dfb78 x22: ffff8000090e3f00 x21: ffff0001080b7118 x20: ffff80000aeb2430 x19: ffff800009e8f5e0 x18: 0000000000000000 x17: 0000000000000002 x16: 00004d62e58be040 x15: 013590470523aff8 x14: ffff8000090e1828 x13: 0000000001359047 x12: 00000000f5257d14 x11: 0000000000040591 x10: 0000000066c1ffea x9 : ffff8000080d15e0 x8 : ffff80000a1765a8 x7 : 0000000000000000 x6 : 0000000000000001 x5 : ffff800009e8c000 x4 : ffff800009e8c760 x3 : 0000000000000000 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0001474ed040 Call trace: debug_print_object+0xb8/0x100 __debug_check_no_obj_freed+0x1d0/0x25c debug_check_no_obj_freed+0x24/0xa0 kfree+0x11c/0x440 cpufreq_dbs_governor_exit+0xa8/0xac cpufreq_exit_governor+0x44/0x90 cpufreq_set_policy+0x29c/0x570 store_scaling_governor+0x110/0x154 store+0xb0/0xe0 sysfs_kf_write+0x58/0x84 kernfs_fop_write_iter+0x12c/0x1c0 new_sync_write+0xf0/0x18c vfs_write+0x1cc/0x220 ksys_write+0x74/0x100 __arm64_sys_write+0x28/0x3c invoke_syscall.constprop.0+0x58/0xf0 do_el0_svc+0x70/0x170 el0_svc+0x54/0x190 el0t_64_sync_handler+0xa4/0x130 el0t_64_sync+0x1a0/0x1a4 irq event stamp: 189006 hardirqs last enabled at (189005): [<ffff8000080849d0>] finish_task_switch.isra.0+0xe0/0x2c0 hardirqs last disabled at (189006): [<ffff8000090667a4>] el1_dbg+0x24/0xa0 softirqs last enabled at (188966): [<ffff8000080106d0>] __do_softirq+0x4b0/0x6a0 softirqs last disabled at (188957): [<ffff80000804a618>] __irq_exit_rcu+0x108/0x1a4 [ rjw: Because can be freed by the gov_attr_set_put() in cpufreq_dbs_governor_exit() now, it is also necessary to put the invocation of the governor ->exit() callback into the new cpufreq_dbs_data_release() function. ] Fixes: c4435630361d ("cpufreq: governor: New sysfs show/store callbacks for governor tunables") Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 85750bcd	10-Mar-2022	Lianjie Zhang <zhanglianjie@uniontech.com>	cpufreq: unify show() and store() naming and use __ATTR_XX Usually, sysfs attributes have .show and .store and their naming convention is filename_show() and filename_store(). But in cpufreq the naming convention of these functions is show_filename() and store_filename() which prevents __ATTR_RW() and __ATTR_RO() from being used in there to simplify code. Accordingly, change the naming convention of the sysfs .show and .store methods in cpufreq to follow the one expected by __ATTR_RW() and __ATTR_RO() and use these macros in that code. Signed-off-by: Lianjie Zhang <zhanglianjie@uniontech.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# dd2e65f2	15-Jul-2020	Lee Jones <lee.jones@linaro.org>	cpufreq: cpufreq_governor: Demote store_sampling_rate() header to standard comment block There is no need for this to be denoted as kerneldoc. Fixes the following W=1 kernel build warning(s): drivers/cpufreq/cpufreq_governor.c:46: warning: Function parameter or member 'attr_set' not described in 'store_sampling_rate' drivers/cpufreq/cpufreq_governor.c:46: warning: Function parameter or member 'buf' not described in 'store_sampling_rate' drivers/cpufreq/cpufreq_governor.c:46: warning: Function parameter or member 'count' not described in 'store_sampling_rate' Signed-off-by: Lee Jones <lee.jones@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 5720821b	20-Nov-2019	Frederic Weisbecker <frederic@kernel.org>	cpufreq: Use vtime aware kcpustat accessors for user time We can now safely read user and guest kcpustat fields on nohz_full CPUs. Use the appropriate accessors. Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <wanpengli@tencent.com> Link: https://lkml.kernel.org/r/20191121024430.19938-5-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
# d2912cb1	04-Jun-2019	Thomas Gleixner <tglx@linutronix.de>	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
# 4ebe36c9	30-Apr-2019	Viresh Kumar <viresh.kumar@linaro.org>	cpufreq: Fix kobject memleak Currently the error return path from kobject_init_and_add() is not followed by a call to kobject_put() - which means we are leaking the kobject. Fix it by adding a call to kobject_put() in the error path of kobject_init_and_add(). Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Tobin C. Harding <tobin@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# cc69b389	05-Nov-2018	Paul E. McKenney <paulmck@kernel.org>	cpufreq/cpufreq_governor: Replace synchronize_sched() with synchronize_rcu() Now that synchronize_rcu() waits for preempt-disable regions of code as well as RCU read-side critical sections, synchronize_sched() can be replaced by synchronize_rcu(). This commit therefore makes this change. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
# 2a3eb51e	14-Aug-2018	Henry Willard <henry.willard@oracle.com>	cpufreq: governor: Avoid accessing invalid governor_data If cppc_cpufreq.ko is deleted at the same time that tuned-adm is changing profiles, there is a small chance that a race can occur between cpufreq_dbs_governor_exit() and cpufreq_dbs_governor_limits() resulting in a system failure when the latter tries to use policy->governor_data that has been freed by the former. This patch uses gov_dbs_data_mutex to synchronize access. Fixes: e788892ba3cc (cpufreq: governor: Get rid of governor events) Signed-off-by: Henry Willard <henry.willard@oracle.com> [ rjw: Subject, minor white space adjustment ] Cc: 4.8+ <stable@vger.kernel.org> # 4.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 75920196	07-Jun-2018	Chen Yu <yu.c.chen@intel.com>	cpufreq: governors: Fix long idle detection logic in load calculation According to current code implementation, detecting the long idle period is done by checking if the interval between two adjacent utilization update handlers is long enough. Although this mechanism can detect if the idle period is long enough (no utilization hooks invoked during idle period), it might not cover a corner case: if the task has occupied the CPU for too long which causes no context switches during that period, then no utilization handler will be launched until this high prio task is scheduled out. As a result, the idle_periods field might be calculated incorrectly because it regards the 100% load as 0% and makes the conservative governor who uses this field confusing. Change the detection to compare the idle_time with sampling_rate directly. Reported-by: Artem S. Tashkinov <t.artem@mailcity.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 03639978	22-May-2018	Viresh Kumar <viresh.kumar@linaro.org>	cpufreq: Rename cpufreq_can_do_remote_dvfs() This routine checks if the CPU running this code belongs to the policy of the target CPU or if not, can it do remote DVFS for it remotely. But the current name of it implies as if it is only about doing remote updates. Rename it to make it more relevant. Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 56026645	17-Dec-2017	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq: governor: Ensure sufficiently large sampling intervals After commit aa7519af450d (cpufreq: Use transition_delay_us for legacy governors as well) the sampling_rate field of struct dbs_data may be less than the tick period which causes dbs_update() to produce incorrect results, so make the code ensure that the value of that field will always be sufficiently large. Fixes: aa7519af450d (cpufreq: Use transition_delay_us for legacy governors as well) Reported-by: Andy Tang <andy.tang@nxp.com> Reported-by: Doug Smythies <dsmythies@telus.net> Tested-by: Andy Tang <andy.tang@nxp.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
# 674e7541	27-Jul-2017	Viresh Kumar <viresh.kumar@linaro.org>	sched: cpufreq: Allow remote cpufreq callbacks With Android UI and benchmarks the latency of cpufreq response to certain scheduling events can become very critical. Currently, callbacks into cpufreq governors are only made from the scheduler if the target CPU of the event is the same as the current CPU. This means there are certain situations where a target CPU may not run the cpufreq governor for some time. One testcase to show this behavior is where a task starts running on CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the system is configured such that the new tasks should receive maximum demand initially, this should result in CPU0 increasing frequency immediately. But because of the above mentioned limitation though, this does not occur. This patch updates the scheduler core to call the cpufreq callbacks for remote CPUs as well. The schedutil, ondemand and conservative governors are updated to process cpufreq utilization update hooks called for remote CPUs where the remote CPU is managed by the cpufreq policy of the local CPU. The intel_pstate driver is updated to always reject remote callbacks. This is tested with couple of usecases (Android: hackbench, recentfling, galleryfling, vellamo, Ubuntu: hackbench) on ARM hikey board (64 bit octa-core, single policy). Only galleryfling showed minor improvements, while others didn't had much deviation. The reason being that this patch only targets a corner case, where following are required to be true to improve performance and that doesn't happen too often with these tests: - Task is migrated to another CPU. - The task has high demand, and should take the target CPU to higher OPPs. - And the target CPU doesn't call into the cpufreq governor until the next tick. Based on initial work from Steve Muckle. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# aa7519af	19-Jul-2017	Viresh Kumar <viresh.kumar@linaro.org>	cpufreq: Use transition_delay_us for legacy governors as well The policy->transition_delay_us field is used only by the schedutil governor currently, and this field describes how fast the driver wants the cpufreq governor to change CPUs frequency. It should rather be a common thing across all governors, as it doesn't have any schedutil dependency here. Create a new helper cpufreq_policy_transition_delay_us() to get the transition delay across all governors. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 2d045036	19-Jul-2017	Viresh Kumar <viresh.kumar@linaro.org>	cpufreq: governor: Drop min_sampling_rate The cpufreq core and governors aren't supposed to set a limit on how fast we want to try changing the frequency. This is currently done for the legacy governors with help of min_sampling_rate. At worst, we may end up setting the sampling rate to a value lower than the rate at which frequency can be changed and then one of the CPUs in the policy will be only changing frequency for ever. But that is something for the user to decide and there is no need to have special handling for such cases in the core. Leave it for the user to figure out. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 55687da1	08-Feb-2017	Ingo Molnar <mingo@kernel.org>	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/cpufreq.h> We are going to split <linux/sched/cpufreq.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/cpufreq.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
# 7fb1327e	30-Jan-2017	Frederic Weisbecker <fweisbec@gmail.com>	sched/cputime: Convert kcpustat to nsecs Kernel CPU stats are stored in cputime_t which is an architecture defined type, and hence a bit opaque and requiring accessors and mutators for any operation. Converting them to nsecs simplifies the code and is one step toward the removal of cputime_t in the core code. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-4-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
# 00bfe058	16-Nov-2016	Stratos Karafotis <stratosk@semaphore.gr>	cpufreq: conservative: Decrease frequency faster for deferred updates Conservative governor changes the CPU frequency in steps. That means that if a CPU runs at max frequency, it will need several sampling periods to return to min frequency when the workload is finished. If the update function that calculates the load and target frequency is deferred, the governor might need even more time to decrease the frequency. This may have impact to power consumption and after all conservative should decrease the frequency if there is no workload at every sampling rate. To resolve the above issue calculate the number of sampling periods that the update is deferred. Considering that for each sampling period conservative should drop the frequency by a freq_step because the CPU was idle apply the proper subtraction to requested frequency. Below, the kernel trace with and without this patch. First an intensive workload is applied on a specific CPU. Then the workload is removed and the CPU goes to idle. WITHOUT <idle>-0 [007] dN.. 620.329153: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 620.350857: cpu_frequency: state=1700000 cpu_id=7 kworker/7:2-556 [007] .... 620.370856: cpu_frequency: state=1900000 cpu_id=7 kworker/7:2-556 [007] .... 620.390854: cpu_frequency: state=2100000 cpu_id=7 kworker/7:2-556 [007] .... 620.411853: cpu_frequency: state=2200000 cpu_id=7 kworker/7:2-556 [007] .... 620.432854: cpu_frequency: state=2400000 cpu_id=7 kworker/7:2-556 [007] .... 620.453854: cpu_frequency: state=2600000 cpu_id=7 kworker/7:2-556 [007] .... 620.494856: cpu_frequency: state=2900000 cpu_id=7 kworker/7:2-556 [007] .... 620.515856: cpu_frequency: state=3100000 cpu_id=7 kworker/7:2-556 [007] .... 620.536858: cpu_frequency: state=3300000 cpu_id=7 kworker/7:2-556 [007] .... 620.557857: cpu_frequency: state=3401000 cpu_id=7 <idle>-0 [007] d... 669.591363: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 669.591939: cpu_idle: state=4294967295 cpu_id=7 <idle>-0 [007] d... 669.591980: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] dN.. 669.591989: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 670.201224: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 670.221975: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 670.222016: cpu_frequency: state=3300000 cpu_id=7 <idle>-0 [007] d... 670.222026: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 670.234964: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 670.801251: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 671.236046: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 671.236073: cpu_frequency: state=3100000 cpu_id=7 <idle>-0 [007] d... 671.236112: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 671.393437: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 671.401277: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 671.404083: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 671.404111: cpu_frequency: state=2900000 cpu_id=7 <idle>-0 [007] d... 671.404125: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 671.404974: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 671.501180: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 671.995414: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 671.995459: cpu_frequency: state=2800000 cpu_id=7 <idle>-0 [007] d... 671.995469: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 671.996287: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 672.001305: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 672.078374: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 672.078410: cpu_frequency: state=2600000 cpu_id=7 <idle>-0 [007] d... 672.078419: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 672.158020: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 672.158040: cpu_frequency: state=2400000 cpu_id=7 <idle>-0 [007] d... 672.158044: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 672.160038: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 672.234557: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 672.237121: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 672.237174: cpu_frequency: state=2100000 cpu_id=7 <idle>-0 [007] d... 672.237186: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 672.237778: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 672.267902: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 672.269860: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 672.269906: cpu_frequency: state=1900000 cpu_id=7 <idle>-0 [007] d... 672.269914: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 672.271902: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 672.751342: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 672.823056: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 672.823095: cpu_frequency: state=1600000 cpu_id=7 WITH <idle>-0 [007] dN.. 4380.928009: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-399 [007] .... 4380.949767: cpu_frequency: state=2000000 cpu_id=7 kworker/7:2-399 [007] .... 4380.969765: cpu_frequency: state=2200000 cpu_id=7 kworker/7:2-399 [007] .... 4381.009766: cpu_frequency: state=2500000 cpu_id=7 kworker/7:2-399 [007] .... 4381.029767: cpu_frequency: state=2600000 cpu_id=7 kworker/7:2-399 [007] .... 4381.049769: cpu_frequency: state=2800000 cpu_id=7 kworker/7:2-399 [007] .... 4381.069769: cpu_frequency: state=3000000 cpu_id=7 kworker/7:2-399 [007] .... 4381.089771: cpu_frequency: state=3100000 cpu_id=7 kworker/7:2-399 [007] .... 4381.109772: cpu_frequency: state=3400000 cpu_id=7 kworker/7:2-399 [007] .... 4381.129773: cpu_frequency: state=3401000 cpu_id=7 <idle>-0 [007] d... 4428.226159: cpu_idle: state=1 cpu_id=7 <idle>-0 [007] d... 4428.226176: cpu_idle: state=4294967295 cpu_id=7 <idle>-0 [007] d... 4428.226181: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 4428.227177: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 4428.551640: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 4428.649239: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-399 [007] .... 4428.649268: cpu_frequency: state=2800000 cpu_id=7 <idle>-0 [007] d... 4428.649278: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 4428.689856: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 4428.799542: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 4428.801683: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-399 [007] .... 4428.801748: cpu_frequency: state=1700000 cpu_id=7 <idle>-0 [007] d... 4428.801761: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 4428.806545: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 4429.051880: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 4429.086240: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-399 [007] .... 4429.086293: cpu_frequency: state=1600000 cpu_id=7 Without the patch the CPU dropped to min frequency after 3.2s With the patch applied the CPU dropped to min frequency after 0.86s Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 26f0dbc9	07-Nov-2016	Viresh Kumar <viresh.kumar@linaro.org>	cpufreq: governor: Don't use 'timer' keyword The earlier implementation of governors used background timers and so functions, mutex, etc had 'timer' keyword in their names. But that's not true anymore. Replace 'timer' with 'update', as those functions, variables are based around updates to frequency. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 58919e83	16-Aug-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq / sched: Pass flags to cpufreq_update_util() It is useful to know the reason why cpufreq_update_util() has just been called and that can be passed as flags to cpufreq_update_util() and to the ->func() callback in struct update_util_data. However, doing that in addition to passing the util and max arguments they already take would be clumsy, so avoid it. Instead, use the observation that the schedutil governor is part of the scheduler proper, so it can access scheduler data directly. This allows the util and max arguments of cpufreq_update_util() and the ->func() callback in struct update_util_data to be replaced with a flags one, but schedutil has to be modified to follow. Thus make the schedutil governor obtain the CFS utilization information from the scheduler and use the "RT" and "DL" flags instead of the special utilization value of ULONG_MAX to track updates from the RT and DL sched classes. Make it non-modular too to avoid having to export scheduler variables to modules at large. Next, update all of the other users of cpufreq_update_util() and the ->func() callback in struct update_util_data accordingly. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
# f6709b8a	08-Jun-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq: governor: Drop gov_cancel_work() There's no reason for gov_cancel_work() to exist at all, as it only has one caller and the only thing done by that caller is to invoke gov_cancel_work(). Accordingly, drop gov_cancel_work() and move its contents to the caller. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
# 9a15fb2c	18-May-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq: Drop the 'initialized' field from struct cpufreq_governor The 'initialized' field in struct cpufreq_governor is only used by the conservative governor (as a usage counter) and the way that happens is far from straightforward and arguably incorrect. Namely, the value of 'initialized' is checked by cpufreq_dbs_governor_init() and cpufreq_dbs_governor_exit() and the results of those checks are passed (as the second argument) to the ->init() and ->exit() callbacks in struct dbs_governor. Those callbacks are only implemented by the ondemand and conservative governors and ondemand doesn't use their second argument at all. In turn, the conservative governor uses it to decide whether or not to either register or unregister a transition notifier. That whole mechanism is not only unnecessarily convoluted, but also racy, because the 'initialized' field of struct cpufreq_governor is updated in cpufreq_init_governor() and cpufreq_exit_governor() under policy->rwsem which doesn't help if one of these functions is run twice in parallel for different policies (which isn't impossible in principle), for example. Instead of it, add a proper usage counter to the conservative governor and update it from cs_init() and cs_exit() which is guaranteed to be non-racy, as those functions are only called under gov_dbs_data_mutex which is global. With that in place, drop the 'initialized' field from struct cpufreq_governor as it is not used any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
# bf2be2de	18-May-2016	Viresh Kumar <viresh.kumar@linaro.org>	cpufreq: governor: Create cpufreq_policy_apply_limits() Create a new helper to avoid code duplication across governors. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 666f4ccc	18-May-2016	Viresh Kumar <viresh.kumar@linaro.org>	cpufreq: governor: Remove unnecessary bits from print message pr_*() helpers already prefix the print messages with "cpufreq_governor:" and similar details aren't required in the actual message. For example, the print message getting fixed looks like this before this patch: cpufreq_governor: cpufreq: Governor initialization failed (dbs_data kobject init error 0) Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# e788892b	02-Jun-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq: governor: Get rid of governor events The design of the cpufreq governor API is not very straightforward, as struct cpufreq_governor provides only one callback to be invoked from different code paths for different purposes. The purpose it is invoked for is determined by its second "event" argument, causing it to act as a "callback multiplexer" of sorts. Unfortunately, that leads to extra complexity in governors, some of which implement the ->governor() callback as a switch statement that simply checks the event argument and invokes a separate function to handle that specific event. That extra complexity can be eliminated by replacing the all-purpose ->governor() callback with a family of callbacks to carry out specific governor operations: initialization and exit, start and stop and policy limits updates. That also turns out to reduce the code size too, so do it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
# 9485e4ca	05-May-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq: governor: Fix handling of special cases in dbs_update() As reported in KBZ 69821: "With CONFIG_HZ_PERIODIC=y cpu stays at the lowest frequcency 800MHz even if usage goes to 100%, frequency does not scale up, the governor in use is ondemand. Neither works conservative. Performance and userspace governors work as expected. With CONFIG_NO_HZ_IDLE or CONFIG_NO_HZ_FULL cpu scales up with ondemand as expected." Analysis carried out by Chen Yu leads to the conclusion that the observed issue is due to idle_time in dbs_update() representing a negative number in which case the function will return 0 as the load (unless load is greater than 0 for another CPU sharing the policy), although that need not be the right choice. Indeed, idle_time representing a negative number means that during the last sampling interval the CPU was almost 100% busy on the rough average, so 100 should be returned as the load in that case. Modify the code accordingly and rearrange it to clarify the handling of all of the special cases in it. While at it, also avoid returning zero as the load if time_elapsed is 0 (it doesn't really make sense to return 0 then). Link: https://bugzilla.kernel.org/show_bug.cgi?id=69821 Tested-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Timo Valtoaho <timo.valtoaho@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
# b4f4b4b3	27-Apr-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq: governor: Change confusing struct field and variable names The name of the prev_cpu_wall field in struct cpu_dbs_info is confusing, because it doesn't represent wall time, but the previous update time as returned by get_cpu_idle_time() (that may be the current value of jiffies_64 in some cases, for example). Moreover, the names of some related variables in dbs_update() take that confusion further. Rename all of those things to make their names reflect the purpose more accurately. While at it, drop unnecessary parens from one of the updated expressions. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Chen Yu <yu.c.chen@intel.com>
# ba1ca654	25-Apr-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq: governor: Fix prev_load initialization in cpufreq_governor_start() The way cpufreq_governor_start() initializes j_cdbs->prev_load is questionable. First off, j_cdbs->prev_cpu_wall used as a denominator in the computation may be zero. The case this happens is when get_cpu_idle_time_us() returns -1 and get_cpu_idle_time_jiffy() used to return that number is called exactly at the jiffies_64 wrap time. It is rather hard to trigger that error, but it is not impossible and it will just crash the kernel then. Second, j_cdbs->prev_load is computed as the average load during the entire time since the system started and it may not reflect the load in the previous sampling period (as it is expected to). That doesn't play well with the way dbs_update() uses that value. Namely, if the update time delta (wall_time) happens do be greater than twice the sampling rate on the first invocation of it, the initial value of j_cdbs->prev_load (which may be completely off) will be returned to the caller as the current load (unless it is equal to zero and unless another CPU sharing the same policy object has a greater load value). For this reason, notice that the prev_load field of struct cpu_dbs_info is only used by dbs_update() and only in that one place, so if cpufreq_governor_start() is modified to always initialize it to 0, it will make dbs_update() always compute the actual load first time it checks the update time delta against the doubled sampling rate (after initialization) and there won't be any side effects of it. Consequently, modify cpufreq_governor_start() as described. Fixes: 18b46abd0009 (cpufreq: governor: Be friendly towards latency-sensitive bursty workloads) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
# 94862a62	21-Apr-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC" Revert commit 0df35026c6a5 (cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC) that introduced a regression by causing the ondemand cpufreq governor to misbehave for CONFIG_TICK_CPU_ACCOUNTING unset (the frequency goes up to the max at one point and stays there indefinitely). The revert takes subsequent modifications of the code in question into account. Fixes: 0df35026c6a5 (cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC) Link: https://bugzilla.kernel.org/show_bug.cgi?id=115261 Reported-and-tested-by: Timo Valtoaho <timo.valtoaho@gmail.com> Cc: 4.5+ <stable@vger.kernel.org> # 4.5+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
# 2d0c58ad	21-Mar-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq: governor: Move abstract gov_attr_set code to seperate file Move abstract code related to struct gov_attr_set to a separate (new) file so it can be shared with (future) goverernors that won't share more code with "ondemand" and "conservative". No intentional functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
# 0dd3c1d6	21-Mar-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq: governor: New data type for management part of dbs_data In addition to fields representing governor tunables, struct dbs_data contains some fields needed for the management of objects of that type. As it turns out, that part of struct dbs_data may be shared with (future) governors that won't use the common code used by "ondemand" and "conservative", so move it to a separate struct type and modify the code using struct dbs_data to follow. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
# 0bed612b	01-Apr-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq: sched: Helpers to add and remove update_util hooks Replace the single helper for adding and removing cpufreq utilization update hooks, cpufreq_set_update_util_data(), with a pair of helpers, cpufreq_add_update_util_hook() and cpufreq_remove_update_util_hook(), and modify the users of cpufreq_set_update_util_data() accordingly. With the new helpers, the code using them doesn't need to worry about the internals of struct update_util_data and in particular it doesn't need to worry about populating the func field in it properly upfront. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
# 539a4c42	21-Mar-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq: governor: Always schedule work on the CPU running update Modify dbs_irq_work() to always schedule the process-context work on the current CPU which also ran the dbs_update_util_handler() that the irq_work being handled came from. This causes the entire frequency update handling (involving the "ondemand" or "conservative" governors) to be carried out by the CPU whose frequency is to be updated and reduces the overall amount of inter-CPU noise related to cpufreq. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# adaf9fcd	10-Mar-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq: Move scheduler-related code to the sched directory Create cpufreq.c under kernel/sched/ and move the cpufreq code related to the scheduler to that file and to sched.h. Redefine cpufreq_update_util() as a static inline function to avoid function calls at its call sites in the scheduler code (as suggested by Peter Zijlstra). Also move the definition of struct update_util_data and declaration of cpufreq_set_update_util_data() from include/linux/cpufreq.h to include/linux/sched.h. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
# 08f511fd	03-Mar-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq: Reduce cpufreq_update_util() overhead a bit Use the observation that cpufreq_update_util() is only called by the scheduler with rq->lock held, so the callers of cpufreq_set_update_util_data() can use synchronize_sched() instead of synchronize_rcu() to wait for cpufreq_update_util() to complete. Moreover, if they are updated to do that, rcu_read_(un)lock() calls in cpufreq_update_util() might be replaced with rcu_read_(un)lock_sched(), respectively, but those aren't really necessary, because the scheduler calls that function from RCU-sched read-side critical sections already. In addition to that, if cpufreq_set_update_util_data() checks the func field in the struct update_util_data before setting the per-CPU pointer to it, the data->func check may be dropped from cpufreq_update_util() as well. Make the above changes to reduce the overhead from cpufreq_update_util() in the scheduler paths invoking it and to make the cleanup after removing its callbacks less heavy-weight somewhat. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
# f737236b	22-Feb-2016	Viresh Kumar <viresh.kumar@linaro.org>	cpufreq: governor: Drop unnecessary checks from show() and store() The show() and store() routines in the cpufreq-governor core don't need to check if the struct governor_attr they want to use really provides the callbacks they need as expected (if that's not the case, it means a bug in the code anyway), so change them to avoid doing that. Also change the error value to -EBUSY, if the governor is getting removed and we aren't allowed to store any more changes. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 27de3482	22-Feb-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq: governor: Fix race in dbs_update_util_handler() There is a scenario that may lead to undesired results in dbs_update_util_handler(). Namely, if two CPUs sharing a policy enter the funtion at the same time, pass the sample delay check and then one of them is stalled until dbs_work_handler() (queued up by the other CPU) clears the work counter, it may update the work counter and queue up another work item prematurely. To prevent that from happening, use the observation that the CPU queuing up a work item in dbs_update_util_handler() updates the last sample time. This means that if another CPU was stalling after passing the sample delay check and now successfully updated the work counter as a result of the race described above, it will see the new value of the last sample time which is different from what it used in the sample delay check before. If that happens, the sample delay check passed previously is not valid any more, so the CPU should not continue. Fixes: f17cbb53783c (cpufreq: governor: Avoid atomic operations in hot paths) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
# 94ab5e03	20-Feb-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq: governor: Make gov_set_update_util() static The gov_set_update_util() routine is only used internally by the common governor code and it doesn't need to be exported, so make it static. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
# 1112e9d8	20-Feb-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq: governor: Narrow down the dbs_data_mutex coverage Since cpufreq_governor_dbs() is now always called with policy->rwsem held, it cannot be executed twice in parallel for the same policy. Thus it is not necessary to hold dbs_data_mutex around the invocations of cpufreq_governor_start/stop/limits() from it as those functions never modify any data that can be shared between different policies. However, cpufreq_governor_dbs() may be executed twice in parallal for different policies using the same gov->gdbs_data object and dbs_data_mutex is still necessary to protect that object against concurrent updates. For this reason, narrow down the dbs_data_mutex locking to cpufreq_governor_init/exit() where it is needed and rename the mutex to gov_dbs_data_mutex to reflect its purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
# e3f5ed93	17-Feb-2016	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	cpufreq: governor: Make dbs_data_mutex static That mutex is only used by cpufreq_governor_dbs() and it doesn't need to be exported to modules, so make it static and drop the export incantation. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>