#
5f5cc9ed |
|
10-Jan-2023 |
Yair Podemsky <ypodemsk@redhat.com> |
x86/aperfmperf: Erase stale arch_freq_scale values when disabling frequency invariance readings Once disable_freq_invariance_work is called the scale_freq_tick function will not compute or update the arch_freq_scale values. However the scheduler will still read these values and use them. The result is that the scheduler might perform unfair decisions based on stale values. This patch adds the step of setting the arch_freq_scale values for all cpus to the default (max) value SCHED_CAPACITY_SCALE, Once all cpus have the same arch_freq_scale value the scaling is meaningless. Signed-off-by: Yair Podemsky <ypodemsk@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230110160206.75912-1-ypodemsk@redhat.com
|
#
fb4c77c2 |
|
25-Apr-2022 |
Thomas Gleixner <tglx@linutronix.de> |
x86/aperfmperf: Integrate the fallback code from show_cpuinfo() Due to the avoidance of IPIs to idle CPUs arch_freq_get_on_cpu() can return 0 when the last sample was too long ago. show_cpuinfo() has a fallback to cpufreq_quick_get() and if that fails to return cpu_khz, but the readout code for the per CPU scaling frequency in sysfs does not. Move that fallback into arch_freq_get_on_cpu() so the behaviour is the same when reading /proc/cpuinfo and /sys/..../cur_scaling_freq. Suggested-by: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Doug Smythies <dsmythies@telus.net> Link: https://lore.kernel.org/r/87pml5180p.ffs@tglx
|
#
f3eca381 |
|
15-Apr-2022 |
Thomas Gleixner <tglx@linutronix.de> |
x86/aperfmperf: Replace arch_freq_get_on_cpu() Reading the current CPU frequency from /sys/..../scaling_cur_freq involves in the worst case two IPIs due to the ad hoc sampling. The frequency invariance infrastructure provides the APERF/MPERF samples already. Utilize them and consolidate this with the /proc/cpuinfo readout. The sample is considered valid for 20ms. So for idle or isolated NOHZ full CPUs the function returns 0, which is matching the previous behaviour. The resulting text size vs. the original APERF/MPERF plus the separate frequency invariance code: text: 2411 -> 723 init.text: 0 -> 767 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20220415161206.934040006@linutronix.de
|
#
7d84c1eb |
|
15-Apr-2022 |
Thomas Gleixner <tglx@linutronix.de> |
x86/aperfmperf: Replace aperfmperf_get_khz() The frequency invariance infrastructure provides the APERF/MPERF samples already. Utilize them for the cpu frequency display in /proc/cpuinfo. The sample is considered valid for 20ms. So for idle or isolated NOHZ full CPUs the function returns 0, which is matching the previous behaviour. This gets rid of the mass IPIs and a delay of 20ms for stabilizing observed by Eric when reading /proc/cpuinfo. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20220415161206.875029458@linutronix.de
|
#
cd8c0e14 |
|
15-Apr-2022 |
Thomas Gleixner <tglx@linutronix.de> |
x86/aperfmperf: Store aperf/mperf data for cpu frequency reads Now that the MSR readout is unconditional, store the results in the per CPU data structure along with a jiffies timestamp for the CPU frequency readout code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20220415161206.817702355@linutronix.de
|
#
bb6e89df |
|
15-Apr-2022 |
Thomas Gleixner <tglx@linutronix.de> |
x86/aperfmperf: Make parts of the frequency invariance code unconditional The frequency invariance support is currently limited to x86/64 and SMP, which is the vast majority of machines. arch_scale_freq_tick() is called every tick on all CPUs and reads the APERF and MPERF MSRs. The CPU frequency getters function do the same via dedicated IPIs. While it could be argued that on systems where frequency invariance support is disabled (32bit, !SMP) the per tick read of the APERF and MPERF MSRs can be avoided, it does not make sense to keep the extra code and the resulting runtime issues of mass IPIs around. As a first step split out the non frequency invariance specific initialization code and the read MSR portion of arch_scale_freq_tick(). The rest of the code is still conditional and guarded with a static key. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20220415161206.761988704@linutronix.de
|
#
73a5fa7d |
|
15-Apr-2022 |
Thomas Gleixner <tglx@linutronix.de> |
x86/aperfmperf: Restructure arch_scale_freq_tick() Preparation for sharing code with the CPU frequency portion of the aperf/mperf code. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20220415161206.706185092@linutronix.de
|
#
24620d94 |
|
15-Apr-2022 |
Thomas Gleixner <tglx@linutronix.de> |
x86/aperfmperf: Put frequency invariance aperf/mperf data into a struct Preparation for sharing code with the CPU frequency portion of the aperf/mperf code. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20220415161206.648485667@linutronix.de
|
#
0dfaf3f6 |
|
15-Apr-2022 |
Thomas Gleixner <tglx@linutronix.de> |
x86/aperfmperf: Untangle Intel and AMD frequency invariance init AMD boot CPU initialization happens late via ACPI/CPPC which prevents the Intel parts from being marked __init. Split out the common code and provide a dedicated interface for the AMD initialization and mark the Intel specific code and data __init. The remaining text size is almost cut in half: text: 2614 -> 1350 init.text: 0 -> 786 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20220415161206.592465719@linutronix.de
|
#
138a7f9c |
|
15-Apr-2022 |
Thomas Gleixner <tglx@linutronix.de> |
x86/aperfmperf: Separate AP/BP frequency invariance init This code is convoluted and because it can be invoked post init via the ACPI/CPPC code, all of the initialization functionality is built in instead of being part of init text and init data. As a first step create separate calls for the boot and the application processors. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20220415161206.536733494@linutronix.de
|
#
55cb0b70 |
|
15-Apr-2022 |
Thomas Gleixner <tglx@linutronix.de> |
x86/smp: Move APERF/MPERF code where it belongs as this can share code with the preexisting APERF/MPERF code. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20220415161206.478362457@linutronix.de
|
#
6d108c96 |
|
15-Apr-2022 |
Thomas Gleixner <tglx@linutronix.de> |
x86/aperfmperf: Dont wake idle CPUs in arch_freq_get_on_cpu() aperfmperf_get_khz() already excludes idle CPUs from APERF/MPERF sampling and that's a reasonable decision. There is no point in sending up to two IPIs to an idle CPU just because someone reads a sysfs file. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20220415161206.419880163@linutronix.de
|
#
04d4e665 |
|
07-Feb-2022 |
Frederic Weisbecker <frederic@kernel.org> |
sched/isolation: Use single feature type while referring to housekeeping cpumask Refer to housekeeping APIs using single feature types instead of flags. This prevents from passing multiple isolation features at once to housekeeping interfaces, which soon won't be possible anymore as each isolation features will have their own cpumask. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Reviewed-by: Phil Auld <pauld@redhat.com> Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org
|
#
3fcd6a23 |
|
03-Sep-2020 |
Paul E. McKenney <paulmck@kernel.org> |
x86/cpu: Avoid cpuinfo-induced IPIing of idle CPUs Currently, accessing /proc/cpuinfo sends IPIs to idle CPUs in order to learn their clock frequency. Which is a bit strange, given that waking them from idle likely significantly changes their clock frequency. This commit therefore avoids sending /proc/cpuinfo-induced IPIs to idle CPUs. [ paulmck: Also check for idle in arch_freq_prepare_all(). ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <x86@kernel.org>
|
#
f4deaf90 |
|
02-Sep-2020 |
Paul E. McKenney <paulmck@kernel.org> |
x86/cpu: Avoid cpuinfo-induced IPI pileups The aperfmperf_snapshot_cpu() function is invoked upon access to /proc/cpuinfo, and it does do an early exit if the specified CPU has recently done a snapshot. Unfortunately, the indication that a snapshot has been completed is set in an IPI handler, and the execution of this handler can be delayed by any number of unfortunate events. This means that a system that starts a number of applications, each of which parses /proc/cpuinfo, can suffer from an smp_call_function_single() storm, especially given that each access to /proc/cpuinfo invokes smp_call_function_single() for all CPUs. Please note that this is not theoretical speculation. Note also that one CPU's pending IPI serves all requests, so there is no point in ever having more than one IPI pending to a given CPU. This commit therefore suppresses duplicate IPIs to a given CPU via a new ->scfpending field in the aperfmperf_sample structure. This field is set to the value one if an IPI is pending to the corresponding CPU and to zero otherwise. The aperfmperf_snapshot_cpu() function uses atomic_xchg() to set this field to the value one and sample the old value. If this function's "wait" parameter is zero, smp_call_function_single() is called only if the old value of the ->scfpending field was zero. The IPI handler uses atomic_set_release() to set this new field to zero just before returning, so that the prior stores into the aperfmperf_sample structure are seen by future requests that get to the atomic_xchg(). Future requests that pass the elapsed-time check are ordered by the fact that on x86 loads act as acquire loads, just as was the case prior to this change. The return value is based off of the age of the prior snapshot, just as before. Reported-by: Dave Jones <davej@codemonkey.org.uk> [ paulmck: Allow /proc/cpuinfo to take advantage of arch_freq_get_on_cpu(). ] [ paulmck: Add comment on memory barrier. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <x86@kernel.org>
|
#
cc9e303c |
|
15-May-2019 |
Konstantin Khlebnikov <koct9i@gmail.com> |
x86/cpu: Disable frequency requests via aperfmperf IPI for nohz_full CPUs Since commit 7d5905dc14a8 ("x86 / CPU: Always show current CPU frequency in /proc/cpuinfo") open and read of /proc/cpuinfo sends IPI to all CPUs. Many applications read /proc/cpuinfo at the start for trivial reasons like counting cores or detecting cpu features. While sensitive workloads like DPDK network polling don't like any interrupts. Integrates this feature with cpu isolation and do not send IPIs to CPUs without housekeeping flag HK_FLAG_MISC (set by nohz_full). Code that requests cpu frequency like show_cpuinfo() falls back to the last frequency set by the cpufreq driver if this method returns 0. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: https://lkml.kernel.org/r/155790354043.1104.15333317408370209.stgit@buzz
|
#
82664963 |
|
01-Jun-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437 Based on 1 normalized pattern(s): this file is licensed under gplv2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 22 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531190115.129548190@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
67e87d43 |
|
29-Mar-2019 |
Borislav Petkov <bp@suse.de> |
x86: Convert some slow-path static_cpu_has() callers to boot_cpu_has() Using static_cpu_has() is pointless on those paths, convert them to the boot_cpu_has() variant. No functional changes. Reported-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Rik van Riel <riel@surriel.com> Reviewed-by: Juergen Gross <jgross@suse.com> # for paravirt Cc: Aubrey Li <aubrey.li@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com> Cc: linux-edac@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: virtualization@lists.linux-foundation.org Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20190330112022.28888-3-bp@alien8.de
|
#
ad3bc25a |
|
04-Dec-2018 |
Borislav Petkov <bp@suse.de> |
x86/kernel: Fix more -Wmissing-prototypes warnings ... with the goal of eventually enabling -Wmissing-prototypes by default. At least on x86. Make functions static where possible, otherwise add prototypes or make them visible through includes. asm/trace/ changes courtesy of Steven Rostedt <rostedt@goodmis.org>. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> # ACPI + cpufreq bits Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Travis <mike.travis@hpe.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yi Wang <wang.yi59@zte.com.cn> Cc: linux-acpi@vger.kernel.org
|
#
7d5905dc |
|
14-Nov-2017 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
x86 / CPU: Always show current CPU frequency in /proc/cpuinfo After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo on x86 can be either the nominal CPU frequency (which is constant) or the frequency most recently requested by a scaling governor in cpufreq, depending on the cpufreq configuration. That is somewhat inconsistent and is different from what it was before 4.13, so in order to restore the previous behavior, make it report the current CPU frequency like the scaling_cur_freq sysfs file in cpufreq. To that end, modify the /proc/cpuinfo implementation on x86 to use aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback registers, if available, and use their values to compute the CPU frequency to be reported as "cpu MHz". However, do that carefully enough to avoid accumulating delays that lead to unacceptable access times for /proc/cpuinfo on systems with many CPUs. Run aperfmperf_snapshot_khz() once on all CPUs asynchronously at the /proc/cpuinfo open time, add a single delay upfront (if necessary) at that point and simply compute the current frequency while running show_cpuinfo() for each individual CPU. Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce the default delay between consecutive APERF and MPERF reads to 10 ms, which should be sufficient to get large enough numbers for the frequency computation in all cases. Fixes: 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org>
|
#
b29c6ef7 |
|
12-Nov-2017 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
x86 / CPU: Avoid unnecessary IPIs in arch_freq_get_on_cpu() Even though aperfmperf_snapshot_khz() caches the samples.khz value to return if called again in a sufficiently short time, its caller, arch_freq_get_on_cpu(), still uses smp_call_function_single() to run it which may allow user space to trigger an IPI storm by reading from the scaling_cur_freq cpufreq sysfs file in a tight loop. To avoid that, move the decision on whether or not to return the cached samples.khz value to arch_freq_get_on_cpu(). This change was part of commit 941f5f0f6ef5 ("x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo"), but it was not the reason for the revert and it remains applicable. Fixes: 4815d3c56d1e (cpufreq: x86: Make scaling_cur_freq behave more as expected) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: WANG Chao <chao.wang@ucloud.cn> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ea0ee339 |
|
10-Nov-2017 |
Linus Torvalds <torvalds@linux-foundation.org> |
Revert "x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo" This reverts commit 941f5f0f6ef5338814145cf2b813cf1f98873e2f. Sadly, it turns out that we really can't just do the cross-CPU IPI to all CPU's to get their proper frequencies, because it's much too expensive on systems with lots of cores. So we'll have to revert this for now, and revisit it using a smarter model (probably doing one system-wide IPI at open time, and doing all the frequency calculations in parallel). Reported-by: WANG Chao <chao.wang@ucloud.cn> Reported-by: Ingo Molnar <mingo@kernel.org> Cc: Rafael J Wysocki <rafael.j.wysocki@intel.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
941f5f0f |
|
03-Nov-2017 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo Commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"") is not sufficient to restore the previous behavior of "cpu MHz" in /proc/cpuinfo on x86 due to some changes made after the commit it has reverted. To address this, make the code in question use arch_freq_get_on_cpu() which also is used by cpufreq for reporting the current frequency of CPUs and since that function doesn't really depend on cpufreq in any way, drop the CONFIG_CPU_FREQ dependency for the object file containing it. Also refactor arch_freq_get_on_cpu() somewhat to avoid IPIs and return cached values right away if it is called very often over a short time (to prevent user space from triggering IPI storms through it). Fixes: 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"") Cc: stable@kernel.org # 4.13 - together with 890da9cf0983 Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8e2f3bce |
|
08-Aug-2017 |
Doug Smythies <doug.smythies@gmail.com> |
cpufreq: x86: Disable interrupts during MSRs reading According to Intel 64 and IA-32 Architectures SDM, Volume 3, Chapter 14.2, "Software needs to exercise care to avoid delays between the two RDMSRs (for example interrupts)". So, disable interrupts during reading MSRs IA32_APERF and IA32_MPERF. See also: commit 4ab60c3f32c7 (cpufreq: intel_pstate: Disable interrupts during MSRs reading). Signed-off-by: Doug Smythies <dsmythies@telus.net> Reviewed-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
4815d3c5 |
|
28-Jul-2017 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpufreq: x86: Make scaling_cur_freq behave more as expected After commit f8475cef9008 "x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF" the scaling_cur_freq policy attribute in sysfs only behaves as expected on x86 with APERF/MPERF registers available when it is read from at least twice in a row. The value returned by the first read may not be meaningful, because the computations in there use cached values from the previous iteration of aperfmperf_snapshot_khz() which may be stale. To prevent that from happening, modify arch_freq_get_on_cpu() to call aperfmperf_snapshot_khz() twice, with a short delay between these calls, if the previous invocation of aperfmperf_snapshot_khz() was too far back in the past (specifically, more that 1s ago). Also, as pointed out by Doug Smythies, aperf_delta is limited now and the multiplication of it by cpu_khz won't overflow, so simplify the s->khz computations too. Fixes: f8475cef9008 "x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF" Reported-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
f8475cef |
|
23-Jun-2017 |
Len Brown <len.brown@intel.com> |
x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF The goal of this change is to give users a uniform and meaningful result when they read /sys/...cpufreq/scaling_cur_freq on modern x86 hardware, as compared to what they get today. Modern x86 processors include the hardware needed to accurately calculate frequency over an interval -- APERF, MPERF, and the TSC. Here we provide an x86 routine to make this calculation on supported hardware, and use it in preference to any driver driver-specific cpufreq_driver.get() routine. MHz is computed like so: MHz = base_MHz * delta_APERF / delta_MPERF MHz is the average frequency of the busy processor over a measurement interval. The interval is defined to be the time between successive invocations of aperfmperf_khz_on_cpu(), which are expected to to happen on-demand when users read sysfs attribute cpufreq/scaling_cur_freq. As with previous methods of calculating MHz, idle time is excluded. base_MHz above is from TSC calibration global "cpu_khz". This x86 native method to calculate MHz returns a meaningful result no matter if P-states are controlled by hardware or firmware and/or if the Linux cpufreq sub-system is or is-not installed. When this routine is invoked more frequently, the measurement interval becomes shorter. However, the code limits re-computation to 10ms intervals so that average frequency remains meaningful. Discerning users are encouraged to take advantage of the turbostat(8) utility, which can gracefully handle concurrent measurement intervals of arbitrary length. Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|