History log of /linux-master/tools/power/x86/turbostat/turbostat.c
Revision Date Author Comments
# 3ab7296a 08-Apr-2024 Len Brown <len.brown@intel.com>

tools/power turbostat: v2024.04.10

Much of turbostat can now run with perf, rather than using the MSR driver.

Some of turbostat can now run as a regular non-root user.

Add some new output columns for some new GFX hardware.

[This patch updates the version, but otherwise changes no function;
it touches up some checkpatch issues from previous patches]

Signed-off-by: Len Brown <len.brown@intel.com>


# 91a91d38 12-Mar-2024 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Add support for Xe sysfs knobs

Xe graphics driver uses different graphics sysfs knobs including
/sys/class/drm/card0/device/tile0/gt0/gtidle/idle_residency_ms
/sys/class/drm/card0/device/tile0/gt0/freq0/cur_freq
/sys/class/drm/card0/device/tile0/gt0/freq0/act_freq
/sys/class/drm/card0/device/tile0/gt1/gtidle/idle_residency_ms
/sys/class/drm/card0/device/tile0/gt1/freq0/cur_freq
/sys/class/drm/card0/device/tile0/gt1/freq0/act_freq

In addition,
/sys/class/drm/card0/device/tile0/gt<n>/gtidle/name
returns either gt<n>-rc or gt<n>-mc. rc is for GFX and mc is SA Media.

Enhance turbostat to prefer the Xe sysfs knobs when they are available.
Export gt<n>-rc via BIC_GFX_rc6/BIC_GFXMHz/BIC_GFXACTMHz.
Export gt<n>-mc via BIC_SAM_mc6/BIC_SAMMHz/BIC_SAMACTMHz.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>


# dc02dc93 21-Mar-2024 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Add support for new i915 sysfs knobs

On Meteorlake platform, i915 driver supports the traditional graphics
sysfs knobs including
/sys/class/drm/card0/power/rc6_residency_ms
/sys/class/drm/card0/gt_cur_freq_mhz
/sys/class/drm/card0/gt_act_freq_mhz

At the same time, it also supports
/sys/class/drm/card0/gt/gt0/rc6_residency_ms
/sys/class/drm/card0/gt/gt0/rps_cur_freq_mhz
/sys/class/drm/card0/gt/gt0/rps_act_freq_mhz
/sys/class/drm/card0/gt/gt1/rc6_residency_ms
/sys/class/drm/card0/gt/gt1/rps_cur_freq_mhz
/sys/class/drm/card0/gt/gt1/rps_act_freq_mhz
gt0 is for GFX and gt1 is for SA Media.

Enhance turbostat to prefer the new i915 sysfs knobs.
Export gt0 via BIC_GFX_rc6/BIC_GFXMHz/BIC_GFXACTMHz.
Export gt1 via BIC_SAM_mc6/BIC_SAMMHz/BIC_SAMACTMHz.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>


# 3bbb331c 12-Mar-2024 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Introduce BIC_SAM_mc6/BIC_SAMMHz/BIC_SAMACTMHz

Graphics drivers (i915/Xe) on modern platforms split GFX and SA Media
information via different sysfs knobs.

Existing BIC_GFX_rc6/BIC_GFXMHz/BIC_GFXACTMHz columns can be reused for
GFX.

Introduce BIC_SAM_mc6/BIC_SAMMHz/BIC_SAMACTMHz columns for SA Media.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>


# 60add818 02-Apr-2024 Justin Ernst <justin.ernst@hpe.com>

tools/power/turbostat: Fix uncore frequency file string

Running turbostat on a 16 socket HPE Scale-up Compute 3200 (SapphireRapids) fails with:
turbostat: /sys/devices/system/cpu/intel_uncore_frequency/package_010_die_00/current_freq_khz: open failed: No such file or directory

We observe the sysfs uncore frequency directories named:
...
package_09_die_00/
package_10_die_00/
package_11_die_00/
...
package_15_die_00/

The culprit is an incorrect sprintf format string "package_0%d_die_0%d" used
with each instance of reading uncore frequency files. uncore-frequency-common.c
creates the sysfs directory with the format "package_%02d_die_%02d". Once the
package value reaches double digits, the formats diverge.

Change each instance of "package_0%d_die_0%d" to "package_%02d_die_%02d".

[lenb: deleted the probe part of this patch, as it was already fixed]

Signed-off-by: Justin Ernst <justin.ernst@hpe.com>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>


# de39d38c 12-Mar-2024 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Unify graphics sysfs snapshots

Graphics sysfs snapshots share similar logic.
Combine them into one function to avoid code duplication.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 4e2bbbf7 12-Mar-2024 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Cache graphics sysfs path

Graphics drivers (i915/Xe) have different sysfs knobs on different
platforms, and it is possible that different sysfs knobs fit into the
same turbostat columns.

Instead of specifying different sysfs knobs every time, detect them
once and cache the path for future use.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# bb5db22c 11-Mar-2024 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Enable MSR_CORE_C1_RES support for ICX

Enable Core C1 hardware residency counter (MSR_CORE_C1_RES) on ICX.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 05a2f07d 04-Mar-2024 Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>

tools/power turbostat: read RAPL counters via perf

Some of the future Intel platforms will require reading the RAPL
counters via perf and not MSR. On current platforms we can still read
them using both ways.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# ebf8449c 14-Feb-2024 Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>

tools/power turbostat: Add proper re-initialization for perf file descriptors

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 4a1bb4da 14-Mar-2024 Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>

tools/power turbostat: Clear added counters when in no-msr mode

If the user requests --no-msr or is not able to access the MSRs,
turbostat should clear all the counters added with --add.
Because MSR access permission checks are done after the cmdline is
parsed, the decision has to be deferred until the transition into
no-msr mode happens.

Signed-off-by: Len Brown <len.brown@intel.com>


# aed48c48 30-Jan-2024 Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>

tools/power turbostat: add early exits for permission checks

Checking early if the permissions are even needed gets rid of the
warnings about some of them missing. Earlier we issued a warning in case
of missing MSR and/or perf permissions, even when user never asked for
counters that require those.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 5088741e 15-Jan-2024 Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>

tools/power turbostat: detect and disable unavailable BICs at runtime

This allows an unprivileged user to run turbostat seamlessly.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# e48934c9 11-Jan-2024 Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>

tools/power turbostat: Add reading aperf and mperf via perf API

By using the perf API we spend less time in between the reads of the
counters, resulting in more accurate calculations of the dependent
metrics.

Using the perf API is also usually faster overall, although a cache miss,
if we get one, is more costly with perf than with the MSR driver.

Fall back to MSR reads if the sysfs isn't there or when in
--no-perf mode.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# a0e86c90 11-Jan-2024 Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>

tools/power turbostat: Add --no-perf option

Add the --no-perf option to allow users to run turbostat without
accessing perf.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 3e404846 11-Jan-2024 Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>

tools/power turbostat: Add --no-msr option

Add --no-msr option to allow users to run turbostat without
accessing MSRs via the MSR driver.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 2d2ccd57 05-Feb-2024 Len Brown <len.brown@intel.com>

tools/power turbostat: enhance -D (debug counter dump) output

Eliminate redundant debug output for core and package scope counters.

Include name and path for all "ADDED" counters.

Signed-off-by: Len Brown <len.brown@intel.com>


# b6fe9383 18-Jan-2024 Len Brown <len.brown@intel.com>

tools/power turbostat: Fix warning upon failed /dev/cpu_dma_latency read

Previously a failed read of /dev/cpu_dma_latency erroneously complained
turbostat: capget(CAP_SYS_ADMIN) failed, try "# setcap cap_sys_admin=ep ./turbostat"

This went unnoticed because this file is typically visible to root,
and turbostat was typically run as root.

Going forward, when a non-root user can run turbostat...
Complain about failed read access to this file only if --debug is used.

Signed-off-by: Len Brown <len.brown@intel.com>


# 538d505f 22-Jan-2024 Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>

tools/power turbostat: Read base_hz and bclk from CPUID.16H if available

If MSRs cannot be read, values can be obtained from cpuid.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# fb5ceca0 12-Jan-2024 Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>

tools/power turbostat: Print ucode revision only if valid

If the MSR read were to fail, turbostat would print "microcode 0x0"

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# bb6181fa 20-Dec-2023 Len Brown <len.brown@intel.com>

tools/power turbostat: Expand probe_intel_uncore_frequency()

Print current frequency along with the current (and initial) limits

Probe and print uncore config also for machines using the new cluster API

Signed-off-by: Len Brown <len.brown@intel.com>


# 227ed18f 21-Oct-2023 Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: Do not print negative LPI residency

turbostat prints an abnormal SYS%LPI value across suspend-to-idle:
SYS%LPI = 114479815993277.50

To reproduce:
Run a freeze cycle, e.g. "sleepgraph -m freeze -rtcwake 15".
Then reboot. After boot, launch suspend-to-idle again
and check the SYS%LPI field.

The slp_s0 residency counter is described in the LPIT table, and the BIOS
does not clear this register across reset. The PMC expects the OS to
calculate the LPI residency based on the delta. However, a firmware
issue causes the counter to be cleared to 0 during the second
suspend-to-idle after the reboot, which produces a negative delta.

[lenb: updated to print "neg" upon this BIOS failure]

Reported-by: Todd Brandt <todd.e.brandt@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 3ac1d14d 02-Oct-2023 Wyes Karny <wyes.karny@amd.com>

tools/power turbostat: Increase the limit for fd opened

When running turbostat, a system with 512 cpus reaches the limit for
maximum number of file descriptors that can be opened. To solve this
problem, the limit is raised to 2^15, which is a large enough number.

Below data is collected from AMD server systems while running turbostat:

|-----------+-------------------------------|
| # of cpus | # of opened fds for turbostat |
|-----------+-------------------------------|
| 128       | 260                           |
|-----------+-------------------------------|
| 192       | 388                           |
|-----------+-------------------------------|
| 512       | 1028                          |
|-----------+-------------------------------|

So, the new max limit would be sufficient up to 2^14 cpus (but this
also depends on how many counters are enabled).

Reviewed-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# e5f4e68e 03-Apr-2023 Doug Smythies <dsmythies@telus.net>

tools/power turbostat: Fix added raw MSR output

When using --Summary mode, added MSRs in raw mode always
print zeros. Print the actual register contents.

Example, with patch:

note the added column:
--add msr0x64f,u32,package,raw,REASON

Where:

0x64F is MSR_CORE_PERF_LIMIT_REASONS

Busy% Bzy_MHz PkgTmp PkgWatt CorWatt REASON
0.00 4800 35 1.42 0.76 0x00000000
0.00 4801 34 1.42 0.76 0x00000000
80.08 4531 66 108.17 107.52 0x08000000
98.69 4530 66 133.21 132.54 0x08000000
99.28 4505 66 128.26 127.60 0x0c000400
99.65 4486 68 124.91 124.25 0x0c000400
99.63 4483 68 124.90 124.25 0x0c000400
79.34 4481 41 99.80 99.13 0x0c000000
0.00 4801 41 1.40 0.73 0x0c000000

Where, for the test processor (i5-10600K):

PKG Limit #1: 125.000 Watts, 8.000000 sec
MSR bit 26 = log; bit 10 = status

PKG Limit #2: 136.000 Watts, 0.002441 sec
MSR bit 27 = log; bit 11 = status

Example, without patch:

Busy% Bzy_MHz PkgTmp PkgWatt CorWatt REASON
0.01 4800 35 1.43 0.77 0x00000000
0.00 4801 35 1.39 0.73 0x00000000
83.49 4531 66 112.71 112.06 0x00000000
98.69 4530 68 133.35 132.69 0x00000000
99.31 4500 67 127.96 127.30 0x00000000
99.63 4483 69 124.91 124.25 0x00000000
99.61 4481 69 124.90 124.25 0x00000000
99.61 4481 71 124.92 124.25 0x00000000
59.35 4479 42 75.03 74.37 0x00000000
0.00 4800 42 1.39 0.73 0x00000000
0.00 4801 42 1.42 0.76 0x00000000


[lenb: simplified patch to apply only to package scope]

Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Len Brown <len.brown@intel.com>


# b8337e6a 07-Nov-2023 Len Brown <len.brown@intel.com>

tools/power turbostat: version 2023.11.07

Turbostat features are now table-driven (Rui Zhang)
Add support for some new platforms (Sumeet Pawnikar, Rui Zhang)
Gracefully run in configs when CPUs are limited (Rui Zhang, Srinivas Pandruvada)
misc minor fixes.

Signed-off-by: Len Brown <len.brown@intel.com>


# f2c1dba3 28-Jun-2023 Len Brown <len.brown@intel.com>

tools/power/turbostat: bugfix "--show IPC"

turbostat --show IPC

displays "inf" for the IPC column

turbostat was missing the explicit dependency of IPC on APERF,
and thus neglected to collect APERF when only IPC was requested.

typical use:

turbostat --quiet --show CPU,IPC

Signed-off-by: Len Brown <len.brown@intel.com>


# 956dbd3d 23-May-2023 Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>

tools/power/turbostat: Add initial support for LunarLake

Add initial support for LunarLake platform.

It shares the same features with CannonLake.

Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>


# 7b57e7b6 23-May-2023 Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>

tools/power/turbostat: Add initial support for ArrowLake

Add initial support for ArrowLake platform.

It shares the same features with CannonLake.

Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>


# 5a6efcb9 18-Oct-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Add initial support for GrandRidge

Add initial support for GrandRidge.

It shares the same features as SierraForest, except that it does not
support PC2/PC6.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>


# d33605f3 27-Sep-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Add initial support for SierraForest

Add initial support for SierraForest.

It shares the same features with SapphireRapids, except that it has
MSR_MODULE_C6_RES_MS support.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>


# 5feab4a6 08-Sep-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Add initial support for GraniteRapids

Add initial support for GraniteRapids.

It shares the same features with SapphireRapids.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>


# 0e3f10e6 18-Oct-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Add MSR_CORE_C1_RES support for spr_features

Add MSR_CORE_C1_RES support for spr_features because both Sapphirerapids
and Emeraldrapids support this MSR.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>


# 37f68a29 24-Jan-2023 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

tools/power/turbostat: Move process to root cgroup

When available CPUs are reduced via cgroup cpuset controller, turbostat
will exit with errors (For example):
get_counters: Could not migrate to CPU 0
turbostat: re-initialized with num_cpus 20
get_counters: Could not migrate to CPU 0
turbostat: re-initialized with num_cpus 20

Move the turbostat to root cgroup, which has every CPU.

Writing the value 0 to a cgroup.procs file causes the writing
process to be moved to the corresponding cgroup.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>


# f638858d 19-Oct-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Handle cgroup v2 cpu limitation

CPUs can be isolated via cgroup settings and turbostat should avoid
migrating to these CPUs, just like it does for the '-c' cpus.

Introduce cpu_effective_set to save the cgroup cpu limitation info from
/sys/fs/cgroup/cpuset.cpus.effective. And use cpu_allowed_set as the
intersection of cpu_present_set, cpu_effective_set and cpu_subset.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>


# 8c3dd2c9 19-Oct-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract function for parsing cpu string

Abstract parse_cpu_str() which can update any specified cpu_set by a
given cpu string. This can be used to handle further CPU limitations
from other sources like cgroup.

The cpu string parsing code is also enhanced to handle the strings that
have an extra '\n' before string terminator.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>


# c25ef0e5 20-Oct-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Handle offlined CPUs in cpu_subset

It is possible that the cpu_subset contains offlined CPUs.

If this happens during start, exit immediately because this is likely an
operator error that is best fixed by re-invoking.
If this happens at runtime, give a warning only because turbostat should
do its best effort to continue running.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>


# 0fe37529 06-Oct-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Obey allowed CPUs for system summary

System summary should summarize the information for allowed CPUs instead
of all the present CPUs.

Introduce topology information for allowed CPUs, and use them to
get system summary.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>


# ccf8a052 05-Oct-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Obey allowed CPUs for primary thread/core detection

Thread_id doesn't tell if a CPU is allowed or not.

Detect allowed CPUs only and use the first detected thread/core as the
primary thread/core of a core/package.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>


# 74318add 04-Oct-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract several functions

When detecting the primary thread/core in a core/package, current code
doesn't handle the allowed CPUs.

Abstract several functions for further fix of this issue.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>


# 7bb3fe27 04-Oct-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Obey allowed CPUs during startup

Set turbostat CPU affinity to make sure turbostat is running on one of
the allowed CPUs.

Set base_cpu to the first allowed CPU so that some platform information
is dumped using one of the allowed CPUs.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>


# 4ede6d1c 03-Oct-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Obey allowed CPUs when accessing CPU counters

for_all_cpus/for_all_cpus_2 are used for accessing the per CPU counters,
and they should follow the cpu_allowed_set instead of cpu_present_set.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>


# 71cfd1da 05-Oct-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Introduce cpu_allowed_set

Turbostat supports "-c" parameter which limits output to system summary
plus the specified cpu-set. But some code still uses cpu_present_set to
read and dump the counters.

Introduce cpu_allowed_set for code that should obey the specified cpu-set.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>


# 6b74a30b 08-Sep-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Remove PC7/PC9 support on ADL/RPL

Compared with other platforms that share cnl_features, ADL/RPL don't
have PC7/PC9.

Clone a new platform feature set from cnl_features for ADL/RPL, with
PC7/PC9 removed.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 05ad96ff 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Enable MSR_CORE_C1_RES on recent Intel client platforms

All recent Intel client platforms have MSR_CORE_C1_RES. Enable the
support on these platforms, including CNL/ICL/LKF/RKL/TGL/ADL/RPL/MTL.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 7ee39d8d 13-Sep-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Introduce probe_pm_features()

Feature probe has nothing to do with CPUID, thus it should not be in
process_cpuids().

Introduce probe_pm_features() and move all feature probing functions
into it.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 5612b2c8 30-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Relocate more probing related code

Relocate more feature probing code outside of process_cpuids() into the
corresponding probing functions.

This improves the readability of code and the turbostat output.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# ce7a32c2 30-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Reorder some functions

Reorder some functions to resolve a code dependency introduced by the next patch.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# db735f8b 31-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Relocate thermal probing code

Introduce probe_thermal(), and move all thermal probing related code
into it.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# e7d7b82d 31-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Relocate lpi probing code

Introduce probe_lpi(), and move all lpi probing related code into it.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 2538d167 31-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Relocate graphics probing code

Introduce probe_graphics(), and move all graphics probing related code
into it.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 6cb13609 31-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Rename rapl probing function

Rename rapl_probe() to probe_rapl() to be consistent with other probing
function names.

Probe rapl after probing uncore frequency.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 622c8f23 31-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Rename uncore probing function

Rename intel_uncore_frequency_probe() to probe_intel_uncore_frequency()
to be consistent with other probing function names.

Probe uncore frequency right after probing cstates.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 11cd9a09 31-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Relocate pstate probing code

Introduce probe_pstates() and move all pstate probing related code into
it.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 045acf60 28-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Relocate cstate probing code

Move all cstate probing related code into probe_cstates().

Note that dump_platform_info() actually dumps both MSR_PLATFORM_INFO and
MSR_IA32_POWER_CTL. MSR_PLATFORM_INFO is for pstate and
MSR_IA32_POWER_CTL is for cstate. So split dump_platform_info() and dump
MSR_IA32_POWER_CTL in probe_cstates().

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 32e8c616 31-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Improve probe_platform_features() logic

AMD/Hygon platforms that don't have RAPL use 'amd_features' to describe
the platform features. Unknown Intel platforms use 'default_features' to
describe the platform features.

As none of the platform features are set for 'amd_features' or
'default_features', there is no need to maintain both of them.

Remove 'amd_features' structure and improve the logic in
probe_platform_features().

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# d085b3b0 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Delete intel_model_duplicates()

Now that the CPU model checks have been cleaned up, no code depends on
the duplicated CPU model value.

Delete intel_model_duplicates().

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 7d0ebe6f 31-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract cstate prewake bit support

Abstract cstate prewake bit support.

Delete is_icx()/is_spr() CPU model checks.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# ed43247b 31-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract aperf/mperf multiplier support

Abstract aperf/mperf multiplier support.

Delete is_knl() CPU model check.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 58ddb691 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract extended cstate MSRs support

Abstract the support for MSR_PKG_WEIGHTED_CORE_C0_RES,
MSR_PKG_ANY_CORE_C0_RES, MSR_PKG_ANY_GFXE_C0_RES and
MSR_PKG_BOTH_CORE_GFXE_C0_RES.

Delete has_skl_msrs() CPU model check.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 80d132cb 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract MSR_KNL_CORE_C6_RESIDENCY support

Abstract the support for MSR_KNL_CORE_C6_RESIDENCY.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# c8202a6c 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract MSR_ATOM_PKG_C6_RESIDENCY support

Abstract the support for MSR_ATOM_PKG_C6_RESIDENCY.

Delete is_slm() CPU model check.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 6c36882e 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract MSR_CC6/MC6_DEMOTION_POLICY_CONFIG support

Abstract the support for MSR_CC6/MC6_DEMOTION_POLICY_CONFIG.

Delete has_slv_msrs() CPU model check.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 9cc1c103 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract MSR_MODULE_C6_RES_MS support

Abstract MSR_MODULE_C6_RES_MS support.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 76d83d2a 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract MSR_CORE_C1_RES support

Abstract the support for MSR_CORE_C1_RES.

Delete is_dnv() CPU model check.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 148df4fd 08-Sep-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract IRTL support

Abstract the support for MSR_PKGC3/PKGC6/PKGC7/PKGC8/PKGC9/PKGC10_IRTL.

Delete has_snb_msrs() CPU model check.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 8c382f9e 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Use fine grained IRTL output

It is pointless to dump the IRTL register for a package cstate that is
not supported by the platform.

Print IRTL only for states that are available in
platform->supported_cstates.

Delete has_c8910_msrs() CPU model check.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# cd7a2b6a 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Adjust cstate for is_slm()/is_knl()/is_cnl()/is_ehl() models

Disable CC3 for is_slm()/is_knl()/is_cnl()/is_ehl() models.

Delete is_cnl()/is_ehl() CPU model checks.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 4d2c95d4 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Adjust cstate for has_c8910_msrs() models

Enable PC8/PC9/PC10 for has_c8910_msrs() models.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 11096948 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Adjust cstate for is_bdx() models

Disable CC7/PC7 for is_bdx() models.

Delete is_bdx() CPU model check.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 24d16bec 08-Sep-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Adjust cstate for is_skx()/is_icx()/is_spr() models

Disable CC3/CC7/PC3/PC7 for is_skx()/is_icx()/is_spr() models.

Delete is_skx() CPU model check.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 8e20ced0 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Adjust cstate for is_dnv() models

Enable CC1 and disable CC3/CC7/PC3/PC7 for is_dnv() models.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 3d982ac0 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Adjust cstate for is_jvl() models

Disable CC3/CC7/PC2/PC3/PC6/PC7 for is_jvl() models.

Delete is_jvl() CPU model check.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# ff206149 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Adjust cstate for has_slv_msrs() models

Disable PC2/PC3/PC7 and enable PC6 for has_slv_msrs() models.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 192cbf04 08-Sep-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Adjust cstate for has_snb_msrs() models

Enable PC7 for has_snb_msrs() models.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 6f1935c0 08-Sep-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Adjust cstate for models with .cst_limit set

Enable PC3/PC6 for platforms with .cst_limit set because package cstates
are guarded by pkg_cstate_limit.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 942c854d 08-Sep-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Adjust cstate for has_snb_msrs() models

Enable CC7 and PC2 for has_snb_msrs() models.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# ce7ddf8a 08-Sep-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Adjust cstate for models with .has_nhm_msrs set

Enable CC1/CC3/CC6 for platforms with .has_nhm_msrs set.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 3c6a17b8 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Add skeleton support for cstate enumeration

Add skeleton support for cstate enumeration.

Note that the previous logic may override the cstate setting multiple
times for different reasons. The conversion to the new cstate
enumeration must be done step by step, strictly following the order of
the previous code.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 485a017c 26-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract TSC tweak support

On some models, the CPU base frequency is different from the TSC
frequency, and the aperf/mperf counters are running at CPU base
frequency instead of TSC frequency.

Abstract support for TSC tweak.

Given that tsc_tweak depends on base_hz, move the code to probe_bclk()
after base_hz is available.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# bf1ad57c 26-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Remove unused family/model parameters for RAPL functions

RAPL probing can be done without family/model checking. Remove these
parameters in rapl probe functions.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 7c604093 26-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract hardcoded TDP value

Different hardcoded TDP values are used when TDP can not be retrieved
from the hardware.

Abstract hardcoded TDP value.

Delete CPU model checks in get_tdp_intel().

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 9e6f3515 21-Apr-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract fixed DRAM Energy unit support

Abstract the support for fixed Dram domain energy unit.

Delete rapl_dram_energy_units_probe() CPU model check.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 6d35b8c4 21-Apr-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract RAPL divisor support

INTEL_FAM6_ATOM_SILVERMONT model needs a divisor to convert the raw
Energy Units value from MSR_RAPL_POWER_UNIT.

Abstract the support for RAPL divisor.

Delete CPU model check in rapl_probe_intel().

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# e338831b 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract Per Core RAPL support

Abstract the support for Per Core RAPL.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 86ba263d 28-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract RAPL MSRs support

Abstract the support for RAPL MSRs.

Delete CPU model checks in rapl_probe_intel().

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# a98f8860 26-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Simplify the logic for RAPL enumeration

Support for each RAPL domain, as well as support for the perf status
of each RAPL domain, can be detected by checking the availability of
the corresponding RAPL MSRs.

Change the code accordingly and remove the hardcoded logic for each
model.

Note that this also fixes the INTEL_FAM6_ATOM_TREMONT model, which has
RAPL_PKG_PERF_STATUS and MSR_DRAM_PERF_STATUS but doesn't have BIC_PKG__
and BIC_RAM__ set.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
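
The availability-based detection described in this message can be sketched as follows. This is a minimal Python illustration, not turbostat's C code: `read_msr` is a hypothetical callable that raises OSError when an MSR is unavailable, and the MSR-to-column mapping is an assumption built from the names quoted above.

```python
from typing import Callable, Dict, Set

# Assumed mapping from a perf-status MSR name to the output column it enables.
PERF_STATUS_COLUMNS: Dict[str, str] = {
    "MSR_PKG_PERF_STATUS": "BIC_PKG__",
    "MSR_DRAM_PERF_STATUS": "BIC_RAM__",
}


def enumerate_rapl_columns(read_msr: Callable[[str], int]) -> Set[str]:
    """Enable a column only when its MSR can actually be read."""
    enabled = set()
    for msr_name, column in PERF_STATUS_COLUMNS.items():
        try:
            read_msr(msr_name)  # hypothetical reader; raises OSError if absent
        except OSError:
            continue  # MSR not available: leave the column disabled
        enabled.add(column)
    return enabled
```

With this shape no per-model list is needed: a platform gets a column if and only if the probe succeeds.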


# b9cd6683 26-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Redefine RAPL macros

Redefine RAPL macros to make the code more readable.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# a5d1ab93 21-Apr-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract hardcoded Crystal Clock frequency

Abstract the support for hardcoded Crystal Clock frequency, which is
used when crystal clock is not available from CPUID.15.

Delete CPU model checks in process_cpuid().

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# d90120bf 21-Apr-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract Automatic Cstate Conversion support

Abstract the support for AUTOMATIC_CSTATE_CONVERSION bit in
MSR_PKG_CST_CONFIG_CONTROL.

Delete automatic_cstate_conversion_probe() CPU model check.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 0c057cf7 30-Jul-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract Perf Limit Reasons MSRs support

Abstract the support for MSR_CORE/GFX/RING_PERF_LIMIT_REASONS MSRs.

Delete perf_limit_reasons_probe() CPU model check.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# d8e1623b 25-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract TCC Offset bits support

Abstract the support for different TCC Offset bits in
MSR_IA32_TEMPERATURE_TARGET.

Delete check_tcc_offset() CPU model check.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# a61c9cb4 31-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract Config TDP MSRs support

Abstract the support for MSR_CONFIG_TDP_NOMINAL/LEVEL_1/LEVEL_2/CONTROL
and MSR_TURBO_ACTIVATION_RATIO.

Delete has_config_tdp() CPU model check.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# a3943dea 29-Jul-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Rename some TRL functions

Rename dump_hsw_turbo_ratio_limits() and dump_ivt_turbo_ratio_limits()
to dump_turbo_ratio_limit2() and dump_turbo_ratio_limit1() because they
dump MSR_TURBO_RATIO_LIMIT1/LIMIT2, and the MSRs' behavior is
consistent when they are available.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 10d85d85 08-Sep-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract Turbo Ratio Limit MSRs support

Abstract the support for MSR_TURBO_RATIO_LIMIT, MSR_TURBO_RATIO_LIMIT1,
MSR_TURBO_RATIO_LIMIT2, MSR_SECONDARY_TURBO_RATIO_LIMIT,
MSR_ATOM_CORE_RATIOS and MSR_ATOM_CORE_TURBO_RATIOS.

Delete has_turbo_ratio_group_limits(), has_turbo_ratio_limit(),
has_atom_turbo_ratio_limit(), has_ivt_turbo_ratio_limit(),
has_hsw_turbo_ratio_limit(), has_knl_turbo_ratio_limit() and
has_glm_turbo_ratio_limit() CPU model checks.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 8b7199c0 27-Mar-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Rename some functions

Rename dump_nhm_platform_info() and dump_nhm_cst_cfg() to
dump_platform_info() and dump_cst_cfg() because these MSRs' behavior is
consistent when they're available.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# c2c25e85 27-Mar-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Remove a redundant check

Platforms with has_msr_misc_pwr_mgmt set are a subset of platforms with
has_nhm_msrs set.

Thus remove the redundant check for platform->has_nhm_msrs.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# fcfa1ce0 25-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract Nehalem MSRs support

MSR_PLATFORM_INFO, MSR_IA32_TEMPERATURE_TARGET, MSR_SMI_COUNT,
MSR_PKG_CST_CONFIG_CONTROL, and the TRL MSRs are always available for
platforms since Nehalem. Support for these msrs can be described
altogether.

Abstract the support for these MSRs.

Delete probe_nhm_msrs() CPU model check.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 3989fc89 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract Package cstate limit decoding support

Abstract the support for decoding package cstate limit from
MSR_PKG_CST_CONFIG_CONTROL.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 71e84129 21-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract BCLK frequency support

Abstract CPU base clock frequency support.

Note that bclk is used by
1. calculate base_hz using MSR_PLATFORM_INFO, which is guarded by
probe_nhm_msrs().
2. dump MSR_PLATFORM_INFO and Turbo Ratio Limit MSRs, which are also
guarded by probe_nhm_msrs().
Thus probe_bclk() works for probe_nhm_msrs() models only.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 3dd0e754 21-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract MSR_MISC_PWR_MGMT support

Abstract MSR_MISC_PWR_MGMT support.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 778fc34a 21-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Abstract MSR_MISC_FEATURE_CONTROL support

Abstract MSR_MISC_FEATURE_CONTROL support.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 45232ab1 27-Aug-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Add skeleton support for table driven feature enumeration

Turbostat supports a series of features that may diverge among different
CPU models.

Current code uses various CPU model checks in different places to
handle this, which makes the code hard to maintain.

Add skeleton support for table driven feature enumeration to replace the
current error-prone CPU model checks and global variables.

Note: comparing the CPU models with intel-family.h shows that turbostat
support for the following four models is missing: INTEL_FAM6_ICELAKE,
INTEL_FAM6_ATOM_SILVERMONT_MID, INTEL_FAM6_ATOM_AIRMONT_MID and
INTEL_FAM6_ATOM_AIRMONT_NP. Adding support for these models is separate
work, so it is not covered in this patch set.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
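
The idea of table-driven feature enumeration can be sketched roughly as below. This is a Python illustration under stated assumptions, not the actual C table: the model constants `MODEL_A` and `MODEL_B`, the function `probe_platform_features()`, and the flag values are hypothetical. Only the flag names has_nhm_msrs and has_msr_misc_pwr_mgmt are taken from this log.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PlatformFeatures:
    # Flag names borrowed from the log; the values below are illustrative.
    has_nhm_msrs: bool = False
    has_msr_misc_pwr_mgmt: bool = False


# Hypothetical model numbers, used only to show the lookup.
MODEL_A = 0x01
MODEL_B = 0x02

# One table row per model replaces model checks scattered through the code.
PLATFORM_TABLE: Dict[int, PlatformFeatures] = {
    MODEL_A: PlatformFeatures(has_nhm_msrs=True),
    MODEL_B: PlatformFeatures(has_nhm_msrs=True, has_msr_misc_pwr_mgmt=True),
}


def probe_platform_features(model: int) -> Optional[PlatformFeatures]:
    """Return the feature row for a model, or None if it is not listed."""
    return PLATFORM_TABLE.get(model)
```

Callers then test a flag such as `features.has_nhm_msrs` instead of comparing model numbers, so supporting a new model means adding one row.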


# 48674c1b 26-Mar-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Remove pseudo check for two models

INTEL_FAM6_ATOM_SILVERMONT_MID/INTEL_FAM6_ATOM_AIRMONT_MID are not
listed in probe_nhm_msrs(). This means that most of the turbostat
features are not available on these two platforms.

Furthermore, checking for these two models in has_slv_msrs() is dead
code, because has_slv_msrs() is called by code guarded by
probe_nhm_msrs().

For these two reasons, remove pseudo check for
INTEL_FAM6_ATOM_SILVERMONT_MID and INTEL_FAM6_ATOM_AIRMONT_MID.

Will add back the support when we can access these two platforms.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# bbfc33b1 24-Mar-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Remove redundant duplicates

Remove redundant duplicates in intel_model_duplicates().

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 6d306d6e 24-Mar-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Replace raw value cpu model with Macro

Kernel already has
#define INTEL_FAM6_NEHALEM_G 0x1F /* Auburndale / Havendale */

Use standard Macro for CPU Model instead of raw value.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 2c019d65 21-Aug-2022 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Support alternative graphics sysfs knobs

/sys/class/graphics/fb0/device/drm/card0/ and /sys/class/drm/card0/
point to the same device node.
But in some cases, one exists and the other one does not.

Prefer to use /sys/class/drm/card0/, and fall back to
/sys/class/graphics/fb0/device/drm/card0/.

This recovers the "GFXMHz" and "GFXAMHz" columns on some platforms like
a SPR server.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
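
The prefer-then-fall-back lookup described in this message might look like the sketch below. It is a Python illustration rather than the C implementation; the function name `pick_gfx_sysfs_dir()` and the injectable `exists` parameter are assumptions added so the logic can be exercised without real hardware.

```python
import os
from typing import Callable, Optional

PREFERRED_DIR = "/sys/class/drm/card0/"
FALLBACK_DIR = "/sys/class/graphics/fb0/device/drm/card0/"


def pick_gfx_sysfs_dir(
    exists: Callable[[str], bool] = os.path.isdir,
) -> Optional[str]:
    """Return the first graphics sysfs directory that exists, else None."""
    for candidate in (PREFERRED_DIR, FALLBACK_DIR):
        if exists(candidate):
            return candidate
    return None
```

Because both paths point to the same device node, taking whichever one exists is enough to recover the graphics columns.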


# b98a6d78 24-Mar-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Enable TCC Offset on more models

All Models that duplicate INTEL_FAM6_CANNONLAKE_L support TCC Offset.
Enable this feature on all these models.

Delete obsolete model_orig.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# b61b7d8c 26-Mar-2023 Chen Yu <yu.c.chen@intel.com>

tools/power/turbostat: Enable the C-state Pre-wake printing

Currently the C-state Pre-wake setting is not printed because the
probe has not been invoked. Invoke the probe function accordingly.

Fixes: aeb01e6d71ff ("tools/power turbostat: Print the C-state Pre-wake settings")
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 137f01b3 25-Mar-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Fix a knl bug

MSR_KNL_CORE_C6_RESIDENCY should be evaluated only if
1. this is KNL platform
AND
2. need to get C6 residency or need to calculate C1 residency

Fix the broken logic introduced by commit 1e9042b9c8d4 ("tools/power
turbostat: Fix CPU%C1 display value").

Fixes: 1e9042b9c8d4 ("tools/power turbostat: Fix CPU%C1 display value")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
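
The condition stated in this message is a conjunction with a grouped disjunction. The Python sketch below spells that out; the function and parameter names are hypothetical, and the mis-grouped variant is shown only to illustrate why the parentheses matter, not as a quote of the earlier code.

```python
def should_read_knl_c6(is_knl: bool, want_c6: bool, need_c1_calc: bool) -> bool:
    """KNL platform AND (C6 residency wanted OR needed to calculate C1)."""
    return is_knl and (want_c6 or need_c1_calc)


def misgrouped(is_knl: bool, want_c6: bool, need_c1_calc: bool) -> bool:
    """Same terms without the grouping: 'and' binds tighter than 'or'."""
    return is_knl and want_c6 or need_c1_calc
```

On a non-KNL platform that needs the C1 calculation, the grouped form is false while the mis-grouped form is true, so the KNL-only MSR would be read where it should not be.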


# 4d182748 31-Jul-2023 Zhang Rui <rui.zhang@intel.com>

tools/power/turbostat: Fix failure with new uncore sysfs

On some platforms, turbostat fails during launch time like below,

turbostat version 2023.03.17 - Len Brown <lenb@kernel.org>
...
cpu40: MSR_IA32_PACKAGE_THERM_STATUS: 0x884c0000 (24 C)
cpu40: MSR_IA32_PACKAGE_THERM_INTERRUPT: 0x00000003 (100 C, 100 C)
turbostat: snapshot_sysfs_counter(/sys/devices/system/cpu/intel_uncore_frequency/package_00_die_00/current_freq_khz): No data available

This is because new uncore sysfs is used on these platforms as
introduced by commit 9b8dea80e3cb ("platform/x86/intel-uncore-freq:
Support for cluster level controls").

With the new uncore sysfs interface,
/sys/devices/system/cpu/intel_uncore_frequency/package_00_die_00/current_freq_khz
is still available, but reading it fails.

How to support the fabric cluster level uncore sysfs is not settled yet,
as a short term fix, clear the BIC_UNCORE_MHZ bit when new sysfs I/F is
detected.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>


# 882cdb06 07-Aug-2023 Peter Zijlstra <peterz@infradead.org>

x86/cpu: Fix Gracemont uarch

Alderlake N is an E-core only product using Gracemont
micro-architecture. It fits the pre-existing naming scheme perfectly
fine, adhere to it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230807150405.686834933@infradead.org


# de7839ee 17-Mar-2023 Len Brown <len.brown@intel.com>

tools/power turbostat: version 2023.03.17

Happy St. Patrick's Day!

Signed-off-by: Len Brown <len.brown@intel.com>


# 92c25393 25-Jan-2023 Antti Laakso <antti.laakso@intel.com>

tools/power turbostat: fix decoding of HWP_STATUS

The "excursion to minimum" information is in bit2
in HWP_STATUS MSR. Fix the bitmask used for
decoding the register.

Signed-off-by: Antti Laakso <antti.laakso@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
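
A small sketch of the bit test this fix is about, in Python rather than the C source. The helper name `excursion_to_minimum()` is hypothetical; the only fact taken from the message is that the flag lives in bit 2, which corresponds to a mask of 0x4.

```python
EXCURSION_TO_MINIMUM_BIT = 2  # bit position stated in the commit message


def excursion_to_minimum(hwp_status: int) -> bool:
    """Return True when bit 2 of the HWP_STATUS value is set."""
    return bool(hwp_status & (1 << EXCURSION_TO_MINIMUM_BIT))
```

A mask for any other bit, for example 0x2, would report a neighbouring bit instead.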


# 93cac415 04-Jan-2023 Zhang Rui <rui.zhang@intel.com>

tools/power turbostat: Introduce support for EMR

Introduce support for EMR.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 6cbfedc7 17-Mar-2023 Len Brown <len.brown@intel.com>

tools/power turbostat: remove stray newlines from warn/warnx strings

warn(3) terminates strings with newlines

Signed-off-by: Len Brown <len.brown@intel.com>


# 40aafc7d 15-Dec-2022 Prarit Bhargava <prarit@redhat.com>

tools/power turbostat: Fix /dev/cpu_dma_latency warnings

When running as non-root the following error is seen in turbostat:

turbostat: fopen /dev/cpu_dma_latency
: Permission denied

turbostat and the man page have information on how to avoid other
permission errors, so these can be fixed the same way.

Provide better /dev/cpu_dma_latency warnings that provide instructions on
how to avoid the error, and update the man page.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>


# 9c085817 18-Oct-2022 Prarit Bhargava <prarit@redhat.com>

tools/power turbostat: Provide better debug messages for failed capabilities accesses

turbostat reports some capabilities access errors and not others. Provide
the same debug message for all errors.

[lenb: remove extra quotes]

Cc: David Arcari <darcari@redhat.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 884a1f95 12-Oct-2022 Len Brown <len.brown@intel.com>

tools/power turbostat: update dump of SECONDARY_TURBO_RATIO_LIMIT

cosmetic only (but useful if you copy/paste)

Signed-off-by: Len Brown <len.brown@intel.com>


# 9992dd77 04-Oct-2022 Len Brown <len.brown@intel.com>

tools/power turbostat: version 2022.10.04

Signed-off-by: Len Brown <len.brown@intel.com>


# b2d433ae 23-Sep-2022 Zhang Rui <rui.zhang@intel.com>

tools/power turbostat: Use standard Energy Unit for SPR Dram RAPL domain

Intel Xeon servers used to use a fixed energy resolution (15.3uj) for
Dram RAPL domain. But on SPR, Dram RAPL domain follows the standard
energy resolution as described in MSR_RAPL_POWER_UNIT.

Remove the SPR rapl_dram_energy_units quirk.

Fixes: e7af1ed3fa47 ("tools/power turbostat: Support additional CPU model numbers")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
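
To make the difference between the two resolutions concrete, here is a hedged Python sketch. It assumes the energy-unit field sits in bits 12:8 of MSR_RAPL_POWER_UNIT and encodes a unit of 1/2^ESU joules, as described in Intel's Software Developer's Manual; neither the bit layout nor the example MSR value comes from this log. The fixed unit is taken to be 2^-16 J, which rounds to the 15.3 uJ quoted above.

```python
FIXED_DRAM_ENERGY_UNIT_J = 2.0 ** -16  # about 15.3 microjoules


def standard_energy_unit_joules(msr_rapl_power_unit: int) -> float:
    """Decode the energy unit, assuming ESU is in bits 12:8 (SDM layout)."""
    esu = (msr_rapl_power_unit >> 8) & 0x1F
    return 1.0 / (1 << esu)


def dram_energy_joules(raw_counter_delta: int, unit_joules: float) -> float:
    """Convert a raw energy counter delta to joules with the chosen unit."""
    return raw_counter_delta * unit_joules
```

For an illustrative register value of 0xA0E03 the decoded unit is 2^-14 J, four times the fixed unit, so applying the fixed quirk to a platform that follows the standard unit would under-report DRAM energy by that factor.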


# 3ea8e52e 16-Sep-2022 Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

tools/power turbostat: Do not dump TRL if turbo is not supported

Do not dump turbo ratio limits if platform does not support turbo, because it
is confusing and the TRL MSRs may even include misleading information. And they
are not supposed to be relied on if turbo is not supported.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 8e45a9bf 10-Sep-2022 Zhang Rui <rui.zhang@intel.com>

tools/power turbostat: Add support for MeteorLake platforms

Add turbostat support for MeteorLake platforms, which behave the same
as RaptorLake platforms.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 9b1c2ecf 31-Aug-2022 Zhang Rui <rui.zhang@intel.com>

tools/power turbostat: Add support for RPL-S

Add turbostat support for RAPTORLAKE_S platform, which behaves the same
as RAPTORLAKE and RAPTORLAKE_P platforms.

RPL-S 601/801 have different CPU ID than the Hybrid ADL-S platforms.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 3afe697b 28-Jul-2022 Len Brown <len.brown@intel.com>

tools/power turbostat: version 2022.07.28

update version number

Signed-off-by: Len Brown <len.brown@intel.com>


# 6287e6f0 26-Jul-2022 Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

tools/power turbostat: do not decode ACC for ICX and SPR

The ACC (automatic C-state conversion) feature was available on Sky Lake and
Cascade Lake Xeons (SKX and CLX), but it is not available on Ice Lake and
Sapphire Rapids Xeons (ICX and SPR). Therefore, stop decoding it for ICX and
SPR.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 0e4d42af 26-Jul-2022 Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

tools/power turbostat: fix SPR PC6 limits

Sapphire Rapids Xeon (SPR) supports 2 flavors of PC6 - PC6N (non-retention) and
PC6R (retention). Before this patch we used ICX package C-state limits, which
was wrong, because ICX has only one PC6 flavor. With this patch, we use SKX PC6
limits for SPR, because they are the same.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# eade39b2 26-Jul-2022 Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

tools/power turbostat: cleanup 'automatic_cstate_conversion_probe()'

The 'automatic_cstate_conversion_probe()' function has a too long 'if'
statement, convert it to a 'switch' statement in order to improve code
readability a bit.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 684e40e9 26-Jul-2022 Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

tools/power turbostat: separate SPR from ICX

Before this patch, SPR platform was considered identical to ICX platform. This
patch separates SPR support from ICX.

This patch is a preparation for adding SPR-specific package C-state limits
support.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 2db0e5eb 21-Jun-2022 Jiang Jian <jiangjian@cdjrlc.com>

tools/power turbostat: fix comment

remove duplicate "the" in comment

Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 6f9cf553 01-Jun-2022 George D Sworo <george.d.sworo@intel.com>

tools/power turbostat: Support RAPTORLAKE P

Add initial support for Raptorlake model

Signed-off-by: George D Sworo <george.d.sworo@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 1c1313b5 12-May-2022 Zhang Rui <rui.zhang@intel.com>

tools/power turbostat: add support for ALDERLAKE_N

Add support for ALDERLAKE_N platform.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 4af184ee 31-May-2022 Len Brown <len.brown@intel.com>

tools/power turbostat: dump secondary Turbo-Ratio-Limit

Intel Performance Hybrid processors have a 2nd MSR
describing the turbo limits enforced on the Ecores.

Note, TRL and Secondary-TRL are usually R/O information,
but on overclock-capable parts, they can be written.

Signed-off-by: Len Brown <len.brown@intel.com>


# 5d622845 31-May-2022 Len Brown <len.brown@intel.com>

tools/power turbostat: simplify dump_turbo_ratio_limits()

code cleanup only.
no functional change.

Signed-off-by: Len Brown <len.brown@intel.com>


# 774627c5 31-May-2022 Len Brown <len.brown@intel.com>

tools/power turbostat: dump CPUID.7.EDX.Hybrid

CPUID leaf 7 EDX now tells us if the processor has hybrid CPUs

Signed-off-by: Len Brown <len.brown@intel.com>


# a5c6d65d 13-May-2022 Len Brown <len.brown@intel.com>

tools/power turbostat: Show uncore frequency

When CONFIG_INTEL_UNCORE_FREQ_CONTROL is effective,
(Linux 5.9 and later), print the current (and default)
min and max uncore frequency limits.

When that driver provides the current uncore frequency
(Linux 5.18 and later), print a UncMHz column
reflecting the current uncore frequency.

Note that UncMHz is an instantaneous sample, not an average.

eg.

$ sudo ./turbostat -S --show frequency
...
Uncore Frequency pkg0 die0: 800 - 3900 MHz (800 - 3900 MHz)
...
Avg_MHz Busy% Bzy_MHz TSC_MHz UncMHz
28 0.70 4049 3095 3900

Signed-off-by: Len Brown <len.brown@intel.com>


# 5e5fd36c 26-Apr-2022 Colin Ian King <colin.king@intel.com>

tools/power turbostat: Fix file pointer leak

Currently if a fscanf fails then an early return leaks an open
file pointer. Fix this by fclosing the file before the return.
Detected using static analysis with cppcheck:

tools/power/x86/turbostat/turbostat.c:2039:3: error: Resource leak: fp [resourceLeak]

Fixes: eae97e053fe3 ("tools/power turbostat: Support thermal throttle count print")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Tom Rix <trix@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# e13da9a1 26-Apr-2022 Colin Ian King <colin.king@intel.com>

tools/power turbostat: replace strncmp with single character compare

Using strncmp for a single character comparison is overly complicated,
just use a simpler single character comparison instead. Also stops
static analyzers (such as cppcheck) from complaining about strncmp on
non-null terminated strings.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 03331233 21-Apr-2022 Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: print the kernel boot commandline

It would be handy to have the kernel command line in turbostat output.
For example, when the turbostat output shows that no C-states were
requested, the user will want to know whether something like
intel_idle.max_cstate=0, or maybe idle=none, was used. It is also
useful to know whether options like intel_pstate=nohwp were used.

Print the boot command line accordingly:
turbostat version 21.05.04 - Len Brown <lenb@kernel.org>
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.16.0+ root=UUID=
b42359ed-1e05-42eb-8757-6bf2a1c19070 ro quiet splash vt.handoff=7

Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# fb5e29df 21-Apr-2022 Zhang Rui <rui.zhang@intel.com>

tools/power turbostat: Introduce support for RaptorLake

RaptorLake is compatible with AlderLake.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 58990892 16-Apr-2022 Len Brown <len.brown@intel.com>

tools/power turbostat: version 2022.04.16

Signed-off-by: Len Brown <len.brown@intel.com>


# 9878bf7a 16-Apr-2022 Len Brown <len.brown@intel.com>

tools/power turbostat: No build warnings with -Wextra

Signed-off-by: Len Brown <len.brown@intel.com>


# 164d7a96 16-Apr-2022 Len Brown <len.brown@intel.com>

tools/power turbostat: be more useful as non-root

Don't exit if used this way:

sudo setcap cap_sys_nice,cap_sys_rawio=+ep ./turbostat
sudo chmod +r /dev/cpu/*/msr
./turbostat

note: cap_sys_admin is now also needed for the perf IPC counter:
sudo setcap cap_sys_admin,cap_sys_nice,cap_sys_rawio=+ep ./turbostat

Reported-by: Artem S. Tashkinov <aros@gmx.com>
Reported-by: Toby Broom <tbroom@outlook.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 6397b641 10-Feb-2022 Len Brown <len.brown@intel.com>

tools/power turbostat: fix ICX DRAM power numbers

ICX (and its duplicates) require special hard-coded DRAM RAPL units,
rather than using the generic RAPL energy units.

Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# eae97e05 12-Nov-2021 Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: Support thermal throttle count print

The turbostat data is collected by end users for power evaluation.
However, it lacks sufficient thermal context. Already a couple of times
power management developers have had to ask for something like this:
grep -r . /sys/devices/system/cpu/cpu*/thermal_throttle/*

Print the per core thermal throttle count to provide sufficient thermal
context.

turbostat -i 5 -s Core,CPU,CoreThr
Core CPU CoreThr
- - 104
0 0 61
0 4
1 1 0
1 5
2 2 104
2 6
3 3 7
3 7

Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# c7e399f8 04-Oct-2021 Zephaniah E. Loss-Cutler-Hull <zephaniah@gmail.com>

tools/power turbostat: Allow printing header every N iterations

This gives the ability to reprint the header every N iterations, so you
can ensure that a scrolling display always has the header visible
somewhere on the screen.

Signed-off-by: Zephaniah E. Loss-Cutler-Hull <zephaniah@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
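
The behaviour described here, repeating the header every N data rows, can be sketched as below. This is a Python illustration with hypothetical names (`render_with_headers()`, `header_every`); it does not reproduce turbostat's option parsing or output code.

```python
from typing import List


def render_with_headers(rows: List[str], header: str, header_every: int = 0) -> List[str]:
    """Emit the header once, then again before every Nth subsequent row.

    header_every <= 0 means the header is printed only at the top.
    """
    out = [header]
    for index, row in enumerate(rows):
        if header_every > 0 and index > 0 and index % header_every == 0:
            out.append(header)
        out.append(row)
    return out
```

With `header_every=2`, four rows produce header, two rows, header, two rows, so a scrolling terminal always has a header close to the newest data.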


# 0fc521bc 04-Oct-2021 Zephaniah E. Loss-Cutler-Hull <zephaniah@gmail.com>

tools/power turbostat: Allow -e for all names.

Currently, there are a number of variables which are displayed by
default, enabled with -e all, and listed by --list, but which you can
not give to --enable/-e.

So you can enable CPU0c1 (in the bic array), but you can't enable C1 or
C1% (not in the bic array, but exists in sysfs).

This runs counter to both the documentation and user expectations, and
it's just not very user friendly.

As such, the mechanism used by --hide has been duplicated, and is now
also used by --enable, so we can handle unknown names gracefully.

Note: One impact of this is that truly unknown fields given to --enable
will no longer generate errors, they will be silently ignored, as --hide
does.

Signed-off-by: Zephaniah E. Loss-Cutler-Hull <zephaniah@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
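
A rough sketch of the mechanism described above: names found in the built-in table set a bit, and anything else is kept on a deferred list to be matched later against counters discovered at run time. The table contents, bit values, and function name below are hypothetical; this is Python for illustration, not the C implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical built-in table: column name -> bit in the enable mask.
BUILTIN_COLUMNS: Dict[str, int] = {
    "Busy%": 1 << 0,
    "CPU%c1": 1 << 1,
}


def parse_column_names(arg: str, known: Dict[str, int]) -> Tuple[int, List[str]]:
    """Split a comma-separated list into a bitmask and deferred names."""
    mask = 0
    deferred: List[str] = []
    for name in arg.split(","):
        if name in known:
            mask |= known[name]
        else:
            deferred.append(name)  # resolved later, or silently ignored
    return mask, deferred
```

This also shows the side effect noted in the message: a truly unknown name ends up on the deferred list instead of raising an error.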


# 6b398625 30-Sep-2021 Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>

tools/power turbostat: print power values up to three decimal

Print power values up to three decimal places in watts.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# f52ba931 20-Aug-2021 Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>

tools/power turbostat: Add Power Limit4 support

Add Power Limit4 support.

Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 6799ba84 09-May-2021 Dan Merillat <git@dan.eginity.com>

tools/power turbostat: fix dump for AMD cpus

turbostat --Dump exits early with status 243 (-13)

get_counters() calls get_msr_sum() on zen CPUS
for MSR_PKG_ENERGY_STAT, but per_cpu_msr_sum
has not been initialized.

Signed-off-by: Dan Merillat <git@dan.eginity.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 5dc241f2 16-Jul-2021 Len Brown <len.brown@intel.com>

tools/power turbostat: tweak --show and --hide capability

allow invocations such as # turbostat --show power,Busy%

previously the "Busy%" was ignored

Signed-off-by: Len Brown <len.brown@intel.com>


# a1b6f487 08-Mar-2022 Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

turbostat: fix PC6 displaying on some systems

'MSR_PKG_CST_CONFIG_CONTROL' encodes the deepest allowed package C-state limit,
and turbostat decodes it.

Before this patch: turbostat does not recognize value "3" on Ice Lake Xeon
(ICX) and Sapphire Rapids Xeon (SPR), treats it as "unknown", and does not
display any package C-states in the results table.

After this patch: turbostat recognizes value 3 on ICX and SPR, treats it as
"PC6", and correctly displays package C-states in the results table.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3c070b2a 04-May-2021 Len Brown <len.brown@intel.com>

tools/power turbostat: version 2021.05.04

Signed-off-by: Len Brown <len.brown@intel.com>


# b60c573d 04-May-2021 Len Brown <len.brown@intel.com>

tools/power turbostat: Support "turbostat --hide idle"

As idle, in particular, can have many columns on some machines...
Make it easy to ignore them all at once.

Signed-off-by: Len Brown <len.brown@intel.com>


# 38c6663a 04-May-2021 Len Brown <len.brown@intel.com>

tools/power turbostat: elevate priority of interval mode

This makes interval mode less likely to see delayed
results on a heavily loaded system.

Signed-off-by: Len Brown <len.brown@intel.com>


# 1b439f01 04-May-2021 Len Brown <len.brown@intel.com>

tools/power turbostat: formatting

Spring is here...
run a long overdue Lendent on turbostat.c

no functional change

Signed-off-by: Len Brown <len.brown@intel.com>


# 55279aef 26-Apr-2021 Zhang Rui <rui.zhang@intel.com>

tools/power turbostat: rename tcc variables

There are two TCC activation temperatures.
One is the default TCC activation temperature, also known as TJ_MAX.
The other is the effective TCC activation temperature, which is the
default TCC activation temperature minus the TCC offset.

The name of variable tcc_activation_temp might be misleading here.
Thus rename tcc_activation_temp to tj_max, and use tcc_default and
tcc_offset to calculate the effective TCC activation temperature.

No functional change in this patch.
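
A minimal sketch of the relationship described above, not the code from
this patch. The helper names are hypothetical. The bit positions (default
in bits 23:16, offset starting at bit 24) are an assumption that is
consistent with the sample value 0x05640000 quoted elsewhere in this log
("100 default - 5 offset"); the width of the offset field varies by
platform, so it is passed in as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Default TCC activation temperature (TJ_MAX): assumed to sit in bits 23:16. */
static unsigned int tcc_default(uint64_t msr)
{
	return (unsigned int)((msr >> 16) & 0xFF);
}

/*
 * TCC offset: assumed to start at bit 24. The field width differs between
 * platforms, so the caller supplies it (hypothetical parameter).
 */
static unsigned int tcc_offset(uint64_t msr, unsigned int offset_bits)
{
	assert(offset_bits >= 1 && offset_bits <= 6);
	return (unsigned int)((msr >> 24) & ((1u << offset_bits) - 1));
}

/* Effective TCC activation temperature = default minus offset. */
static int tcc_effective(uint64_t msr, unsigned int offset_bits)
{
	return (int)tcc_default(msr) - (int)tcc_offset(msr, offset_bits);
}
```

With the sample value, tcc_default(0x05640000) gives 100 and
tcc_effective(0x05640000, 6) gives 95.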

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 0b9a0b9b 21-Apr-2021 Zhang Rui <rui.zhang@intel.com>

tools/power turbostat: add TCC Offset support

The length of TCC Offset bits varies on different platforms.
Decode TCC Offset bits only for the platforms that we have verified.
For the others, only show default TCC activation temperature.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# e9d3092f 25-Apr-2021 Zhang Rui <rui.zhang@intel.com>

tools/power turbostat: save original CPU model

CPU model may get changed in intel_model_duplicates() for code reuse.
But there are still some cases we need the original CPU model to handle
minor differences between generations.

Thus save the original CPU model.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 7ab5ff49 21-Apr-2021 Zhang Rui <rui.zhang@intel.com>

tools/power turbostat: Fix Core C6 residency on Atom CPUs

For Atom CPUs that have core cstate deeper than C6,
MSR_CORE_C6_RESIDENCY actually returns the residency for both CC6 and
deeper Core cstates.
Thus, the real Core C6 residency should be the MSR_CORE_C6_RESIDENCY
return value minus the MSR_CORE_C7_RESIDENCY return value.
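
A minimal sketch of that correction, assuming the deeper-state counter is
the Core C7 residency MSR. The struct and function names are hypothetical,
and the underflow guard is an addition for the sketch rather than
something stated in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the raw residency counter deltas. */
struct core_residency {
	uint64_t c6;	/* MSR_CORE_C6_RESIDENCY: CC6 plus deeper states on these CPUs */
	uint64_t c7;	/* MSR_CORE_C7_RESIDENCY: the deeper states only */
};

/* Residency spent in CC6 alone, clamped at zero to avoid unsigned wrap. */
static uint64_t atom_real_cc6(const struct core_residency *r)
{
	assert(r != NULL);
	return r->c6 >= r->c7 ? r->c6 - r->c7 : 0;
}
```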

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# aeb01e6d 27-Apr-2021 Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: Print the C-state Pre-wake settings

The C-state pre-wake setting[1] is an optimization that lets some Intel
CPUs be woken up from deep C-states early in order to reduce latency.
According to the spec, bit 30 of MSR_IA32_POWER_CTL is the C-state
Pre-wake Disable bit. Expose this setting accordingly.
Sample output from turbostat:
...
cpu51: MSR_IA32_POWER_CTL: 0x1a00a40059 (C1E auto-promotion: DISabled)
C-state Pre-wake: ENabled
cpu51: MSR_TURBO_RATIO_LIMIT: 0x2021212121212224
...

[1] https://intel.github.io/wult/#c-state-pre-wake

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 8c69da29 27-Apr-2021 Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: Enable tsc_tweak for Elkhart Lake and Jasper Lake

It was found that on Elkhart Lake the TSC frequency is driven by
a separate crystal-clock domain, which is different from the
BCLK domain that includes mperf. The two domains run at slightly
different speeds, so TSC and mperf are inconsistent, which caused
Busy% to exceed 100%. On this platform mperf appears to run faster
than TSC when the CPU is 100% utilized:
delta tsc(18815473183) < delta mperf(18958403680) for 10 seconds.

To align TSC with mperf, leverage the tsc_tweak mechanism introduced for
cores newer than Skylake, so that TSC and mperf would be calculated in
the same domain.
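
A small sketch of why Busy% overshoots, assuming Busy% is computed as
delta mperf over delta TSC, with the TSC delta scaled by a correction
factor. The function name and the way the factor is passed in are
hypothetical; only the two counter deltas come from this message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Busy% as the ratio of mperf ticks to (scaled) TSC ticks.
 * tsc_tweak is 1.0 when both counters tick in the same clock domain.
 */
static double busy_percent(uint64_t delta_mperf, uint64_t delta_tsc,
			   double tsc_tweak)
{
	assert(delta_tsc != 0);
	return 100.0 * (double)delta_mperf / ((double)delta_tsc * tsc_tweak);
}
```

With the deltas quoted above and no correction, the result is a little
under 101%. Scaling the TSC delta into the mperf domain brings a fully
busy CPU back to 100%.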

Reported-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 1e3ec5cd 25-Mar-2021 Randy Dunlap <rdunlap@infradead.org>

tools/power turbostat: unmark non-kernel-doc comment

Do not mark a comment as kernel-doc notation when it is not
meant to be in kernel-doc notation.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Len Brown <len.brown@intel.com>


# 25368d7c 04-May-2021 Chen Yu <yu.c.chen@intel.com>

tools/power/turbostat: Remove Package C6 Retention on Ice Lake Server

Currently turbostat treats ICX the same way as SKX and shares the
code between them. But one difference is that ICX does not support Package
C6 Retention, unlike SKX and CLX.

So this patch:

1. Splits SKX and ICX in turbostat.
2. Removes Package C6 Retention for ICX.

After this split, it will be easier to customize Ice Lake Server
in turbostat in the future.

Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 13a779de 28-Apr-2021 Calvin Walton <calvin.walton@kepstin.ca>

tools/power turbostat: Fix offset overflow issue in index converting

The idx_to_offset() function returns type int (32-bit signed), but
MSR_PKG_ENERGY_STAT is u32 and would be interpreted as a negative number.
The end result is that it hits the if (offset < 0) check in update_msr_sum()
which prevents the timer callback from updating the stat in the background when
long durations are used. A similar issue exists in offset_to_idx() and
update_msr_sum(). Fix this issue by converting the 'int' to 'off_t' accordingly.
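
A short illustration of the overflow, assuming a 64-bit off_t (as on
x86-64 Linux) and a compiler that wraps out-of-range conversions to int,
which is what GCC and Clang do. The two helper functions are hypothetical
and exist only to contrast the return types.

```c
#include <assert.h>
#include <sys/types.h>

/* AMD package energy MSR; bit 31 is set, so it does not fit in a signed int. */
#define MSR_PKG_ENERGY_STAT 0xc001029b

/* Old behaviour: the offset is squeezed into an int and turns negative. */
static int offset_as_int(void)
{
	return (int)MSR_PKG_ENERGY_STAT;
}

/* Fixed behaviour: off_t is wide enough to hold the offset unchanged. */
static off_t offset_as_off_t(void)
{
	return (off_t)MSR_PKG_ENERGY_STAT;
}
```

A caller that checks (offset < 0) rejects the first value and accepts the
second.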

Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL display")
Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>


# 301b1d3a 28-Apr-2021 Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

tools/power/turbostat: Fix turbostat for AMD Zen CPUs

It was reported that on Zen+ systems turbostat started exiting,
which was tracked down to the MSR_PKG_ENERGY_STAT read failing because
offset_to_idx wasn't returning a non-negative index.

This patch combines the modifications from Bingsong Si and
Bas Nieuwenhuizen and adds the MSR to the index system as an alternative for
MSR_PKG_ENERGY_STATUS.

Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL display")
Reported-by: youling257 <youling257@gmail.com>
Tested-by: youling257 <youling257@gmail.com>
Tested-by: Kurt Garloff <kurt@garloff.de>
Tested-by: Bingsong Si <owen.si@ucloud.cn>
Tested-by: Artem S. Tashkinov <aros@gmx.com>
Co-developed-by: Bingsong Si <owen.si@ucloud.cn>
Co-developed-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# ba58ecde 12-Mar-2021 Len Brown <len.brown@intel.com>

tools/power turbostat: update version number


# abdc75ab 10-Mar-2021 Zhang Rui <rui.zhang@intel.com>

tools/power turbostat: Fix DRAM Energy Unit on SKX

SKX uses fixed DRAM Energy Unit, just like HSX and BDX.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# b2b94be7 11-Mar-2021 Len Brown <len.brown@intel.com>

Revert "tools/power turbostat: adjust for temperature offset"

This reverts commit 6ff7cb371c4bea3dba03a56d774da925e78a5087.

Apparently the TCC offset should not be used to adjust what temperature
we show the user after all.

(on most systems, TCC offset is 0, FWIW)

Fixes: 6ff7cb371c4b

Signed-off-by: Len Brown <len.brown@intel.com>


# 6c5c6560 03-Feb-2021 Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: Support Ice Lake D

Ice Lake D is low-end server version of Ice Lake X, reuse
the code accordingly.

Tested-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 5683460b 03-Feb-2021 Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: Support Alder Lake Mobile

Share the code between Alder Lake Mobile and Alder Lake Desktop.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# ed0757b8 04-Feb-2021 Len Brown <len.brown@intel.com>

tools/power turbostat: print microcode patch level

(also available via "grep microcode /proc/cpuinfo")

Signed-off-by: Len Brown <len.brown@intel.com>


# 2af4f9b8 30-Jan-2021 Len Brown <len.brown@intel.com>

tools/power turbostat: add built-in-counter for IPC -- Instructions per Cycle

Use linux-perf to access the hardware instructions-retired counter.
This is necessary because the counter is not enabled by default,
and also the counter is prone to roll-over -- both of which
perf manages.

It is not necessary to use perf for the cycle counter,
because turbostat already needs to collect delta-aperf
to calculate frequency.

Signed-off-by: Len Brown <len.brown@intel.com>


# 800c120e 25-Mar-2021 Randy Dunlap <rdunlap@infradead.org>

tools/turbostat: Unmark non-kernel-doc comment

Do not mark a comment as kernel-doc notation when it is not meant to be
in kernel-doc notation.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210325201333.16792-1-rdunlap@infradead.org


# 7f1b11ba 28-Jan-2021 Borislav Petkov <bp@suse.de>

tools/power/turbostat: Fallback to an MSR read for EPB

Commit

6d6501d912a9 ("tools/power/turbostat: Read energy_perf_bias from sysfs")

converted turbostat to read the energy_perf_bias value from sysfs.
However, older kernels which do not have that file yet, would fail. For
those, fall back to the MSR reading.

Fixes: 6d6501d912a9 ("tools/power/turbostat: Read energy_perf_bias from sysfs")
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://lkml.kernel.org/r/20210127132444.981120-1-dedekind1@gmail.com


# 6d6501d9 15-Oct-2020 Borislav Petkov <bp@suse.de>

tools/power/turbostat: Read energy_perf_bias from sysfs

... instead of poking at the MSR directly.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-pm@vger.kernel.org
Link: https://lkml.kernel.org/r/20201029190259.3476-3-bp@alien8.de


# 3e9fa998 30-Sep-2020 Len Brown <len.brown@intel.com>

tools/power turbostat: update version number

goodbye summer...

Signed-off-by: Len Brown <len.brown@intel.com>


# 3d7772ea 30-Sep-2020 Len Brown <len.brown@intel.com>

tools/power turbostat: harden against cpu hotplug

turbostat tends to get confused when CPUs are added and removed
while it is running.

There are races, such as checking the current cpu, and then
reading a sysfs file that depends on that cpu number.

Close the two issues that seem to come up the most.
First, there is an infinite reset loop detector --
change that to allow more resets before giving up.
Secondly, one of those file reads didn't really need
to exit the program on failure...

Signed-off-by: Len Brown <len.brown@intel.com>


# 6ff7cb37 29-Sep-2020 Len Brown <len.brown@intel.com>

tools/power turbostat: adjust for temperature offset

cpu1: MSR_IA32_TEMPERATURE_TARGET: 0x05640000 (95 C) (100 default - 5 offset)

Account for the new "offset" field in MSR_TEMPERATURE_TARGET.
While this field is usually zero, ignoring it results in over-stating
the current temperature, both per-core and per-package.

Signed-off-by: Len Brown <len.brown@intel.com>


# 33eb8225 17-Aug-2020 Kim Phillips <kim.phillips@amd.com>

tools/power turbostat: Support AMD Family 19h

Family 19h processors have the same RAPL (Running average power limit)
hardware register interface as Family 17h processors.

Change the family checks to succeed for Family 17h and above to enable
core and package energy measurement on Family 19h machines.

Also update the TDP to the largest found at the bottom of the page at
amd.com->processors->servers->epyc->2nd-gen-epyc, i.e., the EPYC 7H12.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>


# 20de0dab 17-Aug-2020 Antti Laakso <antti.laakso@linux.intel.com>

tools/power turbostat: Remove empty columns for Jacobsville

Jacobsville doesn't have Package C2 and C6. Also
Core and DRAM RAPL are not available. Adjust output
accordingly.

Signed-off-by: Antti Laakso <antti.laakso@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# b4b91569 22-Apr-2020 Rafael Antognolli <rafael.antognolli@intel.com>

tools/power turbostat: Add a new GFXAMHz column that exposes gt_act_freq_mhz.

The column already present called GFXMHz reads from gt_cur_freq_mhz,
which represents the GT frequency that was requested, but power
management might not be able to do that. So the new column will display
what the actual frequency GT is running at.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# c315a09b 13-Aug-2020 Len Brown <len.brown@intel.com>

tools/power turbostat: Skip pc8, pc9, pc10 columns, if they are disabled

Like we skip PC3 and PC6 columns when the package C-state limit
disables them, skip PC8/PC9/PC10 under analogous conditions.

Reported-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# e7af1ed3 13-Aug-2020 Len Brown <len.brown@intel.com>

tools/power turbostat: Support additional CPU model numbers

Initial support for models recently added to intel-family.h.

Signed-off-by: Len Brown <len.brown@intel.com>


# fecb3bc8 10-Aug-2020 David Arcari <darcari@redhat.com>

tools/power turbostat: Fix output formatting for ACPI CST enumeration

turbostat formatting is broken when ACPI CST is used for enumeration. The
problem is that the CX_ACPI% is eight characters long which does not
work with tab formatting. One simple solution is to remove the underbar
from the state name such that C1_ACPI will be displayed as C1ACPI.

Signed-off-by: David Arcari <darcari@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>


# 8201a028 29-Jun-2020 Prarit Bhargava <prarit@redhat.com>

tools/power turbostat: Use sched_getcpu() instead of hardcoded cpu 0

Disabling cpu 0 results in an error

turbostat: /sys/devices/system/cpu/cpu0/topology/thread_siblings: open failed: No such file or directory

Use sched_getcpu() instead of a hardcoded cpu 0 to get the max cpu number.
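
A minimal sketch of building the sysfs path for an arbitrary CPU rather
than cpu0. The helper name is hypothetical; the path is the one shown in
the error message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format the thread_siblings path for the given CPU number.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int thread_siblings_path(char *buf, size_t len, int cpu)
{
	int n;

	assert(buf != NULL && cpu >= 0);
	n = snprintf(buf, len,
		     "/sys/devices/system/cpu/cpu%d/topology/thread_siblings",
		     cpu);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The caller would pass the result of sched_getcpu() (declared in <sched.h>
when _GNU_SOURCE is defined), which names a CPU that is online because the
calling thread is running on it.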

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 9972d5d8 18-Apr-2020 Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: Enable accumulate RAPL display

Enable the accumulated RAPL display by default.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 87e15da9 18-Apr-2020 Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: Introduce functions to accumulate RAPL consumption

Since the RAPL Joule Counter is 32 bit, turbostat would
only print a *star* instead of the actual energy consumed,
to indicate overflow over a long duration.
This does not meet the requirements of servers, where the
sampling time of turbostat is usually very long.

So maintain a set of MSR buffers, and update them
periodically before the 32-bit MSR register wraps around,
so as to avoid the overflow.

The idea is similar to the implementation of ktime_get():

Periodical MSR timer:
total_rapl_sum += (current_rapl_msr - last_rapl_msr);

Using get_msr_sum() to get the accumulated RAPL:
return (current_rapl_msr - last_rapl_msr) + total_rapl_sum;

The accumulated RAPL mechanism will be turned on in next patch.
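
The accumulation scheme above can be sketched as follows. This is an
illustration under assumptions, not the code from this patch: the struct
and function names are hypothetical, and the sketch relies on the timer
firing at least once per wrap period of the 32-bit counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-MSR accumulator. */
struct msr_sum {
	uint64_t sum;	/* energy units accumulated up to the last timer tick */
	uint32_t last;	/* raw 32-bit counter value seen at the last timer tick */
};

/* Difference modulo 2^32, so a single wrap between samples is handled. */
static uint32_t delta_wrap32(uint32_t now, uint32_t last)
{
	return now - last;
}

/* Periodic timer: fold the counts since the last tick into the 64-bit sum. */
static void msr_sum_update(struct msr_sum *s, uint32_t raw)
{
	assert(s != NULL);
	s->sum += delta_wrap32(raw, s->last);
	s->last = raw;
}

/* Reader: accumulated sum plus whatever has elapsed since the last tick. */
static uint64_t msr_sum_get(const struct msr_sum *s, uint32_t raw)
{
	assert(s != NULL);
	return s->sum + delta_wrap32(raw, s->last);
}
```

Keeping the sum in 64 bits while sampling the raw counter in 32 bits means
the reported total keeps growing even though the hardware counter wraps.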

Originally-by: Aaron Lu <aaron.lwe@gmail.com>
Reviewed-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 7c2ccc50 18-Apr-2020 Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: Make the energy variable to be 64 bit

Change the energy variable from 32bit to 64bit,
so that it can record long time duration.
After this conversion, adjust the DELTA_WRAP32() accordingly.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 9aefc2cd 26-Mar-2020 Doug Smythies <doug.smythies@gmail.com>

tools/power turbostat: Always print idle in the system configuration header

If the --quiet option is not used, turbostat prints a useful system
configuration header during startup.

But inclusion of idle system configuration information in this header
is currently a function of inclusion in the columns chosen to be displayed.

Always list this idle system configuration.

Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Len Brown <len.brown@intel.com>


# d76bb7a0 26-May-2020 Len Brown <len.brown@intel.com>

tools/power turbostat: Print /dev/cpu_dma_latency

Users are puzzled when they use tuned performance and all their
C-states vanish. Dump /dev/cpu_dma_latency and state
whether the value is default, or constraining,
to explain this situation.

Signed-off-by: Len Brown <len.brown@intel.com>


# b95fffb9 17-Nov-2019 Len Brown <len.brown@intel.com>

tools/power turbostat: update version

A stitch in time saves nine.

Signed-off-by: Len Brown <len.brown@intel.com>


# abdcbdb2 20-Mar-2020 Len Brown <len.brown@intel.com>

tools/power turbostat: Print cpuidle information

Print cpuidle driver and governor.

Originally-by: Antti Laakso <antti.laakso@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# fcaa681c 19-Mar-2020 Len Brown <len.brown@intel.com>

tools/power turbostat: Fix 32-bit capabilities warning

warning: `turbostat' uses 32-bit capabilities (legacy support in use)

Signed-off-by: Len Brown <len.brown@intel.com>


# 1f81c5ef 19-Mar-2020 Len Brown <len.brown@intel.com>

tools/power turbostat: Fix missing SYS_LPI counter on some Chromebooks

Some Chromebook BIOSes do not export an ACPI LPIT, which is how
Linux finds the residency counter for CPU and SYSTEM low power states,
which it exports in /sys/devices/system/cpu/cpuidle/*residency_us

When these sysfs attributes are missing, check the debugfs attribute
from the pmc_core driver, which accesses the same counter value.

Signed-off-by: Len Brown <len.brown@intel.com>


# f6708400 18-Mar-2020 Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: Support Elkhart Lake

From a turbostat point of view the Tremont-based Elkhart Lake
is very similar to Goldmont, reuse the code of Goldmont.

Elkhart Lake does not support 'group turbo limit counter'
nor C3, adjust the code accordingly.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# d7814c30 13-Jan-2020 Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: Support Jasper Lake

Jasper Lake, like Elkhart Lake, uses a Tremont CPU.
So reuse the code.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 23274faf 13-Jan-2020 Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: Support Ice Lake server

From a turbostat point of view, Ice Lake server looks like Sky Lake server.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 4bf7132a 13-Jan-2020 Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: Support Tiger Lake

From a turbostat point of view, Tiger Lake looks like Ice Lake.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# d8d005ba 19-Mar-2020 Len Brown <len.brown@intel.com>

tools/power turbostat: Fix gcc build warnings

Warning: ‘__builtin_strncpy’ specified bound 20 equals destination size
[-Wstringop-truncation]

Reduce the length parameter passed to strncpy, to guarantee that a null
byte is always copied into the destination buffer.
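
A sketch of the general pattern, assuming a 20-byte destination as in the
warning. The buffer and function names are hypothetical, and the explicit
terminator is an addition for the sketch; the patch itself only describes
reducing the bound.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_BYTES 20	/* destination size quoted in the warning */

/* Copy at most NAME_BYTES - 1 characters and always terminate the result. */
static void copy_name(char dst[NAME_BYTES], const char *src)
{
	assert(dst != NULL && src != NULL);
	strncpy(dst, src, NAME_BYTES - 1);
	dst[NAME_BYTES - 1] = '\0';
}
```

Passing the full buffer size as the bound is what triggers
-Wstringop-truncation, because a source of 20 or more characters would
leave the destination unterminated.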

Signed-off-by: Len Brown <len.brown@intel.com>


# 081c5432 31-Oct-2019 Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: Support Cometlake

From a turbostat point of view, Cometlake is like Kabylake.

Suggested-by: Rui Zhang <rui.zhang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# f6505c88 20-Dec-2019 Sean Christopherson <seanjc@google.com>

tools/x86: Sync msr-index.h from kernel sources

Sync msr-index.h to pull in recent renames of the IA32_FEATURE_CONTROL
MSR definitions. Update KVM's VMX selftest and turbostat accordingly.
Keep the full name in turbostat's output to avoid breaking someone's
workflow, e.g. if a script is looking for the full name.

While using the renamed defines is by no means necessary, do the sync
now to avoid leaving a landmine that will get stepped on the next time
msr-index.h needs to be refreshed for some other reason.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-4-sean.j.christopherson@intel.com


# 9eb4b518 31-Aug-2019 Len Brown <len.brown@intel.com>

tools/power turbostat: update version number

Today is 19.08.31, at least in some parts of the world.

Signed-off-by: Len Brown <len.brown@intel.com>


# c1c10cc7 30-Aug-2019 Pu Wen <puwen@hygon.cn>

tools/power turbostat: Add support for Hygon Fam 18h (Dhyana) RAPL

Commit 9392bd98bba760be96ee ("tools/power turbostat: Add support for AMD
Fam 17h (Zen) RAPL") and commit 3316f99a9f1b68c578c5 ("tools/power
turbostat: Also read package power on AMD F17h (Zen)") added AMD Fam 17h
RAPL support.

Hygon Family 18h (Dhyana) advertises RAPL support in bit 14 of CPUID
0x80000007 EDX, and has the MSRs RAPL_PWR_UNIT/CORE_ENERGY_STAT/PKG_ENERGY_STAT.
So add Hygon Dhyana Family 18h support for RAPL.

Already tested on Hygon multi-node systems and it shows correct per-core
energy usage and the total package power.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Reviewed-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>


# 9cfa8e04 30-Aug-2019 Pu Wen <puwen@hygon.cn>

tools/power turbostat: Fix caller parameter of get_tdp_amd()

Commit 9392bd98bba760be96ee ("tools/power turbostat: Add support for AMD
Fam 17h (Zen) RAPL") added a function get_tdp_amd(), whose parameter is the
CPU family. But the rapl_probe_amd() function passes the model instead.
Fix the caller of get_tdp_amd() to pass the family.

Cc: <stable@vger.kernel.org> # v5.1+
Signed-off-by: Pu Wen <puwen@hygon.cn>
Reviewed-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>


# 1e9042b9 27-Aug-2019 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

tools/power turbostat: Fix CPU%C1 display value

In some cases C1% will show a wrong value when the platform doesn't have
an MSR for C1 residency.

For example:
Core CPU CPU%c1
- - 100.00
0 0 100.00
0 2 100.00
1 1 100.00
1 3 100.00

But adding Busy% will fix this:
Core CPU Busy% CPU%c1
- - 99.77 0.23
0 0 99.77 0.23
0 2 99.77 0.23
1 1 99.77 0.23
1 3 99.77 0.23

This issue can be reproduced on most of the recent systems including
Broadwell, Skylake and later.

This is because if we don't select Busy% or Avg_MHz or Bzy_MHz then
mperf value will not be read from MSR, so it will be 0. But this
is required for C1% calculation when MSR for C1 residency is not present.
Same is true for C3, C6 and C7 column selection.

So add another define DO_BIC_READ(), which doesn't depend on user
column selection, and use it for mperf, C3, C6 and C7 related counters.
So when there is no platform support for C1 residency counters,
we still read these counters, if the CPU supports them and the user
selected display of CPU%c1.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 6ee9fc63 14-Aug-2019 Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

tools/power turbostat: do not enforce 1ms

Turbostat works by taking a snapshot of counters, sleeping, taking another
snapshot, calculating deltas, and printing out the table.

The sleep time is controlled via the -i option or by the user sending a
signal or a character to stdin. In the latter case, turbostat always adds
a 1 ms sleep before it reads the counters, in order to avoid larger
imprecisions in the results it prints.

While the 1 ms delay may be a good idea for a "dumb" user, it is a
problem for an "aware" user. I do thousands and thousands of measurements
over a short period of time (like 2ms), and turbostat unconditionally adds
a 1ms to my interval, so I cannot get what I really need.

This patch removes the unconditional 1ms sleep. This is an expert user
tool, after all, and non-experts are unlikely to ever use it in the
non-fixed interval mode anyway, so I think it is OK to remove the 1ms delay.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# c026c236 14-Aug-2019 Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

tools/power turbostat: read from pipes too

Commit '47936f944e78 tools/power turbostat: fix printing on input' made
a valid fix, but it completely disabled piped stdin support, which is
a valuable use-case. Indeed, if stdin is a pipe, turbostat won't read
anything from it, so it becomes impossible to get turbostat output at
user-defined moments, instead of the regular intervals.

There is no reason why this should work for terminals, but not for
pipes. This patch improves the situation. Instead of ignoring pipes, we
read data from them but gracefully handle the EOF case.
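
One way to read from a pipe while handling EOF, sketched with select()
and read(). This is an illustration of the idea only; the function name
and return convention are hypothetical and do not reflect the code in
this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait up to *tv for input on fd.
 * Returns 1 if a byte was consumed, 0 on timeout, and -1 on EOF or error,
 * in which case the caller should stop watching fd.
 */
static int wait_for_input(int fd, struct timeval *tv)
{
	fd_set readfds;
	ssize_t n;
	char c;
	int ret;

	assert(fd >= 0 && tv != NULL);
	FD_ZERO(&readfds);
	FD_SET(fd, &readfds);

	ret = select(fd + 1, &readfds, NULL, NULL, tv);
	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;

	/* A closed pipe is reported as readable; read() then returns 0. */
	n = read(fd, &c, 1);
	return n > 0 ? 1 : -1;
}
```

Treating a zero-length read as a signal to stop polling the descriptor
avoids spinning on a pipe whose writer has gone away.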

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# d93ea567 14-Jun-2019 Rajneesh Bhardwaj <rajneesh.bhardwaj@linux.intel.com>

tools/power turbostat: Add Ice Lake NNPI support

This enables turbostat utility on Ice Lake NNPI SoC.

Link: https://lkml.org/lkml/2019/6/5/1034
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 570992fc 31-Aug-2019 Len Brown <len.brown@intel.com>

tools/power turbostat: rename has_hsw_msrs()

Perhaps if this more descriptive name had been used,
then we wouldn't have had the HSW ULT vs HSW CORE bug,
fixed by the previous commit.

Signed-off-by: Len Brown <len.brown@intel.com>


# cd188af5 31-Aug-2019 Len Brown <len.brown@intel.com>

tools/power turbostat: Fix Haswell Core systems

turbostat: cpu0: msr offset 0x630 read failed: Input/output error

because Haswell Core does not have C8-C10.

Output C8-C10 only on Haswell ULT.

Fixes: f5a4c76ad7de ("tools/power turbostat: consolidate duplicate model numbers")

Reported-by: Prarit Bhargava <prarit@redhat.com>
Suggested-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# b62b3184 21-Apr-2019 Zhang Rui <rui.zhang@intel.com>

tools/power turbostat: add Jacobsville support

Jacobsville behaves like Denverton.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# eeb71c95 03-Apr-2019 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

tools/power turbostat: fix buffer overrun

turbostat could be terminated by a general protection fault on some recent
hardware which (for example) supports 9 levels of C-states and shows 18
"tADDED" lines. That bloats the total output and finally causes a buffer
overrun. So let's extend the buffer to avoid this.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 605736c6 08-Apr-2019 Gustavo A. R. Silva <gustavo@embeddedor.com>

tools/power turbostat: fix file descriptor leaks

Fix file descriptor leaks by closing fp before return.

Addresses-Coverity-ID: 1444591 ("Resource leak")
Addresses-Coverity-ID: 1444592 ("Resource leak")
Fixes: 5ea7647b333f ("tools/power turbostat: Warn on bad ACPI LPIT data")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 15423b95 08-Apr-2019 Colin Ian King <colin.king@canonical.com>

tools/power turbostat: fix leak of file descriptor on error return path

Currently the error return path does not close the file fp and leaks
a file descriptor. Fix this by closing the file.

Fixes: 5ea7647b333f ("tools/power turbostat: Warn on bad ACPI LPIT data")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# d4794f25 25-Mar-2019 Yazen Ghannam <yazen.ghannam@amd.com>

tools/power turbostat: Make interval calculation per thread to reduce jitter

Turbostat currently normalizes TSC and other values by dividing by an
interval. This interval is the delta between the start of one global
(all counters on all CPUs) sampling and the start of another. However,
this introduces a lot of jitter into the data.

In order to reduce jitter, the interval calculation should be based on
timestamps taken per thread and close to the start of the thread's
sampling.

Define a per thread time value to hold the delta between samples taken
on the thread.

Use the timestamp taken at the beginning of sampling to calculate the
delta.

Move the thread's beginning timestamp to after the CPU migration to
avoid jitter due to the migration.

Use the global time delta for the average time delta.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# d743dae6 30-Aug-2019 Len Brown <len.brown@intel.com>

tools/power turbostat: remove duplicate pc10 column

Remove the duplicate pc10 column.

Fixes: be0e54c4ebbf ("turbostat: Build-in "Low Power Idle" counters support")
Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 5ebb34ed 27-Aug-2019 Peter Zijlstra <peterz@infradead.org>

x86/intel: Aggregate microserver naming

Currently big microservers have _XEON_D while small microservers have
_X. Make it uniformly: _D.

for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(X\|XEON_D\)"`
do
sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*ATOM.*\)_X/\1_D/g' \
-e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_XEON_D/\1_D/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190827195122.677152989@infradead.org


# 5e741407 27-Aug-2019 Peter Zijlstra <peterz@infradead.org>

x86/intel: Aggregate big core graphics naming

Currently big core clients with extra graphics on have:

- _G
- _GT3E

Make it uniformly: _G

for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_GT3E"`
do
sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_GT3E/\1_G/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190827195122.622802314@infradead.org


# af239c44 27-Aug-2019 Peter Zijlstra <peterz@infradead.org>

x86/intel: Aggregate big core mobile naming

Currently big core mobile chips have either:

- _L
- _ULT
- _MOBILE

Make it uniformly: _L.

for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(MOBILE\|ULT\)"`
do
sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_\(MOBILE\|ULT\)/\1_L/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190827195122.568978530@infradead.org


# c66f78a6 27-Aug-2019 Peter Zijlstra <peterz@infradead.org>

x86/intel: Aggregate big core client naming

Currently the big core client models either have:

- no OPTDIFF
- _CORE
- _DESKTOP

Make it uniformly: 'no OPTDIFF'.

for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(CORE\|DESKTOP\)"`
do
sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_\(CORE\|DESKTOP\)/\1/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190827195122.513945586@infradead.org


# a61127c2 29-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 335

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms and conditions of the gnu general public license
version 2 as published by the free software foundation this program
is distributed in the hope it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 51 franklin st fifth floor boston ma 02110
1301 usa

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 111 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190530000436.567572064@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 0f71d089 20-Mar-2019 Len Brown <len.brown@intel.com>

tools/power turbostat: update version number

Signed-off-by: Len Brown <len.brown@intel.com>


# 5ea7647b 25-Sep-2018 Prarit Bhargava <prarit@redhat.com>

tools/power turbostat: Warn on bad ACPI LPIT data

On some systems /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
or /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
return a file error because of bad ACPI LPIT data from a misconfigured BIOS.
turbostat interprets this failure as a fatal error and outputs

turbostat: CPU LPI: No data available

If the ACPI LPIT sysfs files return an error, output a warning instead of
a fatal error, disable the ACPI LPIT evaluation code, and continue.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 8173c336 20-Mar-2019 Ben Hutchings <ben@decadent.org.uk>

tools/power turbostat: Add checks for failure of fgets() and fscanf()

Most calls to fgets() and fscanf() are followed by error checks.
Add an exit-on-error in the remaining cases.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Len Brown <len.brown@intel.com>


# 3316f99a 16-Aug-2018 Calvin Walton <calvin.walton@kepstin.ca>

tools/power turbostat: Also read package power on AMD F17h (Zen)

The package power can also be read from an MSR. It's not clear exactly
what is included, and whether it's aggregated over all nodes or
reported separately.

It does not look like this is reported separately per CCX (I get a single
value on the Ryzen R7 1700), but it might be reported separately per-
die (node?) on larger processors. If that's the case, it would have to
be recorded per node and aggregated for the socket.

Note that although Zen has these MSRs reporting power, it looks like
the actual RAPL configuration (power limits, configured TDP) is done
through PCI configuration space. I have not yet found any public
documentation for this.

Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>


# 9392bd98 16-Aug-2018 Calvin Walton <calvin.walton@kepstin.ca>

tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL

Based on the Open-Source Register Reference for AMD Family 17h
Processors Models 00h-2Fh:
https://support.amd.com/TechDocs/56255_OSRR.pdf

These processors report RAPL support in bit 14 of CPUID 0x80000007 EDX,
and the following MSRs are present:
0xc0010299 (RAPL_PWR_UNIT), like Intel's RAPL_POWER_UNIT
0xc001029a (CORE_ENERGY_STAT), kind of like Intel's PP0_ENERGY_STATUS
0xc001029b (PKG_ENERGY_STAT), like Intel's PKG_ENERGY_STATUS

A notable difference from the Intel implementation is that AMD reports
the "Cores" energy usage separately for each core, rather than a
per-package total. The code has been adjusted to handle either case in a
generic way.

I haven't yet enabled collection of package power, due to being unable
to test it on multi-node systems (TR, EPYC).

Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>
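
As a rough illustration of how such energy counters are usually
converted to Joules, the sketch below decodes the energy-status unit
and applies it to a counter delta. The helper names are hypothetical,
and the layout (unit exponent in bits 12:8 of the unit MSR, 32-bit
wrapping energy counters) is an assumption carried over from the
Intel-style RAPL_POWER_UNIT format; this is not turbostat's code.

```c
#include <stdint.h>

/* Hypothetical helper: Joules per counter tick, assuming the
 * energy-status unit (ESU) sits in bits 12:8 of the unit MSR
 * and one tick is 1/2^ESU Joules. */
static double rapl_joules_per_tick(uint64_t unit_msr)
{
	unsigned int esu = (unit_msr >> 8) & 0x1f;

	return 1.0 / (double)(1ULL << esu);
}

/* Hypothetical helper: energy consumed between two raw samples,
 * assuming a 32-bit counter that may wrap once between samples. */
static double rapl_delta_joules(uint32_t before, uint32_t after,
				uint64_t unit_msr)
{
	uint32_t ticks = after - before;	/* unsigned math absorbs one wrap */

	return ticks * rapl_joules_per_tick(unit_msr);
}
```

Dividing the result by the interval length in seconds gives average
Watts, either per core or per package depending on which counter was
sampled.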


# 0a42d235 13-Aug-2018 Prarit Bhargava <prarit@redhat.com>

tools/power turbostat: Do not display an error on systems without a cpufreq driver

Running without a cpufreq driver is a valid case so warnings output in
this case should not be to stderr.

Use outf instead of stderr for these warnings.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 6de68fe1 14-Feb-2019 Len Brown <len.brown@intel.com>

tools/power turbostat: Add Die column

If the system has more than one software visible die per package,
print a Die column.

Signed-off-by: Len Brown <len.brown@intel.com>


# 937807d3 19-Mar-2019 Len Brown <len.brown@intel.com>

tools/power turbostat: Add Icelake support

From a turbostat point of view, Icelake is like Cannonlake.

Signed-off-by: Len Brown <len.brown@intel.com>


# 31a1f15c 19-Mar-2019 Len Brown <len.brown@intel.com>

tools/power turbostat: Cleanup CNL-specific code

no functional change.

Signed-off-by: Len Brown <len.brown@intel.com>


# 562855ee 19-Mar-2019 Len Brown <len.brown@intel.com>

tools/power turbostat: Cleanup CC3-skip code

no functional change

Signed-off-by: Len Brown <len.brown@intel.com>


# df2f677d 15-Feb-2019 Len Brown <len.brown@intel.com>

tools/power turbostat: Restore ability to execute in topology-order

turbostat executes on CPUs in "topology order".
This is an optimization for measuring profoundly idle systems --
as the closest hardware is woken next...

Fix a typo that was added with the sub-die-node support,
that broke topology ordering on multi-node systems.

Signed-off-by: Len Brown <len.brown@intel.com>


# 2a954966 12-Feb-2019 David Arcari <darcari@redhat.com>

tools/power turbostat: return the exit status of a command

turbostat failed to return a non-zero exit status even though the
supplied command (turbostat <command>) failed. Currently when turbostat
forks a command it returns zero instead of the actual exit status of the
command. Modify the code to return the exit status.

Signed-off-by: David Arcari <darcari@redhat.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f5a4c76a 14-Dec-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: consolidate duplicate model numbers

Often a new processor gets a new model number, but from a turbostat
point of view, it is the same as a previous model. Support duplicates
with 1-line updates, rather than error-prone scattering of model #'s.

Signed-off-by: Len Brown <len.brown@intel.com>


# 445640a5 14-Dec-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: fix goldmont C-state limit decoding

When the C-state limit is 8 on Goldmont, PC10 is enabled.
Previously turbostat saw this as "undefined", and thus assumed
it should not show some counters, such as pc3, pc6, pc7.

Signed-off-by: Len Brown <len.brown@intel.com>


# 0ec712e3 21-Sep-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: reduce debug output

A recent turbostat release increased topo.max_cpu_num
to make it convenient to handle sysfs bitmaps of 32-cpus.

But users, who regularly make use of "--debug", then saw a bunch of output
for cpus that were not present.

Remove that extra output by checking a cpu is online before dumping its info.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Prarit Bhargava <prarit@redhat.com>


# 34041551 07-Aug-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: fix AMD APIC-id output

turbostat recently gained a feature adding APIC and X2APIC columns.
While they are disabled by-default, they are enabled with --debug
or when explicitly requested, eg.

$ sudo turbostat --quiet --show Package,Node,Core,CPU,APIC,X2APIC date

But these columns erroneously showed zeros on AMD hardware.
This patch corrects the APIC and X2APIC columns on AMD.

Signed-off-by: Len Brown <len.brown@intel.com>


# f2c4db1b 07-Aug-2018 Peter Zijlstra <peterz@infradead.org>

x86/cpu: Sanitize FAM6_ATOM naming

Going primarily by:

https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors

with additional information gleaned from other related pages; notably:

- Bonnell shrink was called Saltwell
- Moorefield is the Merrifield refresh which makes it Airmont

The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE

for i in `git grep -l FAM6_ATOM` ; do
sed -i -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g' \
-e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/' \
-e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g' \
-e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g' \
-e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g' \
-e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g' \
-e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g' \
-e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g' \
-e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g' \
-e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g' \
-e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: dave.hansen@linux.intel.com
Cc: len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 538c48f2 26-Jul-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: version 18.07.27

Signed-off-by: Len Brown <len.brown@intel.com>


# 5aa3d1a2 27-Jul-2018 Calvin Walton <calvin.walton@kepstin.ca>

tools/power turbostat: Read extended processor family from CPUID

This fixes the reported family on modern AMD processors (e.g. Ryzen,
which is family 0x17). Previously these processors all showed up as
family 0xf.

See the document
https://support.amd.com/TechDocs/56255_OSRR.pdf
section CPUID_Fn00000001_EAX for how to calculate the family
from the BaseFamily and ExtFamily values.

This matches the code in arch/x86/lib/cpu.c

Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>


# 2ffbb224 26-Jul-2018 Prarit Bhargava <prarit@redhat.com>

tools/power turbostat: Fix logical node enumeration to allow for non-sequential physical nodes

turbostat fails on some multi-package topologies because the logical node
enumeration assumes that the nodes are sequentially numbered,
which causes the logical numa nodes to not be enumerated, or enumerated incorrectly.

Use a more robust enumeration algorithm which allows for non-sequential physical nodes.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# cfce494d 25-Jul-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: fix x2apic debug message output file

A recently added x2apic debug message was hard-coded to stderr.
That doesn't work with "-o outfile".

Signed-off-by: Len Brown <len.brown@intel.com>


# 4f206a0f 25-Jul-2018 Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

tools/power turbostat: fix bogus summary values

This patch fixes a regression introduced in

commit 8cb48b32a5de ("tools/power turbostat: track thread ID in cpu_topology")

Turbostat uses an incorrect core count ('topo.num_cores') - its value is the count
of logical CPUs, instead of the count of physical cores. So it is twice as large as
it should be on a typical Intel system. For example, on a 6 core Xeon system
'topo.num_cores' is 12, and on a 52 core Xeon system 'topo.num_cores' is 104.

And interestingly, on a 68-core Knights Landing Intel system 'topo.num_cores'
is 272, because this system has 4 logical CPUs per core.

As a result, some of the turbostat calculations are incorrect. For example,
on idle 52-core Xeon system when all cores are ~99% in Core C6 (CPU%c6), the
summary (very first) line shows ~48% Core C6, while it should be ~99%.

This patch fixes the problem by fixing 'topo.num_cores' calculation.

Was:

1. Init 'thread_id' for all CPUs to -1
2. Run 'get_thread_siblings()' which sets it to 0 or 1
3. Increment 'topo.num_cores' when thread_id != -1 (bug!)

Now:

1. Init 'thread_id' for all CPUs to -1
2. Run 'get_thread_siblings()' which sets it to 0 or 1
3. Increment 'topo.num_cores' when thread_id is 0

I did not have a chance to test this on an AMD machine, and only tested on a
couple of Intel Xeons (6 and 52 cores).

Reported-by: Vladislav Govtva <vladislav.govtva@intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 9d83601a 20-Jul-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: fix -S on UP systems

The -S (system summary) option failed to print any data on a 1-processor system.

Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 73780cd8 20-Jun-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: version 18.06.20

Signed-off-by: Len Brown <len.brown@intel.com>


# 9ce80578 13-Jun-2018 Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>

tools/power turbostat: add the missing command line switches

Document the missing command line tokens in the help() function.

Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# cc481650 13-Jun-2018 Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>

tools/power turbostat: add single character tokens to help

Improve the help() output by adding the single character
tokens (e.g -a).

Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 2ee19bde 13-Jun-2018 Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>

tools/power turbostat: alphabetize the help output

Sort the command line arguments output of help() in
alphabetical order in line with other linux tools.

Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 42dd4520 08-Jun-2018 Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>

tools/power turbostat: fix segfault on 'no node' machines

Running turbostat on machines that don't expose nodes
in sysfs (no /sys/bus/node) causes a segfault or a -nan
value displayed in the log. This is caused by
physical_node_id being reported as -1 and logical_node_id
being calculated as a negative number resulting in the new
GET_THREAD/GET_CORE returning an incorrect address.

Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 4c2122d4 06-Jun-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: add optional APIC X2APIC columns

Add APIC and X2APIC columns to the topology section.

They are disabled-by-default -- enable like so:
--debug
or
--enable APIC,X2APIC

Signed-off-by: Len Brown <len.brown@intel.com>


# d9d226ff 06-Jun-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: decode cpuid.1.HT

eg. the "HT" here:
CPUID(1): SSE3 MONITOR - EIST TM2 TSC MSR ACPI-TM HT TM

Signed-off-by: Len Brown <len.brown@intel.com>


# bdd5ae3a 06-Jun-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: fix show/hide issues resulting from mis-merge

The --show and --hide options failed on "Node", which was listed as "Node%".
The --show and --hide options were generally fouled up due to some
content merges that scrambled the list of column name indexes.

Signed-off-by: Len Brown <len.brown@intel.com>


# 201d4f50 28-Jan-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: update version number

Signed-off-by: Len Brown <len.brown@intel.com>


# 01235041 01-Jun-2018 Prarit Bhargava <prarit@redhat.com>

tools/power turbostat: Add Node in output

Output a Node column if there is more than one node/socket.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 40f5cfe7 01-Jun-2018 Prarit Bhargava <prarit@redhat.com>

tools/power turbostat: add node information into turbostat calculations

The previous patches have added node information to turbostat, but the
counters code does not take it into account.

Add node information from cpu_topology calculations to turbostat
counters.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 70a9c6e8 01-Jun-2018 Prarit Bhargava <prarit@redhat.com>

tools/power turbostat: remove num_ from cpu_topology struct

Cleanup, remove num_ from num_nodes_per_pkg, num_cores_per_node, and
num_threads_per_node.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 139dd0e0 01-Jun-2018 Prarit Bhargava <prarit@redhat.com>

tools/power turbostat: rename num_cores_per_pkg to num_cores_per_node

turbostat incorrectly assumes that there is one node per package. As a
result num_cores_per_pkg is not correctly named and is actually
num_cores_per_node.

Rename num_cores_per_pkg to num_cores_per_node.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 8cb48b32 01-Jun-2018 Prarit Bhargava <prarit@redhat.com>

tools/power turbostat: track thread ID in cpu_topology

The code can be simplified if the cpu_topology *cpus tracks the thread
IDs. This removes an additional file lookup and simplifies the counter
initialization code.

Add thread ID to cpu_topology information and cleanup the counter
initialization code.

v2: prevent thread_id from being overwritten

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# ef605741 01-Jun-2018 Prarit Bhargava <prarit@redhat.com>

tools/power turbostat: Calculate additional node information for a package

The code currently assumes each package has exactly one node. This is not
the case for AMD systems and Intel systems with COD. AMD systems also
may re-enumerate each node's core IDs starting at 0 (for example, an AMD
processor may have two nodes, each with core IDs from 0 to 7). In order
to properly enumerate the cores we need to track both the physical and
logical node IDs.

Add physical_node_id to track the node ID assigned by the kernel, and
logical_node_id used by turbostat to track the nodes per package, i.e. a
0-based count within the package.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 0e2d8f05 01-Jun-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: Fix node and siblings lookup data

The turbostat code only looks at thread_siblings_list to determine if
processing units/threads are on the same core. This works well on
Intel systems which have a shared L1 instruction and data cache. This
does not work on AMD systems which have shared L1 instruction cache but
separate L1 data caches. Other utilities also check sibling's core ID
to determine if the processing unit shares the same core.

Additionally, the cpu_topology *cpus list used in topology_probe() can
be used elsewhere in the code to simplify things.

Export *cpus to the entire turbostat code, and add Processing Unit/Thread
IDs information to each cpu_topology struct. Confirm that the thread
is on the same core as indicated by thread_siblings_list.

[v2]: Fixup CPU_* usage that caused gcc malloc error.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 843c5791 01-Jun-2018 Prarit Bhargava <prarit@redhat.com>

tools/power turbostat: set max_num_cpus equal to the cpumask length

Future fixes will use sysfs files that contain cpumask output. The code
needs to know the length of the cpumask in order to determine which cpus
are set in a cpumask. Currently topo.max_cpu_num is the maximum cpu
number. It can be increased to the maximum value of cpus represented in
cpumasks.

Set max_num_cpus to the length of a cpumask.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 023fe0ac 25-Apr-2018 Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: if --num_iterations, print for specific number of iterations

One use case during testing is to print only a specific number of iterations.
With this patch applied, if --num_iterations is specified, for example:

turbostat -i 5 -n 4
will capture 4 samples at a 5-second interval.

[lenb: renamed to --num_iterations from --iterations]

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 997e5395 31-May-2018 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

tools/power turbostat: Add Cannon Lake support

All MSRs related to turbostat are the same as on Kabylake.
Even though SDM claims that core C3 residency can be read from MSR 0x662,
the read on this MSR fails on CNL platform. Hence disabled C3 MSR read
and display.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 9d4eab02 01-Jun-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: delete duplicate #defines

The SNB_C1_AUTO_UNDEMOTE definition should have been deleted once
it was copied into msr-index.h. One copy of the truth is better --
particularly when Matt needs to fix it:-)

Signed-off-by: Len Brown <len.brown@intel.com>


# e0d34648 13-Feb-2018 Matt Turner <mattst88@gmail.com>

tools/power turbostat: Correct SNB_C1/C3_AUTO_UNDEMOTE defines

According to the Intel Software Developers' Manual, Vol. 4, Order No.
335592, these macros have been reversed since they were added.

Fixes: 889facbee3e6 ("tools/power turbostat: v3.0: monitor Watts and Temperature")
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 0748eaf0 31-May-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: add POLL and POLL% column

Like the "C1" and "C1%" columns, the new POLL and POLL% columns
show invocations and residency% during the measurement interval.

While it didn't seem important to track in the past,
we've recently found some Linux cpuidle bugs related to POLL%.

Signed-off-by: Len Brown <len.brown@intel.com>


# 4bd1f8f2 28-Jan-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: Fix --hide Pk%pc10

The column header for PC10 residency is "Pk%pc10"
This is missing the 'g' that others have, eg Pkg%pc6,
to allow tab-delimited columns to fit into 8-columns.

However, --hide Pk%pc10 did not work, it was still looking for the 'g'.
This was confusing, because --list shows the correct "Pk%pc10"

Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# be0e54c4 31-May-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: Build-in "Low Power Idle" counters support

Linux 4.15 exports the ACPI Low Power Idle Table's
counters in /sys/devices/system/cpu/cpuidle/

low_power_idle_cpu_residency_us

Show this in the "CPU%LPI" column.

Today this reflects the "North Complex"
residency in PC10, so expect it to
closely follow "Pk%pc10".

low_power_idle_system_residency_us

Show this in the "SYS%LPI" column.

Today, this reflects that the North Complex is in PC10,
plus the PCH is sufficiently quiescent
to save additional power via the "S0ix"
system state, as measured by the
PCH SLP_S0 counter.

Signed-off-by: Len Brown <len.brown@intel.com>


# 94d6ab4b 27-Jan-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: remove blank lines

When the user requests to collect and show columns
that are not present on every row (eg. for every CPU)
turbostat still prints an (empty) line for every CPU.
Update so no blank lines are printed.

old:
# turbostat --quiet --show Pkg%pc6
Pkg%pc6
9.12
9.12

Pkg%pc6
9.12
9.12

new:
# turbostat --quiet --show Pkg%pc6
Pkg%pc6
9.12
9.12
Pkg%pc6
9.12
9.12

Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 3e8b62bf 05-Sep-2017 Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

tools/power turbostat: a small C-states dump readability improvement

Improve readability a little bit by changing this output:

MSR_PKG_CST_CONFIG_CONTROL: 0x00008407 (locked: pkg-cstate-limit=7: unlimited, automatic-c-state-conversion=off)

with this output:

MSR_PKG_CST_CONFIG_CONTROL: 0x00008407 (locked, pkg-cstate-limit=7 (unlimited), automatic-c-state-conversion=off)

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# ac980e13 05-Sep-2017 Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

tools/power turbostat: dump BDX, SKX automatic C-state conversion bit

BDX and SKX have a bit that tells them to PROMOTE shallow
C-states requests to MWAIT(C6). It is generally a BIOS bug
if this bit is set. As we have encountered that BIOS bug,
let's print this bit in turbostat debug output.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 733ef0f8 02-Jan-2018 Len Brown <len.brown@intel.com>

tools/power turbostat: do not hard-code 25MHz crystal on SKX

Some SKX use a 24 MHz crystal, so do not hard code 25 MHz.

Also, SKX crystal is not exact, because SKX uses an EMI reduction
circuit that costs a fraction of a percent.

Signed-off-by: Len Brown <len.brown@intel.com>


# 46c27978 08-Dec-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: fix possible sprintf buffer overflow

Signed-off-by: Len Brown <len.brown@intel.com>


# fd3933ca 08-Nov-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: fix MSR_IA32_MISC_ENABLE MWAIT printout

MSR_IA32_MISC_ENABLE[18] is the MWAIT ENABLE bit, not DISABLE bit...

so

MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST No-MWAIT PREFETCH TURBO)

should print as:

MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MWAIT PREFETCH TURBO)

Signed-off-by: Len Brown <len.brown@intel.com>


# 47936f94 04-Oct-2017 Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

tools/power turbostat: fix printing on input

The recent patch that implements table printing on a keypress introduced a
regression - turbostat prints the table almost continuously if it is run from a
daemon program.

The problem is also easy to reproduce like this:

echo | turbostat

The reason is that we cannot assume that stdin is always a TTY. It can be many
things.

This patch fixes the problem by limiting the new keypress functionality to
TTYs only. If stdin is not a TTY, we just sleep for the full interval time.

While at it, clean up 'do_sleep()' to return no value, as callers do not expect
that anyway.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# b9ad8ee0 19-Jul-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: end current interval upon newline input

In turbostat interval mode, a newline typed on standard input
will now conclude the current interval. Data will immediately
be collected and printed for that interval, and the next interval
will be started.

This is similar to the recently added SIGUSR1 feature.
But that is for use by programs, while this is for interactive use.

Signed-off-by: Len Brown <len.brown@intel.com>


# 07211960 15-Jul-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: on SIGUSR1: sample, print and continue

Interval-mode turbostat now catches and discards SIGUSR1.

Thus, SIGUSR1 can be used to tell turbostat to cut short
the current measurement interval. Turbostat will then start
the next measurement interval using the regular interval length.

This can be used to give turbostat variable intervals.
Invoke turbostat with --interval LARGE_NUMBER_SEC
and have a program that has permission to send it a SIGUSR1
always before LARGE_NUMBER_SEC expires.

It may also be useful to use "--enable Time_Of_Day_Seconds"
to observe the actual interval length.

Signed-off-by: Len Brown <len.brown@intel.com>


# 8aa2ed0b 15-Jul-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: on SIGINT: sample, print and exit

When running in interval-mode, catch interrupts
and print a final data record before exiting.

Signed-off-by: Len Brown <len.brown@intel.com>


# 3f44a5c6 17-Oct-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: add --enable Time_Of_Day_Seconds

Add a Time_Of_Day_Seconds column showing when measurement
for each row was completed. Units are [sec.subsec] since Epoch,
as reported by gettimeofday(2).

While useful to correlate turbostat output with other tools,
this built-in column is disabled, by default.

Add the "--enable" option to enable such disabled-by-default
built-in columns:

"--enable Time_Of_Day_Seconds"
"--enable usec"

"--enable all", will enable all disabled-by-default built-in counters.

When "--debug" is used, all disabled-by-default columns are enabled,
unless explicitly skipped using "--hide"

Signed-off-by: Len Brown <len.brown@intel.com>


# 2085e124 04-Aug-2017 Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

tools/power turbostat: fix Skylake Xeon package C-state display

Turbostat neglects to display all package C-states for some Skylake Xeon BIOS configurations.

This is due to a typo in the table decoding MSR_PKG_CST_CONFIG_CONTROL (0x000000e2)

Here we fix that typo, according to Intel SDM, vol 4, Table 2-41 -
"MSRs Supported by Intel® Xeon® Processor Scalable Family with DisplayFamily_DisplayModel 06_55H".

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# c97cc7db 17-Oct-2017 Len Brown <len.brown@intel.com>

Revert "tools/power turbostat: stop migrating, unless '-m'"

This reverts commit c91fc8519d87715a3a173475ea3778794c139996.

That change caused a C6 and PC6 residency regression on large idle systems.

Users also complained about new output indicating jitter:

turbostat: cpu6 jitter 3794 9142

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: 4.13+ <stable@vger.kernel.org> # v4.13+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f7d44a8f 27-May-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: update version number

Signed-off-by: Len Brown <len.brown@intel.com>


# f26b1519 23-Jun-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: decode MSR_IA32_MISC_ENABLE only on Intel

otherwise, turbostat bails out on AMD Opteron boxes:

turbostat: cpu26: msr offset 0x1a0 read failed: Input/output error

Reported-by: Kamil Kolakowski <kkolakow@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# c91fc851 27-May-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: stop migrating, unless '-m'

Turbostat has the capability to set its own affinity to
each CPU so that its MSR accesses are on the local CPU.

However, using the in-kernel cross-call in the msr driver
tends to be less invasive, so do that -- by-default.
'-m' remains to get the old behaviour.

Signed-off-by: Len Brown <len.brown@intel.com>


# f4fdf2b4 27-May-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: if --debug, print sampling overhead

The --debug option now pre-pends each row with
the number of micro-seconds [usec] to collect
the finishing snapshot for that row.

Signed-off-by: Len Brown <len.brown@intel.com>


# a99d8730 20-May-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: hide SKL counters, when not requested

Skylake has some new counters, and they were erroneously
exempt from --show and --hide

eg.

turbostat --quiet --show CPU
CPU Totl%C0 Any%C0 GFX%C0 CPUGFX%
- 116.73 90.56 85.69 79.00
0 117.78 91.38 86.47 79.71
2
1
3

is now

CPU
-
0
2
1
3

Signed-off-by: Len Brown <len.brown@intel.com>


# 5f9bf02a 04-Mar-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: update version number

Signed-off-by: Len Brown <len.brown@intel.com>


# 95149369 12-Apr-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: fix impossibly large CPU%c1 value

Most CPUs do not have a hardware c1 counter,
and so turbostat derives c1 residency:

c1 = TSC - MPERF - other_core_cstate_counters

As it is not possible to atomically read these counters,
measurement jitter can cause this calculation to "go negative"
when very close to 0. Turbostat detects that case and
simply prints c1 = 0.00%

But that check neglected to account for systems where the TSC
crystal clock domain and the MPERF BCLK domain differ by
a small amount. That allowed very small negative c1 numbers
to escape this check and be printed as huge positive numbers.

This code begs for a bit of cleanup, but this patch
is the minimal change to fix the issue.
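
The derivation and the clamp described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the turbostat source: the function name derive_c1, its parameters, and the use of a tsc_tweak scale factor to bring the TSC delta into the MPERF clock domain are all chosen for the example.

```c
#include <stdint.h>

/*
 * Derive C1 residency from counter deltas.
 * tsc_tweak is an assumed scale factor that converts the TSC delta
 * into the clock domain used by MPERF; pass 1.0 if they share a domain.
 */
uint64_t derive_c1(uint64_t tsc, double tsc_tweak, uint64_t mperf,
                   uint64_t other_core_cstates)
{
    double scaled_tsc = (double)tsc * tsc_tweak;
    double accounted = (double)mperf + (double)other_core_cstates;

    /*
     * Jitter can make the accounted time exceed the scaled TSC.
     * Clamp to zero instead of letting an unsigned subtraction wrap
     * around to a huge positive number.
     */
    if (accounted > scaled_tsc)
        return 0;

    return (uint64_t)(scaled_tsc - accounted);
}
```

Comparing against the scaled TSC, rather than the raw TSC, is what keeps a slightly negative result from slipping past the check when the two clock domains differ by a small amount.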

Signed-off-by: Len Brown <len.brown@intel.com>


# 6dbd25a2 04-Mar-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: update HWP dump to decimal from hex

Syntax only.

The HWP CAPABILITIES and REQUEST ratios are more easily
viewed in decimal -- just multiply by 100 and you get MHz...

new:
cpu0: MSR_HWP_CAPABILITIES: 0x010c1b23 (high 35 guar 27 eff 12 low 1)
cpu0: MSR_HWP_REQUEST: 0x80002301 (min 1 max 35 des 0 epp 0x80 window 0x0 pkg 0x0)

old:
cpu0: MSR_HWP_CAPABILITIES: 0x010c1b23 (high 0x23 guar 0x1b eff 0xc low 0x1)
cpu0: MSR_HWP_REQUEST: 0x80002301 (min 0x1 max 0x23 des 0x0 epp 0x80 window 0x0 pkg 0x0)
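
As a worked example of the decimal decoding, the sketch below splits a MSR_HWP_CAPABILITIES value into its four 8-bit performance fields. The struct and function names are hypothetical; the field positions follow the listing above (highest in bits 7:0, guaranteed in 15:8, most efficient in 23:16, lowest in 31:24).

```c
#include <stdint.h>

/* Hypothetical container for the four performance ratios. */
struct hwp_caps {
    unsigned high;  /* bits  7:0  */
    unsigned guar;  /* bits 15:8  */
    unsigned eff;   /* bits 23:16 */
    unsigned low;   /* bits 31:24 */
};

/* Split a raw MSR_HWP_CAPABILITIES value into its ratio fields. */
struct hwp_caps decode_hwp_caps(uint64_t msr)
{
    struct hwp_caps c;

    c.high = (msr >> 0) & 0xff;
    c.guar = (msr >> 8) & 0xff;
    c.eff = (msr >> 16) & 0xff;
    c.low = (msr >> 24) & 0xff;
    return c;
}
```

Feeding it 0x010c1b23 yields high 35, guar 27, eff 12, low 1, so the highest ratio corresponds to roughly 3500 MHz when multiplied by 100.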

Signed-off-by: Len Brown <len.brown@intel.com>


# f4896fa5 04-Mar-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: enable package THERM_INTERRUPT dump

cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x00641400 (100 C)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x884b0800 (25 C)
cpu0: MSR_IA32_PACKAGE_THERM_INTERRUPT: 0x00000003 (100 C, 100 C)

Enable the same per-core output, but hide it behind --debug
because it is too verbose on big systems.

Signed-off-by: Len Brown <len.brown@intel.com>


# 81824921 04-Mar-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: show missing Core and GFX power on SKL and KBL

While the current SDM is silent on the matter, the Core and GFX
RAPL power meters on SKL and KBL appear to work -- so show them.

Reported-by: Yaroslav Isakov <yaroslav.isakov@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 22048c54 04-Mar-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: bugfix: GFXMHz column not changing

turbostat displays a GFXMHz column, which comes from reading
/sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz

But GFXMHz was not changing, even when a manual
cat /sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz
showed a new value.

It turns out that a rewind() on the open file is not sufficient,
fflush() (or a close/open) is needed to read fresh values.
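
A minimal sketch of the close/open alternative mentioned above, assuming the attribute holds a single integer. The helper name read_attr_fresh is hypothetical. Reopening on every read is shown here because ISO C leaves fflush() on an input stream undefined, so the fflush() route depends on the C library in use.

```c
#include <stdio.h>

/*
 * Read one integer from a sysfs-style attribute, reopening the file
 * on every call so that no stale stdio buffer can be returned.
 * Returns 0 on success, -1 on failure.
 */
int read_attr_fresh(const char *path, int *value)
{
    FILE *fp = fopen(path, "r");
    int ok;

    if (fp == NULL)
        return -1;

    ok = (fscanf(fp, "%d", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```

The cost is one extra open and close per sample, which is usually negligible at turbostat's sampling intervals.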

Reported-by: Yaroslav Isakov <yaroslav.isakov@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# e3942ed8 21-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: version 17.02.24

The turbostat before this last set of changes is obsolete.
This new version can do a lot more, but it also has
some different defaults, that might catch some off-guard.
So it seems a good time to give a new version number.

Signed-off-by: Len Brown <len.brown@intel.com>


# 5f3aea57 23-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: bugfix: --add u32 was printed as u64

When the "u32" keyword is used with --add, it means that
the output should be truncated to 32-bits. This was not
happening and all 64-bits were printed.

Also, when no column name was used for an added MSR,
the default column name was in decimal, eg. MSR16.
Users report that they tend to use hex MSR numbers,
so print them in hex. To always fit into the columns,
use the syntax M0x10. Note that the user can always
supply any column header that they want.

eg --add msr0x10,MY_TSC

Signed-off-by: Len Brown <len.brown@intel.com>


# 0815a3d0 23-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: show error on exec

When turbostat is run in one-shot command mode,
the parent takes the 'before' counter snapshot,
fork/exec/wait for the child to exit,
takes the 'after' counter snapshot,
and prints the results.

However, if the child fails to exec the command,
it immediately returns, without indicating that
anything was wrong.

Add an error message showing that exec failed:

sudo turbostat sleeeep 4
...
turbostat: exec sleeeep: No such file or directory
...

Note that the parent will still print out the statistics,
because it can't tell the difference between the failed
exec and a command that is purposefully returning
the same status. Unfortunately, this may obscure the
error message. However, if the --out parameter is used,
the error message is evident on stderr.
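
The fork/exec/wait sequence and the new error message can be sketched as below. This is an illustration only: the function name run_command and the exit code 127 are assumptions, and the counter snapshots that the parent takes before and after are left out.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] as a child process and wait for it to finish.
 * Returns the child's exit status, or -1 if fork or wait failed.
 */
int run_command(char **argv)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;

    if (pid == 0) {
        execvp(argv[0], argv);
        /* Only reached when exec fails: say why, then exit non-zero. */
        fprintf(stderr, "exec %s: %s\n", argv[0], strerror(errno));
        _exit(127);
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent sees only an exit status, which is why it cannot distinguish a failed exec from a command that deliberately exits with the same value.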

Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 7293fccd 21-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: dump p-state software config

cpu1: cpufreq driver: acpi-cpufreq
cpu1: cpufreq governor: ondemand
cpufreq boost: 1

or

cpu0: cpufreq driver: intel_pstate
cpu0: cpufreq governor: powersave
cpufreq intel_pstate no_turbo: 0

Signed-off-by: Len Brown <len.brown@intel.com>


# 7da6e3e2 21-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: show package number, even without --debug

On multi-package systems, the "Package" column was being displayed
only if --debug was used. Show it always.

Signed-off-by: Len Brown <len.brown@intel.com>


# dd778a5e 21-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: support "--hide C1" etc.

Originally, the only way to hide the sysfs C-state statistics columns
was with "--hide sysfs". This was because we process "--hide" before
we probe for those columns.

Hack --hide to remember deferred hide requests, and apply
them when sysfs is probed.

"--hide sysfs" is still available as short-hand to refer to
the entire group of counters.

The down-side of this change is that we no longer error check for
bogus --hide column names. But the user will quickly figure that
out if a column they mean to hide is still there...

Signed-off-by: Len Brown <len.brown@intel.com>


# 4e4e1e7c 21-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: move --Package and --processor into the --cpu option

--Package is now "--cpu package",
which will display just the 1st CPU in each package

--processor is now "--cpu core"
which will display just the 1st CPU in each core

Signed-off-by: Len Brown <len.brown@intel.com>


# 6168c2e0 16-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: update --list feature

Make it possible to take the entire un-edited output
from `turbostat --list` and feed it to "turbostat --show"
or "turbostat --hide".

To do this, the leading comma was removed
(no matter what columns are active)
and also the dynamic C-state "C1, C2, C3" etc are replaced
by the string "sysfs", which refers to them as a group.

Signed-off-by: Len Brown <len.brown@intel.com>


# 0de6c0df 15-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: use wide columns to display large numbers

When a counter overflows 7 columns, it shifts the remaining
columns to the right, so they no longer line up under
their column header.

Update turbostat to detect when it is handling large
numbers, and switch to wider columns where necessary.

Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# c8ade361 15-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: Add --list option to show available header names

It is handy to know the list of column header names,
so that they can be used with --add and --skip

The new --list option shows them:

sudo ./turbostat --list --hide sysfs
,Core,CPU,Avg_MHz,Busy%,Bzy_MHz,TSC_MHz,IRQ,SMI,CPU%c1,CPU%c3,CPU%c6,CPU%c7,CoreTmp,PkgTmp,GFX%rc6,GFXMHz,PkgWatt,CorWatt,GFXWatt

Signed-off-by: Len Brown <len.brown@intel.com>


# 218f0e8d 14-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: fix zero IRQ count shown in one-shot command mode

The IRQ column has been working in periodic mode,
but in one-shot command mode it showed only 0,
until now.

Signed-off-by: Len Brown <len.brown@intel.com>


# 1ef7d21a 10-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: add --cpu parameter

With the --cpu parameter, turbostat prints only lines
for the specified set of CPUs:

sudo ./turbostat --quiet --show Core,CPU --cpu 0,1,3..5,6-7
Core CPU
- -
0 0
0 4
1 1
1 5
2 6
3 3
3 7
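
One way to parse a list such as the one in the example above is sketched here. It is not the turbostat parser: the function name parse_cpu_list is hypothetical, and the sketch is limited to CPUs 0..63 so that the result fits in one 64-bit mask.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a list such as "0,1,3..5,6-7" into a bitmask of CPU numbers.
 * Ranges may be written "a-b" or "a..b". Supports CPUs 0..63 only.
 * Returns 0 on success, -1 on a syntax or range error.
 */
int parse_cpu_list(const char *s, uint64_t *mask)
{
    *mask = 0;

    while (*s) {
        char *end;
        long first, last;

        first = strtol(s, &end, 10);
        if (end == s || first < 0 || first > 63)
            return -1;
        last = first;
        s = end;

        if (*s == '-' || (s[0] == '.' && s[1] == '.')) {
            s += (*s == '-') ? 1 : 2;
            last = strtol(s, &end, 10);
            if (end == s || last < first || last > 63)
                return -1;
            s = end;
        }

        for (long cpu = first; cpu <= last; cpu++)
            *mask |= (uint64_t)1 << cpu;

        if (*s == ',')
            s++;
        else if (*s != '\0')
            return -1;
    }
    return 0;
}
```

For "0,1,3..5,6-7" the mask has bits 0, 1, 3, 4, 5, 6 and 7 set, which matches the seven CPU rows printed above.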

Signed-off-by: Len Brown <len.brown@intel.com>


# 41618e63 09-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: print sysfs C-state stats

When turbostat shows % of time in a CPU idle power state,
it has always been showing information from underlying
hardware residency counters.

While this reflects what the hardware is doing, and is thus
useful for understanding the hardware,
it doesn't directly tell us what Linux requested --
which is useful for tuning Linux itself.

Here we add columns to turbostat to show the
Linux cpuidle sub-system statistics:
/sys/devices/system/cpu/cpu*/cpuidle/state*/*

The first group of columns are the "usage", which is the
number of times software requested that C-state in the
measurement interval. eg C1 below.

The second group of columns are the "time", which is the percentage
of the measurement interval time that software has requested
the specified C-state. eg C1% below.

These software counters can be compared to the underlying
hardware residency counters (eg CPU%c1 CPU%c3 CPU%c6 CPU%c7)
to compare what software requested to what the hardware delivered.

These sysfs attributes are discovered when turbostat starts,
rather than being "built in". So the --show and --hide
parameters do not know about these dynamic column names.
However "--show sysfs" and "--hide sysfs" act on the
entire group of columns:

turbostat --show sysfs
...
cpu4: POLL: CPUIDLE CORE POLL IDLE
cpu4: C1: MWAIT 0x00
cpu4: C1E: MWAIT 0x01
cpu4: C3: MWAIT 0x10
cpu4: C6: MWAIT 0x20
cpu4: C7s: MWAIT 0x32
...
C1 C1E C3 C6 C7s C1% C1E% C3% C6% C7s%
3 6 5 1 188 0.00 0.02 0.00 0.00 99.93
0 6 5 0 58 0.00 0.16 0.02 0.00 99.70
0 0 0 0 9 0.00 0.00 0.00 0.00 99.96
0 0 0 1 24 0.00 0.00 0.00 0.02 99.93
0 0 0 0 9 0.00 0.00 0.00 0.00 99.97
0 0 0 0 32 0.00 0.00 0.00 0.00 99.96
0 0 0 0 7 0.00 0.00 0.00 0.00 99.98
2 0 0 0 36 0.00 0.00 0.00 0.00 99.97
1 0 0 0 13 0.00 0.00 0.00 0.00 99.98

Signed-off-by: Len Brown <len.brown@intel.com>


# 495c7654 08-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: extend --add option to accept /sys path

Previously, the --add option could specify only an MSR.

Here it is extended to accept an arbitrary /sys attribute,
specified by an absolute file path name.

sudo ./turbostat --add /sys/devices/system/cpu/cpu0/cpuidle/state5/usage

Signed-off-by: Len Brown <len.brown@intel.com>


# ade0ebac 09-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: skip unused counters on BDX

Skip these two counters on BDX, as they are always zero:
cc7, pc7

Signed-off-by: Len Brown <len.brown@intel.com>


# 31e07522 31-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: fix decoding for GLM, DNV, SKX turbo-ratio limits

Newer processors do not hard-code the number of cpus in each bin
to {1, 2, 3, 4, 5, 6, 7, 8}. Rather, they can specify any number
of CPUs in each of the 8 bins:

eg.

...
36 * 100.0 = 3600.0 MHz max turbo 4 active cores
37 * 100.0 = 3700.0 MHz max turbo 3 active cores
38 * 100.0 = 3800.0 MHz max turbo 2 active cores
39 * 100.0 = 3900.0 MHz max turbo 1 active cores

could now look something like this:

...
36 * 100.0 = 3600.0 MHz max turbo 16 active cores
37 * 100.0 = 3700.0 MHz max turbo 8 active cores
38 * 100.0 = 3800.0 MHz max turbo 4 active cores
39 * 100.0 = 3900.0 MHz max turbo 2 active cores
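
A sketch of decoding such variable-sized bins, assuming the hardware reports eight 8-bit ratio fields in one 64-bit value and the matching eight 8-bit active-core counts in a second 64-bit value. The function name dump_turbo_bins and the sample inputs are hypothetical; which registers supply the two values is not shown.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Print one line per non-empty turbo bin, highest bin first.
 * 'ratios' holds eight 8-bit bus ratios, 'counts' the eight matching
 * 8-bit active-core counts. Returns the number of lines printed.
 */
int dump_turbo_bins(uint64_t ratios, uint64_t counts, double bclk, FILE *out)
{
    int printed = 0;

    for (int shift = 56; shift >= 0; shift -= 8) {
        unsigned ratio = (ratios >> shift) & 0xff;
        unsigned cores = (counts >> shift) & 0xff;

        if (ratio == 0)
            continue;   /* unused bin */

        fprintf(out, "%u * %.1f = %.1f MHz max turbo %u active cores\n",
                ratio, bclk, ratio * bclk, cores);
        printed++;
    }
    return printed;
}
```

Reading the core count from the second value, instead of assuming bin N means N cores, is the whole difference from the older fixed layout.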

Signed-off-by: Len Brown <len.brown@intel.com>


# 34c76197 27-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: skip unused counters on SKX

Skip these four counters on SKX, as they are always zero:
cc3, pc3
cc7, pc7

Signed-off-by: Len Brown <len.brown@intel.com>


# 7170a374 27-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: Denverton: use HW CC1 counter, skip C3, C7

The CC1 column in turbostat can be computed by subtracting
the core c-state residency counters from the total Cx residency.

CC1 = (Idle_time_as_measured by MPERF) - (all core C-states with
residency counters)

However, as the underlying counter reads are not atomic,
error can be noticed in this calculation, especially
when the numbers are small.

Denverton has a hardware CC1 residency counter
to improve the accuracy of the cc1 statistic -- use it.

At the same time, Denverton has no concept of CC3, PC3, CC7, PC7,
so skip collecting and printing those columns.

Finally, a note of clarification.
Turbostat prints the standard PC2 residency counter,
but on Denverton hardware, that actually means PC1E.
Turbostat prints the standard PC6 residency counter,
but on Denverton hardware, that actually means PC2.

At this point, we document that difference in this commit message,
rather than adding a quirk to the software.

Signed-off-by: Len Brown <len.brown@intel.com>


# ac01ac13 26-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: initial Gemini Lake SOC support

Gemini Lake is similar to Apollo Lake (Broxton/Goldmont)

Signed-off-by: Len Brown <len.brown@intel.com>


# 0f47c08d 26-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: bug fixes to --add, --show/--hide features

Fix a bug with --add, where the title of the column
is un-initialized if not specified by the user.

The initial implementation of --show and --hide
neglected to handle the pc8/pc9/pc10 counters.

Fix a bug where "--show Core" only worked with --debug

Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 008d396e 09-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: use tsc_tweak everwhere it is needed

The CPU ticks at a rate in the "bus clock" domain.
eg. 100 MHz * bus_ratio.

On newer processors, the TSC has been moved out of this BCLK
domain and into a separate crystal-clock domain.

While the TSC ticks "close to" the base frequency, those that look
closely at the numbers will notice small errors in calculations that
mix units of TSC clocks and bus clocks.

"tsc_tweak" was introduced to address the most visible
mixing -- the %Busy and the Busy_MHz calculations.
(A simplification has since removed TSC from the BusyMHz calculation)

Here we apply the tsc_tweak to every place where BCLK
and TSC units are mixed. The result is that
on a system which is 100% idle, the sum of the C-states
is now much more likely to be close to 100%.

Reported-by: Travis Downs <travis.downs@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 96e47158 21-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: print system config, unless --quiet

Some users want turbostat to tell them everything, by default.
Some users want turbostat to be quiet, by default.

I find that I'm in the 1st camp, and so I've never liked
needing to type the --debug parameter to decode the system
configuration.

So here we change the default and print the system configuration,
by default. (The --debug option is now un-documented, though
it does still exist for debugging turbostat internals)

When you do not want to see the system configuration
header, use the new "--quiet" option.

Signed-off-by: Len Brown <len.brown@intel.com>


# fee86541 20-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: show all columns, independent of --debug

Some time ago, turbostat overflowed 80 columns.

So on the assumption that a "casual" user would always
want topology and frequency columns, we hid the rest
of the columns and the system configuration decoding
behind the --debug option.

Not everybody liked that change -- including me.
I use --debug 99% of the time...

Well, now we have "-o file" to put turbostat output into a file,
so unless you are watching real-time in a small window,
column count is less frequently a factor.

And more recently, we got the "--hide columnA,columnB" option
to specify columns to skip.

So now we "un-hide" the rest of the columns from behind --debug,
and show them all, by default.

Signed-off-by: Len Brown <len.brown@intel.com>


# 33148d67 20-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: decode MSR_MISC_FEATURE_CONTROL

Useful for observing whether the BIOS disabled prefetch.
Not architectural, but documented as present on NHM and SNB,
and is present on others.

Signed-off-by: Len Brown <len.brown@intel.com>


# b3a34e93 20-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: decode CPUID(6).TURBO

show the CPUID feature for turbo to clarify the case
when it may not be shown in MISC_ENABLE

CPUID(6): APERF, TURBO, DTS, PTM, No-HWP, No-HWPnotify, No-HWPwindow, No-HWPepp, No-HWPpkg, EPB
cpu4: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MWAIT TURBO)

Signed-off-by: Len Brown <len.brown@intel.com>


# 0f7887c4 12-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: dump Atom P-states correctly

Turbostat dumps MSR_TURBO_RATIO_LIMIT on Core Architecture.
But Atom Architecture uses MSR_ATOM_CORE_RATIOS and
MSR_ATOM_CORE_TURBO_RATIOS.

Signed-off-by: Len Brown <len.brown@intel.com>


# e6512624 11-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: further decode MSR_IA32_MISC_ENABLE

Decode MISC_ENABLE.NO_TURBO,
also use the #defines in msr-index.h for decoding this register

cpu0: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MWAIT TURBO)

Although it is not architectural, decode also
MSR_IA32_MISC_ENABLE.prefetch-disable (bit-9).
documented to be present on: Core, P4, Intel-Xeon
reserved on: Atom, Silvermont, Nehalem, SNB, PHI etc.

Signed-off-by: Len Brown <len.brown@intel.com>


# 710f273b 11-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: add precision to --debug frequency output

Add a digit of precision to the --debug output for frequency range.
This is useful when BCLK is not an integer.

old:
6 * 83 = 500 MHz max efficiency frequency
26 * 83 = 2166 MHz base frequency

new:
6 * 83.3 = 499.8 MHz max efficiency frequency
26 * 83.3 = 2165.8 MHz base frequency

Signed-off-by: Len Brown <len.brown@intel.com>


# 0539ba11 09-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: Baytrail c-state support

The Baytrail SOC, with its Silvermont core, has some unique properties:

1. a hardware CC1 residency counter
2. a module-c6 residency counter
3. a package-c6 counter at traditional package-c7 counter address.

The SOC does not support c3, pc3, c7 or pc7 counters.

Signed-off-by: Len Brown <len.brown@intel.com>


# 1df2e55a 07-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: use new name for MSR_PKG_CST_CONFIG_CONTROL

Previously called MSR_NHM_SNB_PKG_CST_CFG_CTL

Signed-off-by: Len Brown <len.brown@intel.com>


# f2642888 07-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: update MSR_PKG_CST_CONFIG_CONTROL decoding

AMT value 0 is unlimited, not PC0

Signed-off-by: Len Brown <len.brown@intel.com>


# 8f6196c1 07-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: Baytrail: remove debug line in quiet mode

Without --debug, a debug line was printed on Baytrail:

SLM BCLK: 83.3 Mhz

Signed-off-by: Len Brown <len.brown@intel.com>


# 71616c8e 07-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: decode Baytrail CC6 and MC6 demotion configuration

with --debug, see:

cpu0: MSR_CC6_DEMOTION_POLICY_CONFIG: 0x00000000 (DISable-CC6-Demotion)
cpu0: MSR_MC6_DEMOTION_POLICY_CONFIG: 0x00000000 (DISable-MC6-Demotion)

Note that the hardware default is to enable demotion,
and Linux started clearing these registers in 3.17.

Signed-off-by: Len Brown <len.brown@intel.com>


# cf4cbe53 01-Jan-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: BYT does not have MSR_MISC_PWR_MGMT

and so --debug fails with:

turbostat: msr 1 offset 0x1aa read failed: Input/output error

It seems that baytrail, and airmont do not have this MSR.
It is included in subsequent Goldmont Atom.

Signed-off-by: Len Brown <len.brown@intel.com>


# 812db3f7 09-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: Add --show and --hide parameters

Add the "--show" and "--hide" cmdline parameters.

By default, turbostat shows all columns.

turbostat --hide counter_list
will continue showing all columns, except for those listed.

turbostat --show counter_list
will show _only_ the listed columns

These features work for built-in counters, and have no effect
on columns added with the --add parameter.

Signed-off-by: Len Brown <len.brown@intel.com>


# 678a3bd1 09-Feb-2017 Len Brown <len.brown@intel.com>

tools/power turbostat: fix bugs in --add option

When --add was used more than once, overflowed buffers
caused some counters to be stored on top of others,
corrupting the results. Simplify the code by
reserving space for up to 16 added counters per
cpu, core, and package.

Per-cpu added counters were being printed only per-core.

Signed-off-by: Len Brown <len.brown@intel.com>


# 6886fee4 24-Dec-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: remove obsolete -M, -m, -C, -c options

The new --add option has replaced the -M, -m, -C, -c options
Eg.

-M 0x10 is now --add msr0x10,raw
-m 0x10 is now --add msr0x10,raw,u32
-C 0x10 is now --add msr0x10,delta
-c 0x10 is now --add msr0x10,delta,u32

The --add option can be repeated to add any number of counters,
while the previous options were limited to adding one of each type.

In addition, the --add option can accept a column label,
and can also display a counter as a percentage of elapsed cycles.

Eg. --add msr0x3fe,core,percent,MY_CC3

Signed-off-by: Len Brown <len.brown@intel.com>


# 388e9c81 22-Dec-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: Make extensible via the --add parameter

Create the "--add" parameter. This can be used to teach an existing
turbostat binary about any number of any type of counter.

turbostat(8) details the syntax for --add.

Signed-off-by: Len Brown <len.brown@intel.com>


# 7268d407 01-Dec-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: Denverton uses a 25 MHz crystal, not 19.2 MHz

This changes only the TSC frequency decoding line seen with --debug

old: TSC: 1382 MHz (19200000 Hz * 216 / 3 / 1000000)
new: TSC: 1800 MHz (25000000 Hz * 216 / 3 / 1000000)
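
The arithmetic in the two lines above can be checked with a small helper. The function name tsc_mhz_from_crystal is hypothetical; the formula is the one printed in the debug line (crystal Hz times the numerator, divided by the denominator, divided by 1000000).

```c
#include <stdint.h>

/*
 * TSC frequency in MHz from a crystal frequency and a
 * numerator/denominator pair, truncated to a whole number.
 * Returns 0 if the denominator is 0.
 */
unsigned tsc_mhz_from_crystal(uint64_t crystal_hz, unsigned denominator,
                              unsigned numerator)
{
    if (denominator == 0)
        return 0;

    return (unsigned)(crystal_hz * numerator / denominator / 1000000);
}
```

With a ratio of 216 / 3, a 19.2 MHz crystal gives 1382 MHz and a 25 MHz crystal gives 1800 MHz, which is the difference this commit corrects.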

Signed-off-by: Len Brown <len.brown@intel.com>


# 5cc6323c 01-Dec-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: line up headers when -M is used

The -M option adds an 18-column item, and the header
needs to be wide enough to keep the header aligned
with the columns.

Signed-off-by: Len Brown <len.brown@intel.com>


# d8ebb442 01-Dec-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: fix SKX PKG_CSTATE_LIMIT decoding

SKX has fewer package C-states than previous generations,
and so the decoding of PKG_CSTATE_LIMIT has changed.

This changes the line ending with pkg-cstate-limit=XXX: pcYYY

Signed-off-by: Len Brown <len.brown@intel.com>


# 005c82d6 30-Nov-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: Support Knights Mill (KNM)

Original-author: Piotr Luc <piotr.luc@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# ddadb8ad 11-Nov-2016 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

tools/power turbostat: Display HWP OOB status

Display if the HWP is enabled in OOB (Out of band) mode.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 5bbac26e 30-Sep-2016 Xiaolong Wang <xiaolong.wang@linux.intel.com>

tools/power turbostat: fix Denverton BCLK

Add Denverton to the group of SandyBridge and later processors,
to let the bclk be recognized as 100MHz rather than 133MHz,
then avoid the wrong value of the frequencies based on it,
including Bzy_MHz, max efficiency frequency, base frequency,
and turbo mode frequencies.

Signed-off-by: Xiaolong Wang <xiaolong.wang@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 869ce69e 16-Jun-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: use intel-family.h model strings

All except for model 1F, a Nehalem, which is currently incorrectly
identified as a Westmere in that new header.

Signed-off-by: Len Brown <len.brown@intel.com>


# 0f644909 16-Jun-2016 Jacob Pan <jacob.jun.pan@linux.intel.com>

tools/power/turbostat: Add Denverton RAPL support

The Denverton CPU RAPL supports package, core, and DRAM domains.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 2c48c990 16-Jun-2016 Jacob Pan <jacob.jun.pan@linux.intel.com>

tools/power/turbostat: Add Denverton support

Denverton is an Atom based micro server which shares the same
Goldmont architecture as Broxton. The available C-states on
Denverton are a subset of Broxton's, with only C1, C1e, and C6.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 9148494c 16-Jun-2016 Jacob Pan <jacob.jun.pan@linux.intel.com>

tools/power/turbostat: split core MSR support into status + limit

Some CPUs may not have PP0/Core domain power limit MSRs. We
should still allow its domain energy status to be used. This
patch splits PP0/Core RAPL into two separate flags for power
limit and energy status such that energy status can continue
to be reported without power limit.

Without this patch, turbostat will not be able to use the
remaining RAPL features if some PL MSRs are not present.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 0a91e551 25-Apr-2016 Colin Ian King <colin.king@canonical.com>

tools/power turbostat: fix error case overflow read of slm_freq_table[]

When i >= SLM_BCLK_FREQS, the frequency read from the slm_freq_table
is off the end of the array because msr is set to 3 rather than the
actual array index i. Set i to 3 rather than msr to fix this.
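
The class of bug described above can be illustrated with a bounds-checked table lookup. This is a sketch under assumptions: the table contents, the 4-bit index mask, and the function name slm_bclk_from_msr are illustrative and may not match the turbostat source exactly.

```c
#include <stdio.h>

#define SLM_BCLK_FREQS 5

/* Illustrative frequency table in MHz; the values are assumptions. */
static const double slm_freq_table[SLM_BCLK_FREQS] = {
    83.3, 100.0, 133.3, 116.7, 80.0
};

/* Look up the bus clock selected by the low bits of an MSR value. */
double slm_bclk_from_msr(unsigned long long msr)
{
    unsigned i = msr & 0xf;

    if (i >= SLM_BCLK_FREQS) {
        fprintf(stderr, "SLM BCLK[%u] invalid\n", i);
        /*
         * Reset the index itself. Overwriting msr here instead would
         * leave i out of range and the read below would overflow.
         */
        i = 3;
    }
    return slm_freq_table[i];
}
```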

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 01a67adf 22-Apr-2016 Mika Westerberg <mika.westerberg@linux.intel.com>

tools/power turbostat: Allocate correct amount of fd and irq entries

The tool uses topo.max_cpu_num to determine number of entries needed for
fd_percpu[] and irqs_per_cpu[]. For example on a system with 4 CPUs
topo.max_cpu_num is 3 so we get too small array for holding per-CPU items.

Fix this to use right number of entries, which is topo.max_cpu_num + 1.
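
A minimal sketch of the sizing rule, assuming a per-CPU array of int indexed by CPU number. The helper name alloc_per_cpu_ints is hypothetical.

```c
#include <stdlib.h>

/*
 * Allocate a zeroed array indexed by CPU number.
 * max_cpu_num is the highest CPU number, so valid indices run from
 * 0 to max_cpu_num inclusive and the array needs max_cpu_num + 1 slots.
 */
int *alloc_per_cpu_ints(int max_cpu_num)
{
    if (max_cpu_num < 0)
        return NULL;

    return calloc((size_t)max_cpu_num + 1, sizeof(int));
}
```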

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 3d109de2 22-Apr-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: switch to tab delimited output

Switch to tab-delimited output from fixed-width columns
to make it simpler to import into spreadsheets.

As the fixed-width columns were 8 spaces wide,
the output on the screen should not change.

Signed-off-by: Len Brown <len.brown@intel.com>


# ba3dec99 22-Apr-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: Gracefully handle ACPI S3

turbostat gives valid results across suspend to idle, aka freeze,
whether invoked in interval mode, or in command mode.
Indeed, this can be used to measure suspend to idle:

turbostat echo freeze > /sys/power/state

But this does not work across suspend to ACPI S3, because the
processor counters, including the TSC, are reset on resume.
Further, when turbostat detects a problem, it doesn't forgive
the hardware, and interval mode will print *'s from there on out.

Instead, upon detecting counters going backwards, simply
reset and start over.

Interval mode across ACPI S3: (observe TSC going backwards)

root@sharkbay:/home/lenb/turbostat-src# ./turbostat -M 0x10
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x010
- 1 0.06 858 2294 0x0000000000000000
0 0 0.06 847 2294 0x0000002a254b98ac
1 1 0.06 878 2294 0x0000002a254efa3a
2 1 0.07 843 2294 0x0000002a2551df65
3 0 0.05 863 2294 0x0000002a2553fea2
turbostat: re-initialized with num_cpus 4
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz MSR 0x010
- 2 0.20 849 2294 0x0000000000000000
0 2 0.26 856 2294 0x0000000449abb60d
1 2 0.20 844 2294 0x0000000449b087ec
2 2 0.21 850 2294 0x0000000449b35d5d
3 1 0.12 839 2294 0x0000000449b5fd5a
^C

Command mode across ACPI S3:
root@sharkbay:/home/lenb/turbostat-src# ./turbostat -M 0x10 sleep 10
./turbostat: Counter reset detected
14.196299 sec

Signed-off-by: Len Brown <len.brown@intel.com>


# e975db5d 06-Apr-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: tidy up output on Joule counter overflow

The RAPL Joules counter is limited in capacity.
Turbostat estimates how soon it can roll-over
based on the max TDP of the processor --
which tells us the maximum increment rate.

eg.
RAPL: 2759 sec. Joule Counter Range, at 95 Watts

So if a sample duration is longer than 2759 seconds on this system,
'**' replace the decimal place in the display to indicate
that the results may be suspect.
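
The roll-over estimate can be sketched as below, assuming a 32-bit energy counter. The function names are hypothetical, and the energy unit of 1/16384 Joule used in the usage note is an assumption chosen because it reproduces the 2759 second figure at 95 Watts.

```c
/*
 * Seconds until a 32-bit energy counter can wrap when the package
 * draws tdp_watts continuously. joules_per_unit is the energy
 * represented by one counter increment.
 */
double rapl_joule_counter_range(double joules_per_unit, double tdp_watts)
{
    if (tdp_watts <= 0.0)
        return 0.0;

    return 0xFFFFFFFF * joules_per_unit / tdp_watts;
}

/* A sample longer than the range may contain an unseen wrap. */
int sample_may_have_wrapped(double interval_sec, double range_sec)
{
    return interval_sec > range_sec;
}
```

For example, rapl_joule_counter_range(1.0 / 16384, 95.0) is about 2759.4 seconds, so a 3000 second sample on such a system would be flagged.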

But the display had an extra ' ' in this case, throwing off the columns.

Also, the -J "Joules" option appended an extra "time" column
to the display. While this may be useful, it printed the interval time,
which may not be the accurate time per processor. Remove this column,
which appeared only when using '-J',
as we plan to add accurate per-cpu interval times in a future commit.

Signed-off-by: Len Brown <len.brown@intel.com>


# ebf5926a 06-Jul-2016 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

tools/power turbostat: Replace MSR_NHM_TURBO_RATIO_LIMIT

Replace MSR_NHM_TURBO_RATIO_LIMIT with MSR_TURBO_RATIO_LIMIT.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9185e988 06-Apr-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: work around RC6 counter wrap

Sometimes the rc6 sysfs counter spontaneously resets,
causing turbostat to print a very large number
as it tries to calculate % = 100 * (new - old) / interval

When we see (old > new), print ***.**% instead
of a bogus huge number.

Note that this detection is not fool-proof, as the counter
could reset several times and still result in new > old.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# cdc57272 06-Apr-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: initial KBL support

KBL is similar to SKL

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ec53e594 06-Apr-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: initial SKX support

SKX has a lot in common with HSX

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e8efbc80 06-Apr-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: decode BXT TSC frequency via CPUID

Hard-code BXT ART to 19.2 MHz, so turbostat --debug
can fully enumerate TSC:

CPUID(0x15): eax_crystal: 3 ebx_tsc: 186 ecx_crystal_hz: 0
TSC: 1190 MHz (19200000 Hz * 186 / 3 / 1000000)

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e4085d54 06-Apr-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: initial BXT support

Broxton has a lot in common with SKL

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5a63426e 06-Apr-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: print IRTL MSRs

Some processors use the Interrupt Response Time Limit (IRTL) MSR value
to describe the maximum IRQ response time latency for deep
package C-states. (Others have the register, but do not use it.)
Let's print it out to give insight into the cases where it is used.

IRTL began in SNB, with PC3/PC6/PC7, and HSW added PC8/PC9/PC10.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8ae72255 06-Apr-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: SGX state should print only if --debug

The CPUID.SGX bit was printed, even if --debug was not used

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 685b535b 13-Dec-2015 Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: bugfix: TDP MSRs print bits fixing

MSR_CONFIG_TDP_NOMINAL:
should print all 8 bits of base_ratio (bit 0:7) 0xFF

MSR_CONFIG_TDP_LEVEL_1:
should print all 15 bits of PKG_MIN_PWR_LVL1 (bit 48:62) 0x7FFF
should print all 15 bits of PKG_MAX_PWR_LVL1 (bit 32:46) 0x7FFF
should print all 8 bits of LVL1_RATIO (bit 16:23) 0xFF
should print all 15 bits of PKG_TDP_LVL1 (bit 0:14) 0x7FFF

And the same modification to MSR_CONFIG_TDP_LEVEL_2.

MSR_TURBO_ACTIVATION_RATIO:
should print all 8 bits of MAX_NON_TURBO_RATIO (bit 0:7) 0xFF
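
The field widths listed above amount to a shift and a mask per field. A minimal sketch, with a hypothetical helper name:

```c
#include <stdint.h>

/* Extract `width` bits starting at bit `shift` from a 64-bit MSR value. */
uint64_t msr_field(uint64_t msr, unsigned int shift, unsigned int width)
{
	uint64_t mask = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);

	return (msr >> shift) & mask;
}
```

For example, msr_field(msr, 48, 15) would yield PKG_MIN_PWR_LVL1 and msr_field(msr, 16, 8) would yield LVL1_RATIO.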

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 6c34f160 13-Mar-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump

MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=0: unlimited)
should print as
MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=8: unlimited)

Signed-off-by: Len Brown <len.brown@intel.com>


# 5aea2f7f 13-Mar-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: call __cpuid() instead of __get_cpuid()

turbostat already checks whether calling each cpuid leaf is legal,
and it doesn't look at the function return value,
so call the simpler gcc intrinsic __cpuid() instead of __get_cpuid().

syntax only, no functional change

Signed-off-by: Len Brown <len.brown@intel.com>


# aa8d8cc7 11-Mar-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: indicate SMX and SGX support

SGX presence is related to a SKL power workaround,
so let's show when that is enabled.

Signed-off-by: Len Brown <len.brown@intel.com>


# 0102b067 27-Feb-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: detect and work around syscall jitter

The accuracy of Bzy_MHz and Busy% depends on reading
the TSC, APERF, and MPERF close together in time.

When there is a very short measurement interval,
or a large system is profoundly idle, the changes
in APERF and MPERF may be very small.
They can be small enough that an expensive interrupt
between reading APERF and MPERF can cause the APERF/MPERF
ratio to become inaccurate, resulting in invalid
calculation and display of Bzy_MHz.

A dummy read of APERF makes this problem
much more rare. Apparently this first system call
after exiting a long stretch of idle is when we
typically see expensive timer interrupts that cause
large jitter.

For the cases that the dummy APERF read fails to prevent,
we compare the latency of the APERF and MPERF reads.
If they differ by more than 2x, we re-issue them.
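
The 2x comparison might look like the sketch below. The function name and the choice of TSC ticks as the latency unit are assumptions made for illustration.

```c
/*
 * Hypothetical check: given how long the APERF and MPERF reads took,
 * report whether the pair should be read again.
 */
int reads_need_retry(unsigned long long aperf_ticks,
		     unsigned long long mperf_ticks)
{
	if (aperf_ticks > 2 * mperf_ticks)
		return 1;
	if (mperf_ticks > 2 * aperf_ticks)
		return 1;
	return 0;
}
```

A caller would presumably bound the number of retries so that a persistently noisy system cannot stall the sample.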

Signed-off-by: Len Brown <len.brown@intel.com>


# fdf676e5 26-Feb-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: show GFX%rc6

The column "GFX%rc6" shows the percentage of time the GPU
is in the "render C6" state, rc6. Deep package C-states on several
systems depend on the GPU being in RC6.

This information comes from the counter
/sys/class/drm/card0/power/rc6_residency_ms,
as read before and after the measurement interval.

Signed-off-by: Len Brown <len.brown@intel.com>


# 27d47356 26-Feb-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: show GFXMHz

Under the column "GFXMHz", show a snapshot of this attribute:
/sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz

This is an instantaneous snapshot of what sysfs presents
at the end of the measurement interval. turbostat does
not average or otherwise perform any math on this value.

Signed-off-by: Len Brown <len.brown@intel.com>


# 562a2d37 26-Feb-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: show IRQs per CPU

The new IRQ column shows how many interrupts have occurred on each CPU
during the measurement interval. This information comes from
the difference between /proc/interrupts snapshots made before
and after the measurement interval.

The first row, the system summary, shows the sum of the IRQs
for all CPUs during that interval.

Signed-off-by: Len Brown <len.brown@intel.com>


# 36229897 26-Feb-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: make fewer system calls

skip the open(2)/close(2) on each msr read
by keeping the /dev/cpu/*/msr files open.

The remaining read(2) is generally far fewer cycles
than the removed open(2) system call.

Signed-off-by: Len Brown <len.brown@intel.com>


# 58cc30a4 24-Feb-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: fix compiler warnings

Signed-off-by: Len Brown <len.brown@intel.com>


# b7d8c148 13-Feb-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: add --out option for saving output in a file

By default...

Turbostat --debug configuration info goes to stderr.

In FORK mode, turbostat statistics go to stderr.

In PERIODIC mode, turbostat statistics go to stdout.

These defaults do not change, but an option "--out file"
will send all output above only to the specified file.

Signed-off-by: Len Brown <len.brown@intel.com>


# 75d2e44e 13-Feb-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: re-name "%Busy" field to "Busy%"

some tools processing turbostat output
have difficulty with items that begin with %...

Reported-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# cbf97aba 10-Feb-2016 Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>

tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding

Following changes have been made:
- changed MSR_NHM_TURBO_RATIO_LIMIT to MSR_TURBO_RATIO_LIMIT in debug print
for consistency with Developer Manual
- updated definition of bitfields in MSR_TURBO_RATIO_LIMIT and appropriate
parsing code
- added x200 to list of architectures that do not support Nehalem compatible
definition of MSR_TURBO_RATIO_LIMIT register (x200 has the register but
bits definition is custom)
- fixed typo in code that parses MSR_TURBO_RATIO_LIMIT
(logical instead of bitwise operator)
- changed MSR_TURBO_RATIO_LIMIT parsing algorithm so the print out had the
same order as implementations for other platforms

Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 121b48bb 10-Feb-2016 Chrzaniuk, Hubert <hubert.chrzaniuk@intel.com>

tools/power turbostat: Intel Xeon x200: fix erroneous bclk value

x200 does not enable any way to programmatically obtain bus clock
speed. Bclk for the architecture has a fixed value of 100 MHz.
At the same time x200 cannot be included in has_snb_msrs since
it does not support C7 idle state.

prior to this patch, MHz values reported on this chip
were erroneously calculated using bclk of 133MHz,
causing MHz values to be reported 33% higher than actual.

Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 2a0609c0 12-Feb-2016 Len Brown <len.brown@intel.com>

tools/power turbostat: allow sub-sec intervals

turbostat -i interval_sec

will sample and display statistics every interval_sec.
interval_sec used to be a whole number of seconds,
but now we accept a decimal, as small as 0.001 sec (1 ms).
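
One way to turn a decimal interval into a sleep duration is sketched below. The function name, the use of struct timespec, and the rejection of values under 0.001 sec are assumptions, not a description of the real option parser.

```c
#include <stdlib.h>
#include <time.h>

/*
 * Sketch: convert a decimal-seconds string such as "0.001" into a
 * struct timespec. Returns 0 on success, -1 on malformed input or a
 * value outside the (assumed) accepted range.
 */
int parse_interval(const char *arg, struct timespec *ts)
{
	char *end;
	double sec = strtod(arg, &end);

	if (end == arg || *end != '\0' || !(sec >= 0.001) || sec > 1e9)
		return -1;

	ts->tv_sec = (time_t)sec;
	ts->tv_nsec = (long)((sec - (double)ts->tv_sec) * 1e9 + 0.5);
	if (ts->tv_nsec >= 1000000000L) {	/* rounding carried into seconds */
		ts->tv_sec++;
		ts->tv_nsec -= 1000000000L;
	}
	return 0;
}
```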

Signed-off-by: Len Brown <len.brown@intel.com>


# 1b69317d 02-Mar-2016 Colin Ian King <colin.king@canonical.com>

tools/power turbostat: fix various build warnings

When building with gcc 6 we're getting various build warnings that just
require some trivial function declaration and call fixes:

turbostat.c: In function ‘dump_cstate_pstate_config_info’:
turbostat.c:1973:1: warning: type of ‘family’ defaults to ‘int’
dump_cstate_pstate_config_info(family, model)
turbostat.c:1973:1: warning: type of ‘model’ defaults to ‘int’
turbostat.c: In function ‘get_tdp’:
turbostat.c:2145:8: warning: type of ‘model’ defaults to ‘int’
double get_tdp(model)
turbostat.c: In function ‘perf_limit_reasons_probe’:
turbostat.c:2259:6: warning: type of ‘family’ defaults to ‘int’
void perf_limit_reasons_probe(family, model)
turbostat.c:2259:6: warning: type of ‘model’ defaults to ‘int’

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-wbicer8n0s9qe6ql8h9x478e@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# f0057310 02-Dec-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: Decode MSR_MISC_PWR_MGMT

This MSR is helpful to show if P-state HW coordination
is enabled or disabled.

Signed-off-by: Len Brown <len.brown@intel.com>


# 7f5c258e 30-Nov-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: decode HWP registers

# turbostat --debug
...
CPUID(6): ... HWP, HWPnotify, HWPwindow, HWPepp, HWPpkg ...
...
cpu0: MSR_PM_ENABLE: 0x00000001 (HWP)
cpu0: MSR_HWP_CAPABILITIES: 0x01050916 (high 0x16 guar 0x9 eff 0x5 low 0x1)
cpu0: MSR_HWP_REQUEST: 0x80001604 (min 0x4 max 0x16 des 0x0 epp 0x80 window 0x0 pkg 0x0)
cpu0: MSR_HWP_INTERRUPT: 0x00000001 (EN_Guaranteed_Perf_Change, Dis_Excursion_Min)
cpu0: MSR_HWP_STATUS: 0x00000000 (No-Guaranteed_Perf_Change, No-Excursion_Min)
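
The MSR_HWP_CAPABILITIES line above splits into four 8-bit fields. A small sketch of that decode, with hypothetical struct and function names:

```c
#include <stdint.h>

struct hwp_caps {
	unsigned int highest, guaranteed, efficient, lowest;
};

/* Split MSR_HWP_CAPABILITIES into its four 8-bit performance levels. */
struct hwp_caps decode_hwp_caps(uint64_t msr)
{
	struct hwp_caps c;

	c.highest    = (msr >> 0)  & 0xff;
	c.guaranteed = (msr >> 8)  & 0xff;
	c.efficient  = (msr >> 16) & 0xff;
	c.lowest     = (msr >> 24) & 0xff;
	return c;
}
```

Feeding in 0x01050916 gives high 0x16, guar 0x9, eff 0x5 and low 0x1, matching the output shown.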

Signed-off-by: Len Brown <len.brown@intel.com>


# 61a87ba7 23-Nov-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequency

This CPUID leaf is available on Skylake:

CPUID(0x16): base_mhz: 1500 max_mhz: 2200 bus_mhz: 100

Signed-off-by: Len Brown <len.brown@intel.com>


# 69807a63 20-Nov-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: decode more CPUID fields

for debugging, dump a few more fields:

CPUID(1): SSE3 MONITOR EIST TM2 TSC MSR ACPI-TM TM

cpu0: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MONITOR)

Signed-off-by: Len Brown <len.brown@intel.com>


# ec0adc53 12-Nov-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: use new name for MSR_PLATFORM_INFO

MSR_PLATFORM_INFO is the new name for MSR_NHM_PLATFORM_INFO

no functional change

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 759d2a93 22-Oct-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: bugfix: print MAX_NON_TURBO_RATIO

MSR_TURBO_ACTIVATION_RATIO: 0x00000016 (MAX_NON_TURBO_RATIO=6 lock=0)
should print all 7 bits of MAX_NON_TURBO_RATIO (in decimal):
MSR_TURBO_ACTIVATION_RATIO: 0x00000016 (MAX_NON_TURBO_RATIO=22 lock=0)

Signed-off-by: Len Brown <len.brown@intel.com>


# 21ed5574 19-Oct-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: simplify Bzy_MHz calculation

Bzy_MHz = TSC_delta * tsc_tweak * APERF_delta / MPERF_delta / measurement_interval

becomes

Bzy_MHz = base_mhz * APERF_delta / MPERF_delta

on systems which support MSR_NHM_PLATFORM_INFO.

base_mhz is calculated directly from the base_ratio
reported in MSR_NHM_PLATFORM_INFO * bclk,
and bclk is discovered via MSR or cpuid.

This reduces the dependency of Bzy_MHz calculation on the TSC.
Previously, there were 4 TSC readings required in each calculation,
the raw TSC delta combined with the measurement_interval.
This also removes the "tsc_tweak" correction factor used when
TSC runs on a different base clock from the CPU's bclk.

After this change, tsc_tweak is used only for %Busy.

The end-result should be a Bzy_MHz result slightly less prone to jitter.
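
A sketch of the simplified calculation, in which the base frequency is scaled by the ratio of the APERF and MPERF deltas. The function name and the zero-delta guard are illustrative assumptions.

```c
/* Bzy_MHz from the base frequency and the APERF/MPERF ratio. */
double bzy_mhz(double base_mhz, unsigned long long aperf_delta,
	       unsigned long long mperf_delta)
{
	if (mperf_delta == 0)
		return 0.0;	/* no un-halted cycles in the interval */
	return base_mhz * (double)aperf_delta / (double)mperf_delta;
}
```

No TSC reading appears in this form, which is the point of the change.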

Signed-off-by: Len Brown <len.brown@intel.com>


# af71b980 26-Sep-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: update version number

Signed-off-by: Len Brown <len.brown@intel.com>


# a2b7b749 25-Sep-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: SKL: Adjust for TSC difference from base frequency

On a Skylake with 1500MHz base frequency,
the TSC runs at 1512MHz.

This is because the TSC is no longer in the n*100 MHz BCLK domain,
but is now in the m*24MHz crystal clock domain. (24 MHz * 63 = 1512 MHz)

This adds error to several calculations in turbostat,
unless the TSC sample sizes are adjusted for this difference.

Note that calculations in the time domain are immune
from this issue, as the timing sub-system has already
calibrated the TSC against a known wall clock.

Avg_MHz = APERF_delta/measurement_interval

needs no adjustment. APERF_delta is in the BCLK domain,
and measurement_interval is in the time domain.

TSC_MHz = TSC_delta/measurement_interval

needs no adjustment -- as we really do want to report
the actual measured TSC delta here, and measurement_interval
is in the accurate time domain.

%Busy = MPERF_delta/TSC_delta

needs adjustment to use TSC_BCLK_DOMAIN_delta.
TSC_BCLK_DOMAIN_delta = TSC_delta * base_hz / tsc_hz

Bzy_MHz = TSC_delta * APERF_delta / MPERF_delta / measurement_interval

needs adjustment as above.

No other metrics in turbostat need to be adjusted.

Before:

CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 550 24.84 2216 1512
0 2191 98.73 2219 1514
2 0 0.01 2130 1512
1 9 0.43 2016 1512
3 2 0.08 2016 1512

After:

CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 550 25.05 2198 1512
0 2190 99.62 2199 1512
2 0 0.01 2152 1512
1 9 0.46 2000 1512
3 2 0.10 2000 1512

Note that in this example, the "Before" Bzy_MHz
was reported as exceeding the 2200 max turbo rate.
Also, even a pinned spin loop would not be reported
as over 99% busy.
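
The %Busy adjustment can be sketched as below, rescaling the TSC delta into the BCLK domain before dividing. The function and parameter names are hypothetical.

```c
/*
 * Busy% with the TSC delta rescaled into the BCLK domain.
 * tsc_tweak = base_hz / tsc_hz, e.g. 1500 MHz / 1512 MHz on the
 * example part above.
 */
double busy_percent(unsigned long long mperf_delta,
		    unsigned long long tsc_delta,
		    double base_hz, double tsc_hz)
{
	double tsc_tweak = base_hz / tsc_hz;

	if (tsc_delta == 0)
		return 0.0;
	return 100.0 * (double)mperf_delta / ((double)tsc_delta * tsc_tweak);
}
```

Without the rescaling, a fully busy CPU on this part would top out near 1500/1512, roughly 99.2%.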

Signed-off-by: Len Brown <len.brown@intel.com>


# b2b34dfe 14-Sep-2015 Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>

tools/power turbostat: KNL workaround for %Busy and Avg_MHz

KNL increments APERF and MPERF every 1024 clocks.
This is compliant with the architecture specification,
which requires that only the ratio of APERF/MPERF need be valid.

However, turbostat takes advantage of the fact that these
two MSRs increment every un-halted clock
at the actual and base frequency:

Avg_MHz = APERF_delta/measurement_interval

%Busy = MPERF_delta/TSC_delta

This quirk is needed for these calculations to also work on KNL,
which would otherwise show a value 1024x smaller than expected.
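
A sketch of the scaling this quirk implies. The names are hypothetical; the multiplier would be 1024 on KNL and 1 elsewhere.

```c
/* Scale a raw APERF/MPERF delta so that it counts un-halted clocks. */
unsigned long long scale_perf_delta(unsigned long long raw_delta,
				    unsigned int multiplier)
{
	return raw_delta * multiplier;
}

/* Avg_MHz = APERF_delta / measurement_interval, after scaling. */
double avg_mhz(unsigned long long aperf_raw, unsigned int multiplier,
	       double interval_sec)
{
	return (double)scale_perf_delta(aperf_raw, multiplier)
		/ interval_sec / 1e6;
}
```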

Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 756357b8 25-Sep-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: IVB Xeon: fix --debug regression

Starting in Linux-4.3-rc1,
commit 6fb3143b561c ("tools/power turbostat: dump CONFIG_TDP")
touches MSR 0x648, which is not supported on IVB-Xeon.
This results in "turbostat --debug" exiting on those systems:

turbostat: /dev/cpu/2/msr offset 0x648 read failed: Input/output error

Remove IVB-Xeon from the list of machines supporting that MSR.

Signed-off-by: Len Brown <len.brown@intel.com>


# bd6906ed 24-Jul-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: fix typo on DRAM column in Joules-mode

< RAM_W
> RAM_J

Reported-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# a01e72fb 15-Jul-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: fix parameter passing for forked command

turbostat supports forked command when sampling cpu state. However,
the forked command is not allowed to be executed with options, otherwise
turbostat might regard these options as invalid turbostat options.

For example:

./turbostat stress -c 4 -t 10
./turbostat: unrecognized option '-t'

Reported-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 6fb3143b 17-Jun-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: dump CONFIG_TDP

Config TDP is a feature that allows parts to be configured
for different thermal limits after they have left the factory.

This can have an effect on the operation of the part,
particularly in determining...

Max Non-turbo Ratio
Turbo Activation Ratio

Signed-off-by: Len Brown <len.brown@intel.com>


# bfae2052 16-Jun-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: cpu0 is no longer hard-coded, so update output

The --debug option reads a number of per-package MSRs.
Previously we explicitly read them on cpu0, but recently
turbostat changed to read them on the current "base_cpu".

Update the print-out to reflect base_cpu, rather than
the hard-coded cpu0.

Signed-off-by: Len Brown <len.brown@intel.com>


# a68c7c3f 27-May-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: update version number to 4.7

Signed-off-by: Len Brown <len.brown@intel.com>


# 7ce7d5de 25-May-2015 Prarit Bhargava <prarit@redhat.com>

tools/power turbostat: allow running without cpu0

Linux-3.7 added CONFIG_BOOTPARAM_HOTPLUG_CPU0,
allowing systems to offline cpu0.

But when cpu0 is offline, turbostat will not run:

# turbostat ls
turbostat: no /dev/cpu/0/msr

This patch replaces the hard-coded use of cpu0 in turbostat
with the current cpu, allowing it to run without a cpu0.

Fewer cross-calls may also be needed due to use of current cpu,
though this hard-coding was used only for the --debug preamble.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# e9be7dd6 25-May-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: correctly decode ENERGY_PERFORMANCE_BIAS

When EPB is 0xF, turbostat was incorrectly describing it as "custom"
instead of calling it "powersave":

< cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x0000000f (custom)
> cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x0000000f (powersave)
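
A sketch of the decode after the fix. Only the "powersave" label for 0xF and the "custom" fallback come from the output above; the function name and the 0 and 6 encodings with their labels are assumptions.

```c
/* Assumed encodings: 0 = performance, 6 = balanced, 15 = powersave. */
const char *epb_name(unsigned int epb)
{
	switch (epb & 0xf) {
	case 0:
		return "performance";
	case 6:
		return "balanced";
	case 15:
		return "powersave";
	default:
		return "custom";
	}
}
```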

Signed-off-by: Len Brown <len.brown@intel.com>


# fb5d4327 20-May-2015 Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>

tools/power turbostat: enable turbostat to support Knights Landing (KNL)

Changes mainly to account for minor differences in Knights Landing (KNL):
1. KNL supports C1 and C6 core states.
2. KNL supports PC2, PC3 and PC6 package states.
3. KNL has a different encoding of the TURBO_RATIO_LIMIT MSR

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# e275b388 15-Apr-2015 Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>

tools/power turbostat: correctly display more than 2 threads/core

Without this update, turbostat displays only 2 threads per core.
Some processors, such as Xeon Phi, have more.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# e9257f5f 01-Apr-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: correct dumped pkg-cstate-limit value

HSW expanded MSR_PKG_CST_CONFIG_CONTROL.Package-C-State-Limit,
from bits[2:0] used by previous implementations, to [3:0].
The value 1000b is unlimited, and is used by BDW and SKL too.
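
The difference between the two field widths is a single mask bit. A sketch, with a hypothetical function name and flag:

```c
/* Package-C-State-Limit: bits[3:0] on HSW and later, bits[2:0] before. */
unsigned int pkg_cstate_limit(unsigned long long msr, int has_4bit_limit)
{
	return (unsigned int)(msr & (has_4bit_limit ? 0xF : 0x7));
}
```

For the MSR value 0x1e008008 quoted elsewhere in this log, the 3-bit mask yields 0 while the 4-bit mask yields 8 (1000b, unlimited).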

Signed-off-by: Len Brown <len.brown@intel.com>


# 8a5bdf41 01-Apr-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: calculate TSC frequency from CPUID(0x15) on SKL

turbostat --debug
...
CPUID(0x15): eax_crystal: 2 ebx_tsc: 100 ecx_crystal_hz: 0
TSC: 1200 MHz (24000000 Hz * 100 / 2 / 1000000)

Signed-off-by: Len Brown <len.brown@intel.com>


# 40ee8e3b 04-Dec-2014 Andrey Semin <andrey.semin@intel.com>

tools/power turbostat: correct DRAM RAPL units on recent Xeon processors

While not yet documented in the Software Developer's Manual,
the data-sheet for modern Xeon states that DRAM RAPL ENERGY units
are fixed at 15.3 uJ, rather than being discovered via MSR.

Before this patch, DRAM energy on these products is over-stated by turbostat
because the RAPL units are 4x larger.

ref: "Xeon E5-2600 v3/E5-1600 v3 Datasheet Volume 2"
http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf
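
To illustrate the 4x figure: RAPL energy units are normally 1/2^ESU Joules, where ESU is read from the units MSR. The sketch below assumes an ESU of 14 for the comparison; the names are hypothetical and the fixed constant is the 15.3 uJ quoted above.

```c
/* Energy per RAPL count, in Joules, when discovered via MSR: 1 / 2^ESU. */
double rapl_energy_units(unsigned int esu)
{
	return 1.0 / (double)(1ULL << esu);
}

/* Fixed DRAM energy unit quoted in the commit message. */
#define DRAM_ENERGY_UNIT_J 15.3e-6

/* Convert a DRAM energy counter delta to Joules. */
double dram_joules(unsigned long long counts, int fixed_dram_unit,
		   unsigned int esu)
{
	double unit = fixed_dram_unit ? DRAM_ENERGY_UNIT_J
				      : rapl_energy_units(esu);

	return (double)counts * unit;
}
```

With ESU = 14 the discovered unit is about 61 uJ, roughly four times the fixed DRAM unit.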

Signed-off-by: Andrey Semin <andrey.semin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 0b2bb692 25-Mar-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: Initial Skylake support

Skylake adds some additional residency counters.

Skylake supports a different mix of RAPL registers
from any previous product.

In most other ways, Skylake is like Broadwell.

Signed-off-by: Len Brown <len.brown@intel.com>


# a21d38c8 24-Mar-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: modprobe msr, if needed

Some distros (Ubuntu) ship the msr driver as a module.
If turbostat is run as root on those systems, and discovers
that there is no /dev/cpu/0/msr, it will now "modprobe msr"
for the user.

If not root, the modprobe attempt will fail, and turbostat will exit as before:

turbostat: no /dev/cpu/0/msr, Try "# modprobe msr" : No such file or directory

Signed-off-by: Len Brown <len.brown@intel.com>


# fcd17211 23-Mar-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: dump MSR_TURBO_RATIO_LIMIT2

and up to 18 cores of turbo ratio limit
when using the turbostat --debug option.

Signed-off-by: Len Brown <len.brown@intel.com>


# 12bb43c6 13-Apr-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: use new MSR_TURBO_RATIO_LIMIT names

s/MSR_NHM_TURBO_RATIO_LIMIT/MSR_TURBO_RATIO_LIMIT/
s/MSR_IVT_TURBO_RATIO_LIMIT/MSR_TURBO_RATIO_LIMIT1/

syntax only -- use the documented strings describing these registers.

Signed-off-by: Len Brown <len.brown@intel.com>


# 8f61f359 23-Mar-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: label base frequency

syntax only.

The cool kids are now using the phrase "base frequency",
where in the past we used "max non-turbo frequency" or "TSC frequency".

This distinction becomes important when a processor has a TSC
that runs at a different speed than the "base frequency".

Signed-off-by: Len Brown <len.brown@intel.com>


# e33cbe85 13-Mar-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: update PERF_LIMIT_REASONS decoding

cosmetic only.

order the decoding of MSR_PERF_LIMIT_REASONS bits
from MSB to LSB -- which you notice when more than 1 bit is set
and you are, say, comparing the output to the documentation...

Signed-off-by: Len Brown <len.brown@intel.com>


# 1cc21f7b 22-Feb-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: simplify default output

Casual turbostat users generally just want to know MHz.
So by default, just print enough information to make sense of MHz.

All the other configuration data and columns for C-states and temperature etc,
are printed with the --debug option.

Signed-off-by: Len Brown <len.brown@intel.com>


# 48a0631c 10-Feb-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: support additional Broadwell model

Signed-off-by: Len Brown <len.brown@intel.com>


# d8af6f5f 09-Feb-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: update parameters, documentation

Long format options added, though the short ones should still work.
eg. the new "--Counter 0x10" is the same as the old "-C 0x10"

Note this Incompatibility:
Old:
-v displayed verbose debug output

New:
-v and --version simply display version

Additional parameters:
-d and --debug display verbose debug output
-h and --help display a help message

Updated turbostat.8 man page accordingly.

Signed-off-by: Len Brown <len.brown@intel.com>


# ee7e38e3 09-Feb-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: Skip printing disabled package C-states

Replaced previously open-coded Package C-state Limit decoding
with table-driven decoding. In doing so, updated to match January 2015
"Intel(R) 64 and IA-32 Architectures Software Developer's Manual".

In the past, turbostat would print package C-state residency columns
for all package states supported by the model's architecture, even though
a particular SKU may not support them, or they may be disabled by the BIOS.
Now turbostat will skip printing columns if MSRs indicate that they are not enabled.
eg. many SKUs don't support PC7, and so that column will no longer be printed.

Signed-off-by: Len Brown <len.brown@intel.com>


# a729617c 22-Jan-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: relax dependency on APERF_MSR

While turbostat is significantly less useful on systems
with no APERF_MSR, it seems more friendly
to run on such systems and report what we can,
rather than refusing to run.

Update man page to reflect recent changes.

Signed-off-by: Len Brown <len.brown@intel.com>


# d7899447 22-Jan-2015 Len Brown <len.brown@intel.com>

tools/power turbostat: relax dependency on invariant TSC

Turbostat can be useful on systems that do not support invariant TSC,
so allow it to run on those systems.

All arithmetic in turbostat using the TSC value is per-processor,
so it does not depend on the TSC values being in sync across processors.

Turbostat uses gettimeofday() for the measurement interval
rather than using the TSC directly, so that key metric
is also immune from variable TSC.

Turbostat prints a TSC sanity check column:

TSC_MHz = TSC_delta/interval

If this column is constant and is close to the processor
base frequency, then the TSC is behaving properly.

The other key turbostat columns are calculated this way:

Avg_MHz = APERF_delta/interval

%Busy = MPERF_delta/TSC_delta

Bzy_MHz = TSC_delta * APERF_delta / MPERF_delta / interval

Tested on Core2 and Core2-Xeon, and so this patch includes
a few other changes to remove the assumption that target
systems are Nehalem and newer.

Signed-off-by: Len Brown <len.brown@intel.com>


# 3a9a941d 15-Aug-2014 Len Brown <len.brown@intel.com>

tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS

The Processor generation code-named Haswell
added MSR_{CORE | GFX | RING}_PERF_LIMIT_REASONS
to explain when and how the processor limits frequency.

turbostat -v
will now decode these bits.

Each MSR has an "Active" set of bits which describe
current conditions, and a "Logged" set of bits,
which describe what has happened since last cleared.

Turbostat currently doesn't clear the log bits.

Signed-off-by: Len Brown <len.brown@intel.com>


# 98481e79 14-Aug-2014 Len Brown <len.brown@intel.com>

tools/power turbostat: relax dependency on root permission

For turbostat to run as non-root, it needs two permissions:

1. read access to /dev/cpu/*/msr
via standard user/group/world file permissions

2. CAP_SYS_RAWIO
eg. # setcap cap_sys_rawio=ep turbostat

Yes, running as root still works.

Signed-off-by: Len Brown <len.brown@intel.com>


# e7c95ff3 14-Aug-2014 Len Brown <len.brown@intel.com>

tools/power turbostat: tweak whitespace in output format

turbostat -S
output was off by 1 space before this patch.

Signed-off-by: Len Brown <len.brown@intel.com>


# 3482124a 01-May-2014 Jean Delvare <jdelvare@suse.de>

tools / power: turbostat: Drop temperature checks

The Intel 64 and IA-32 Architectures Software Developer's Manual says
that TjMax is stored in bits 23:16 of MSR_TEMPERATURE_TARGET (0x1a2).
That's 8 bits, not 7, so it must be masked with 0xFF rather than 0x7F.

The manual has no mention of which values should be considered valid,
which kind of implies that they all are. Arbitrarily discarding values
outside a specific range is wrong. The upper range check had to be
fixed recently (commit 144b44b1) and the lower range check is just as
wrong. See bug #75071:

https://bugzilla.kernel.org/show_bug.cgi?id=75071

There are many Xeon processor series with TjMax of 70, 71 or 80
degrees Celsius, way below the arbitrary 85 degrees Celsius limit.
There may be other (past or future) models with even lower limits.

So drop this arbitrary check. The only value that would be clearly
invalid is 0. Everything else should be accepted.

After these changes, turbostat is aligned with what the coretemp
driver does.
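
The extraction described above reduces to one shift and an 8-bit mask. A sketch with a hypothetical function name:

```c
/* TjMax in degrees Celsius from bits 23:16; 0 is the only invalid value. */
unsigned int tjmax_from_msr(unsigned long long msr)
{
	return (unsigned int)((msr >> 16) & 0xFF);
}
```

A caller would fall back to a default only when this returns 0, rather than rejecting values such as 70 or 80.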

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Len Brown <len.brown@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4e8e863f 27-Feb-2014 Len Brown <len.brown@intel.com>

tools/power turbostat: Run on Broadwell

Signed-off-by: Len Brown <len.brown@intel.com>


# fc04cc67 05-Feb-2014 Len Brown <len.brown@intel.com>

tools/power turbostat: simplify output, add Avg_MHz

Use 8 columns for each number output.
We don't fit into 80 columns on most machines,
so keep the format simple.

Print frequency in MHz instead of GHz.
We've got 8 columns now, so use them to
show low frequency in a more natural unit.

Many users didn't understand what %c0 meant,
so re-name it to be %Busy.

Add Avg_MHz column, which is the frequency that many
users expect to see -- the total number of cycles executed
over the measurement interval.

People found the previous GHz to be confusing, since
it was the speed only over the non-idle interval.
That measurement has been re-named Bzy_MHz.

Suggested-by: Dirk J. Brandewie
Signed-off-by: Len Brown <len.brown@intel.com>


# 3b4d5c7f 23-Jan-2014 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

tools/power turbostat: introduce -s to dump counters

The new option allows turbostat to simply run and dump the counter values. It's
useful when we have more than one program to test.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# f591c38b 23-Jan-2014 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

tools/power turbostat: remove unused command line option

The -s is not used, let's remove it, and update quick help accordingly.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 5c56be9a 16-Dec-2013 Dirk Brandewie <dirk.j.brandewie@intel.com>

turbostat: Add option to report joules consumed per sample

Add "-J" option to report energy consumed in joules per sample. This option
also adds the sample time to the reported values.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# e6f9bb3c 03-Dec-2013 Len Brown <len.brown@intel.com>

turbostat: run on HSX

Haswell Xeon has slightly different RAPL support than client HSW,
which prevented the previous version of turbostat from running on HSX.

Signed-off-by: Len Brown <len.brown@intel.com>


# b2c95d90 20-Aug-2013 Josh Triplett <josh@joshtriplett.org>

turbostat: Clean up error handling; disambiguate error messages; use err and errx

Most of turbostat's error handling consists of printing an error (often
including an errno) and exiting. Since perror doesn't support a format
string, those error messages are often ambiguous, such as just showing a
file path, which doesn't uniquely identify which call failed.

turbostat already uses _GNU_SOURCE, so switch to the err and errx
functions from err.h, which take a format string.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>


# 57a42a34 20-Aug-2013 Josh Triplett <josh@joshtriplett.org>

turbostat: Factor out common function to open file and exit on failure

Several different functions in turbostat contain the same pattern of
opening a file and exiting on failure. Factor out a common fopen_or_die
function for that.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>


# 95aebc44 20-Aug-2013 Josh Triplett <josh@joshtriplett.org>

turbostat: Add a helper to parse a single int out of a file

Many different chunks of code in turbostat open a file, parse a single
int out of it, and close it. Factor that out into a common function.
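
A simplified sketch of the two helpers this refactoring describes, using err() and errx() from err.h. The signatures here are assumptions and may differ from the ones in turbostat.c.

```c
#include <err.h>
#include <stdio.h>

/* Open a file, or exit with a message naming the path and errno. */
FILE *fopen_or_die(const char *path, const char *mode)
{
	FILE *filep = fopen(path, mode);

	if (!filep)
		err(1, "%s: open failed", path);
	return filep;
}

/* Read a single int from a file, exiting if it cannot be parsed. */
int parse_int_file(const char *path)
{
	FILE *filep = fopen_or_die(path, "r");
	int value;

	if (fscanf(filep, "%d", &value) != 1)
		errx(1, "%s: failed to parse number from file", path);
	fclose(filep);
	return value;
}
```

Checking the fscanf() return value here also avoids the warn_unused_result warnings mentioned elsewhere in this log.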

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>


# 74823419 20-Aug-2013 Josh Triplett <josh@joshtriplett.org>

turbostat: Check return value of fscanf

Some systems declare fscanf with the warn_unused_result attribute. On
such systems, turbostat generates the following warnings:

turbostat.c: In function 'get_core_id':
turbostat.c:1203:8: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result [-Wunused-result]
turbostat.c: In function 'get_physical_package_id':
turbostat.c:1186:8: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result [-Wunused-result]
turbostat.c: In function 'cpu_is_first_core_in_package':
turbostat.c:1169:8: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result [-Wunused-result]
turbostat.c: In function 'cpu_is_first_sibling_in_core':
turbostat.c:1148:8: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result [-Wunused-result]

Fix these by checking the return value of those four calls to fscanf.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>


# 2b92865e 20-Aug-2013 Josh Triplett <josh@joshtriplett.org>

turbostat: Use GCC's CPUID functions to support PIC

turbostat uses inline assembly to call cpuid. On 32-bit x86, on systems
that have certain security features enabled by default that make -fPIC
the default, this causes a build error:

turbostat.c: In function ‘check_cpuid’:
turbostat.c:1906:2: error: PIC register clobbered by ‘ebx’ in ‘asm’
asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx");
^

GCC provides a header cpuid.h, containing a __get_cpuid function that
works with both PIC and non-PIC. (On PIC, it saves and restores ebx
around the cpuid instruction.) Use that instead.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: stable@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
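
As a rough illustration of the approach, the sketch below reads CPUID leaf 1 through __get_cpuid() from GCC's cpuid.h and decodes family, model, and stepping. The struct and the function names decode_fms and read_fms are hypothetical, and the decode follows the usual x86 convention rather than quoting turbostat itself.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical container for the decoded signature. */
struct cpu_fms {
	unsigned int family, model, stepping;
};

/* Decode the EAX value returned by CPUID leaf 1. */
struct cpu_fms decode_fms(unsigned int eax)
{
	struct cpu_fms id;

	id.family = (eax >> 8) & 0xf;
	id.model = (eax >> 4) & 0xf;
	id.stepping = eax & 0xf;
	if (id.family == 0xf)
		id.family += (eax >> 20) & 0xff;	/* extended family */
	if (id.family >= 6)
		id.model += ((eax >> 16) & 0xf) << 4;	/* extended model */
	return id;
}

/* Returns 1 on success, 0 if leaf 1 is unavailable or the target is not x86. */
int read_fms(struct cpu_fms *id)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() handles ebx correctly in PIC builds, unlike the raw asm. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;
	*id = decode_fms(eax);
	return 1;
#else
	(void)id;
	return 0;
#endif
}
```

Keeping the decode separate from the CPUID call makes it possible to check the bit arithmetic on any host, independent of the CPU it runs on.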


# 2e9c6bc7 20-Aug-2013 Josh Triplett <josh@joshtriplett.org>

turbostat: Don't attempt to printf an off_t with %zx

turbostat uses the format %zx to print an off_t. However, %zx wants a
size_t, not an off_t. On 32-bit targets, those refer to different
types, potentially even with different sizes. Use %llx and a cast
instead, since printf does not have a length modifier for off_t.

Without this patch, when compiling for a 32-bit target:

turbostat.c: In function 'get_msr':
turbostat.c:231:3: warning: format '%zx' expects argument of type 'size_t', but argument 4 has type 'off_t' [-Wformat]

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>
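
The cast described above can be sketched as follows. The function name format_offset is hypothetical; the point is only that the off_t is converted to unsigned long long so that it matches %llx on both 32-bit and 64-bit targets.

```c
#include <stdio.h>
#include <sys/types.h>

/* Format an off_t portably: cast to a type that has a printf length modifier. */
int format_offset(char *buf, size_t len, off_t offset)
{
	return snprintf(buf, len, "0x%llx", (unsigned long long)offset);
}
```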


# b731f311 20-Aug-2013 Josh Triplett <josh@joshtriplett.org>

turbostat: Don't put unprocessed uapi headers in the include path

turbostat's Makefile puts arch/x86/include/uapi/ in the include path, so
that it can include <asm/msr.h> from it. It isn't in general safe to
include even uapi headers directly from the kernel tree without
processing them through scripts/headers_install.sh, but asm/msr.h
happens to work.

However, that include path can break with some versions of system
headers, by overriding some system headers with the unprocessed versions
directly from the kernel source. For instance:

In file included from /build/x86-generic/usr/include/bits/sigcontext.h:28:0,
from /build/x86-generic/usr/include/signal.h:339,
from /build/x86-generic/usr/include/sys/wait.h:31,
from turbostat.c:27:
../../../../arch/x86/include/uapi/asm/sigcontext.h:4:28: fatal error: linux/compiler.h: No such file or directory

This occurs because the system bits/sigcontext.h on that build system
includes <asm/sigcontext.h>, and asm/sigcontext.h in the kernel source
includes <linux/compiler.h>, which scripts/headers_install.sh would have
filtered out.

Since turbostat really only wants a single header, just include that one
header rather than putting an entire directory of kernel headers on the
include path.

In the process, switch from msr.h to msr-index.h, since turbostat just
wants the MSR numbers.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: stable@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>


# 144b44b1 08-Nov-2013 Len Brown <len.brown@intel.com>

tools / power turbostat: Support Silvermont

Support the next generation Intel Atom processor
micro-architecture, formerly called Silvermont.

The server version, formerly called "Avoton",
is named the "Intel(R) Atom(TM) Processor C2000 Product Family".

The client version, formerly called "Bay Trail",
is named the "Intel Atom Processor Z3000 Series",
as well as various "Intel Pentium Processor"
and "Intel Celeron Processor" brands, depending
on form-factor.

Silvermont has a set of MSRs not far off from NHM,
but the RAPL register set is a sub-set of those previously supported.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b844db31 12-Jun-2013 Josh Triplett <josh@joshtriplett.org>

turbostat: Increase output buffer size to accommodate C8-C10

On platforms with C8-C10 support, the additional C-states cause
turbostat to overrun its output buffer of 128 bytes per CPU. Increase
this to 256 bytes per CPU.

[ As a bugfix, this should go into 3.10; however, since the C8-C10
support didn't go in until after 3.9, this need not go into any stable
kernel. ]

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ca58710f 21-Nov-2012 Kristen Carlson Accardi <kristen@linux.intel.com>

tools/power turbostat: display C8, C9, C10 residency

Display residency in the new C-states, C8, C9, C10.

C8, C9, C10 are present on some:
"Fourth Generation Intel(R) Core(TM) Processors",
which are based on Intel(R) microarchitecture code name Haswell.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 149c2319 15-Mar-2013 Len Brown <len.brown@intel.com>

tools/power turbostat: additional Haswell CPU-id

There is an additional HSW CPU-id, 0x46,
which has C-states exactly like CPU-id 0x45.

Signed-off-by: Len Brown <len.brown@intel.com>


# 1ed51011 10-Feb-2013 Len Brown <len.brown@intel.com>

tools/power turbostat: display SMI count by default

The SMI counter is popular -- so display it by default
rather than requiring an option. What the heck,
we've blown the 80 column budget on many systems already...

Note that the value displayed is the delta
during the measurement interval.
The absolute value of the counter can still be seen with
the generic 32-bit MSR option, ie. -m 0x34

Signed-off-by: Len Brown <len.brown@intel.com>


# 67920418 31-Jan-2013 Len Brown <len.brown@intel.com>

tools/power turbostat: decode MSR_IA32_POWER_CTL

When verbose is enabled, print the C1E-Enable
bit in MSR_IA32_POWER_CTL.

also delete some redundant tests on the verbose variable.

Signed-off-by: Len Brown <len.brown@intel.com>


# 70b43400 07-Jan-2013 Len Brown <len.brown@intel.com>

tools/power turbostat: support Haswell

This patch enables turbostat to run properly on the
next-generation Intel(R) Microarchitecture, code named "Haswell" (HSW).

HSW supports the BCLK and counters found in SNB.

Signed-off-by: Len Brown <len.brown@intel.com>


# 889facbe 07-Nov-2012 Len Brown <len.brown@intel.com>

tools/power turbostat: v3.0: monitor Watts and Temperature

Show power in Watts and temperature in Celsius
when hardware support is present.

Intel's Sandy Bridge and Ivy Bridge processor generations support RAPL
(Run-Time-Average-Power-Limiting). Per the Intel SDM
(Intel® 64 and IA-32 Architectures Software Developer Manual)
RAPL provides hardware energy counters and power control MSRs
(Model Specific Registers). RAPL MSRs are designed primarily
as a method to implement power capping. However, they are useful
for monitoring system power whether or not power capping is used.

In addition, Turbostat now shows temperature from DTS
(Digital Thermal Sensor) and PTM (Package Thermal Monitor) hardware,
if present.

As before, turbostat reads MSRs, and never writes MSRs.

New columns are present in turbostat output:

The Pkg_W column shows Watts for each package (socket) in the system.
On multi-socket systems, the system summary on the 1st row shows the sum
for all sockets together.

The Cor_W column shows Watts due to processor cores.
Note that Cor_W is included in Pkg_W.

The optional GFX_W column shows Watts due to the graphics "un-core".
Note that GFX_W is included in Pkg_W.

The optional RAM_W column on server processors shows Watts due to DRAM DIMMS.
As DRAM DIMMs are outside the processor package, RAM_W is not included in Pkg_W.

The optional PKG_% and RAM_% columns on server processors show the % of time
in the measurement interval that RAPL power limiting is in effect on the
package and on DRAM.

Note that the RAPL energy counters have some limitations.

First, hardware updates the counters about once every milli-second.
This is fine for typical turbostat measurement intervals > 1 sec.
However, when turbostat is used to measure events that approach
1ms, the counters are less useful.

Second, the 32-bit energy counters are subject to wrapping.
For example, a counter incrementing 15 micro-Joule units
on a 130 Watt TDP server processor could (in theory)
roll over in about 9 minutes. Turbostat detects and handles
up to 1 counter overflow per measurement interval.
But when the measurement interval exceeds the guaranteed
counter range, we can't detect if more than 1 overflow occurred.
So in this case turbostat indicates that the results are
in question by replacing the fractional part of the Watts
in the output with "**":

Pkg_W Cor_W GFX_W
3** 0** 0**

Third, the RAPL counters are energy (Joule) counters -- they sum up
weighted events in the package to estimate energy consumed. They are
not analog power (Watt) meters. In practice, they tend to under-count
because they don't cover every possible use of energy in the package.
The accuracy of the RAPL counters will vary between product generations,
and between SKU's in the same product generation, and with temperature.
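
The counter arithmetic discussed above can be sketched as follows. The function names and the joules_per_unit parameter are assumptions for illustration; on real hardware the energy unit is read from a RAPL units MSR rather than hard-coded.

```c
#include <stdint.h>

/*
 * Average Watts over an interval from two 32-bit energy counter samples.
 * Unsigned subtraction is modulo 2^32, so a single wrap between the two
 * samples still yields the correct delta; two or more wraps cannot be seen.
 */
double rapl_watts(uint32_t before, uint32_t after,
		  double joules_per_unit, double interval_sec)
{
	uint32_t delta = after - before;

	return delta * joules_per_unit / interval_sec;
}

/* Seconds until a 32-bit energy counter wraps at a constant power draw. */
double rapl_wrap_seconds(double joules_per_unit, double watts)
{
	return 4294967296.0 * joules_per_unit / watts;
}
```

With 15 micro-Joule units and a 130 Watt load, rapl_wrap_seconds(15e-6, 130.0) comes out at a little over eight minutes, the same order as the estimate given in the text.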

turbostat's -v (verbose) option now displays more power and thermal configuration
information -- as shown on the turbostat.8 manual page.
For example, it now displays the Package and DRAM Thermal Design Power (TDP):

cpu0: MSR_PKG_POWER_INFO: 0x2f064001980410 (130 W TDP, RAPL 51 - 200 W, 0.045898 sec.)
cpu0: MSR_DRAM_POWER_INFO,: 0x28025800780118 (35 W TDP, RAPL 15 - 75 W, 0.039062 sec.)
cpu8: MSR_PKG_POWER_INFO: 0x2f064001980410 (130 W TDP, RAPL 51 - 200 W, 0.045898 sec.)
cpu8: MSR_DRAM_POWER_INFO,: 0x28025800780118 (35 W TDP, RAPL 15 - 75 W, 0.039062 sec.)
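
The values in parentheses can be reproduced from the raw register with a decode along these lines. This is a sketch under assumptions: the field layout (15-bit power fields at bits 0, 16, and 32, and a 6-bit time field at bit 48) follows the Intel SDM description of the power-info registers, and the units of 1/8 Watt and 1/1024 second are assumed because they reproduce the output above; on a real system they come from the RAPL units MSR. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields. */
struct rapl_power_info {
	double tdp_watts;
	double min_watts;
	double max_watts;
	double max_window_sec;
};

/* Decode a power-info MSR value given the power and time units. */
struct rapl_power_info decode_power_info(uint64_t msr,
					 double power_unit, double time_unit)
{
	struct rapl_power_info info;

	info.tdp_watts = ((msr >> 0) & 0x7fff) * power_unit;
	info.min_watts = ((msr >> 16) & 0x7fff) * power_unit;
	info.max_watts = ((msr >> 32) & 0x7fff) * power_unit;
	info.max_window_sec = ((msr >> 48) & 0x3f) * time_unit;
	return info;
}
```

For example, decode_power_info(0x2f064001980410ULL, 0.125, 1.0 / 1024) gives the 130 W TDP, the 51 to 200 W range, and the 0.045898 sec. window shown for MSR_PKG_POWER_INFO.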

Signed-off-by: Len Brown <len.brown@intel.com>


# ddac0d68 29-Nov-2012 Len Brown <len.brown@intel.com>

tools/power turbostat: fix output buffering issue

In periodic mode, turbostat writes to stdout,
but users were un-able to re-direct stdout, eg.

turbostat > outputfile

would result in an empty outputfile.

Signed-off-by: Len Brown <len.brown@intel.com>


# e52966c0 08-Nov-2012 Len Brown <len.brown@intel.com>

tools/power turbostat: prevent infinite loop on migration error path

Turbostat assumed if it can't migrate to a CPU, then the CPU
must have gone off-line and turbostat should re-initialize
with the new topology.

But if turbostat can not migrate because it is restricted by
a cpuset, then it will fail to migrate even after re-initialization,
resulting in an infinite loop.

Spit out a warning when we can't migrate
and endure only 2 re-initialize cycles in a row
before giving up and exiting.

Signed-off-by: Len Brown <len.brown@intel.com>


# 9c63a650 30-Oct-2012 Len Brown <len.brown@intel.com>

tools/power/x86/turbostat: share kernel MSR #defines

Now that turbostat is built in the kernel tree,
it can share MSR #defines with the kernel.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org


# d91bb17c 31-Oct-2012 Len Brown <len.brown@intel.com>

tools/power turbostat: graceful fail on garbage input

When invalid MSRs are specified on the command line,
turbostat should simply print an error and exit.

Signed-off-by: Len Brown <len.brown@intel.com>


# 39300ffb 31-Oct-2012 Len Brown <len.brown@intel.com>

tools/power turbostat: Repair Segmentation fault when using -i option

Fix regression caused by commit 8e180f3cb6b7510a3bdf14e16ce87c9f5d86f102
(tools/power turbostat: add [-d MSR#][-D MSR#] options to print counter
deltas)

Signed-off-by: Len Brown <len.brown@intel.com>


# f9240813 06-Oct-2012 Len Brown <len.brown@intel.com>

tools/power/turbostat: add option to count SMIs, re-name some options

Counting SMIs is popular, so add a dedicated "-s" option to do it,
and juggle some of the other option letters.

-S is now system summary (was -s)
-c is 32 bit counter (was -d)
-C is 64-bit counter (was -D)
-p is 1st thread in core (was -c)
-P is 1st thread in package (was -p)

bump the minor version number

Signed-off-by: Len Brown <len.brown@intel.com>


# 8e180f3c 21-Sep-2012 Len Brown <len.brown@intel.com>

tools/power turbostat: add [-d MSR#][-D MSR#] options to print counter deltas

# turbostat -d 0x34
is useful for printing the number of SMI's within an interval
on Nehalem and newer processors.

where
# turbostat -m 0x34
will simply print out the total SMI count since reset.

Suggested-by: Andi Kleen
Signed-off-by: Len Brown <len.brown@intel.com>


# 2f32edf1 21-Sep-2012 Len Brown <len.brown@intel.com>

tools/power turbostat: add [-m MSR#] option

-m MSR# prints the specified MSR in 32-bit format
-M MSR# prints the specified MSR in 64-bit format

Signed-off-by: Len Brown <len.brown@intel.com>


# 130ff304 21-Sep-2012 Len Brown <len.brown@intel.com>

tools/power turbostat: make -M output pretty

The -M option dumps the specified 64-bit MSR with every sample.

Previously it was output at the end of each line.
However, with the v2 style of printing, the lines are now staggered,
making MSR output hard to read.

So move the MSR output column to the left where things are aligned.

Signed-off-by: Len Brown <len.brown@intel.com>


# 6574a5d5 20-Sep-2012 Len Brown <len.brown@intel.com>

tools/power turbostat: print more turbo-limit information

The "turbo-limit" is the maximum opportunistic processor
speed, assuming no electrical or thermal constraints.
For a given processor, the turbo-limit varies, depending
on the number of active cores. Generally, there is more
opportunity when fewer cores are active.

Under the "-v" verbose option, turbostat would
print the turbo-limits for the four cases
of 1 to 4 cores active.

Expand that capability to cover the cases of turbo
opportunities with up to 16 cores active.

Note that not all hardware platforms supply this information,
and that sometimes a valid limit may be specified for
a core which is not actually present.

Signed-off-by: Len Brown <len.brown@intel.com>
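
One way to picture the per-core-count limits is the sketch below. It assumes the layout documented in the Intel SDM for the turbo ratio limit register, where byte N-1 holds the maximum ratio with N cores active, and that the limit in MHz is that ratio times the bus clock. The function name and parameters are hypothetical, and only the first register (1 to 8 cores) is modelled; the cases up to 16 cores would need a second register read.

```c
#include <stdint.h>

/*
 * Turbo limit in MHz for a given number of active cores (1..8), assuming
 * one ratio byte per core count in the MSR value. Returns 0 when the core
 * count is out of range or the hardware reports no ratio for that case.
 */
double turbo_limit_mhz(uint64_t msr, unsigned int active_cores, double bclk_mhz)
{
	unsigned int ratio;

	if (active_cores < 1 || active_cores > 8)
		return 0.0;
	ratio = (msr >> (8 * (active_cores - 1))) & 0xff;
	return ratio * bclk_mhz;
}
```

A ratio of zero is treated as "not supplied", which matches the note above that not all platforms provide this information.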


# d7db6901 20-Sep-2012 Len Brown <len.brown@intel.com>

tools/power turbostat: delete unused line

MSR_TSC is no longer needed because
we now use RDTSC directly.

Signed-off-by: Len Brown <len.brown@intel.com>


# 1300651b 26-Sep-2012 Len Brown <len.brown@intel.com>

tools/power turbostat: run on IVB Xeon

This fix is required to run on IVB Xeon,
which previously had an incorrect cpuid model number listed.

Signed-off-by: Len Brown <len.brown@intel.com>


# c3ae331d 13-Jun-2012 Len Brown <len.brown@intel.com>

tools/power: turbostat: fix large c1% issue

Under some conditions, c1% was displayed as a very large number,
much higher than 100%.

c1% is not measured, it is derived as "that, which is left over"
from other counters. However, the other counters are not collected
atomically, and so it is possible for c1% to be calculated as
a small negative number -- displayed as very large positive.

There was a check for mperf vs tsc for this already,
but it needed to also include the other counters
that are used to calculate c1.

Signed-off-by: Len Brown <len.brown@intel.com>
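
The "left over" calculation and the guard described above can be sketched like this. The function name and argument list are assumptions for illustration; the idea is to compare the sum of all measured counters against the TSC delta before subtracting, so that the unsigned result can never wrap to a huge value.

```c
/*
 * Derive c1 residency as whatever remains of the TSC delta after the
 * measured counters are subtracted. Because the counters are not sampled
 * atomically, their sum can slightly exceed the TSC delta; clamp to zero
 * in that case instead of letting the unsigned subtraction wrap.
 */
unsigned long long derive_c1(unsigned long long tsc, unsigned long long mperf,
			     unsigned long long c3, unsigned long long c6,
			     unsigned long long c7)
{
	unsigned long long measured = mperf + c3 + c6 + c7;

	if (measured > tsc)
		return 0;
	return tsc - measured;
}
```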


# c98d5d94 03-Jun-2012 Len Brown <len.brown@intel.com>

tools/power: turbostat v2 - re-write for efficiency

Measuring large profoundly-idle configurations
requires turbostat to be more lightweight.
Otherwise, the operation of turbostat itself
can interfere with the measurements.

This re-write makes turbostat topology aware.
Hardware is accessed in "topology order".
Redundant hardware accesses are deleted.
Redundant output is deleted.
Also, output is buffered and
local RDTSC use replaces remote MSR access for TSC.

From a feature point of view, the output
looks different since redundant figures are absent.
Also, there are now -c and -p options -- to restrict
output to the 1st thread in each core, and the 1st
thread in each package, respectively. This is helpful
to reduce output on big systems, where more detail
than the "-s" system summary is desired.
Finally, periodic mode output is now on stdout, not stderr.

Turbostat v2 is also slightly more robust in
handling run-time CPU online/offline events,
as it now checks the actual map of on-line cpus rather
than just the total number of on-line cpus.

Signed-off-by: Len Brown <len.brown@intel.com>


# 650a37f3 03-Jun-2012 Len Brown <len.brown@intel.com>

tools/power turbostat: fix IVB support

Initial IVB support went into turbostat in Linux-3.1:
553575f1ae048aa44682b46b3c51929a0b3ad337
(tools turbostat: recognize and run properly on IVB)

However, when running on IVB, turbostat would fail
to report the new counters added with SNB: c7, pc2 and pc7.
So in scenarios where these counters are non-zero on IVB,
turbostat would report erroneous residency results.

In particular c7 time would be added to c1 time,
since c1 time is calculated as "that which is left over".

Also, turbostat reports MHz capabilities when passed
the "-v" option, and it would incorrectly report 133MHz
bclk instead of 100MHz bclk for IVB, which would inflate
GHz reported with that option.

This patch is a backport of a fix already included in turbostat v2.

Signed-off-by: Len Brown <len.brown@intel.com>


# d15cf7c1 03-Jun-2012 Len Brown <len.brown@intel.com>

tools/power turbostat: fix un-intended affinity of forked program

Linux 3.4 included a modification to turbostat to
lower cross-call overhead by using scheduler affinity:

15aaa34654831e98dd76f7738b6c7f5d05a66430
(tools turbostat: reduce measurement overhead due to IPIs)

In the use-case where turbostat forks a child program,
that change had the un-intended side-effect of binding
the child to the last cpu in the system.

This change removed the binding before forking the child.

This is a back-port of a fix already included in turbostat v2.

Signed-off-by: Len Brown <len.brown@intel.com>


# 15aaa346 29-Mar-2012 Len Brown <len.brown@intel.com>

tools turbostat: harden against cpu online/offline

Sometimes users have turbostat running in interval mode
when they take processors offline/online.

Previously, turbostat would survive, but not gracefully.

Tighten up the error checking so turbostat notices
changes sooner, and print just 1 line on change:

turbostat: re-initialized with num_cpus %d

Signed-off-by: Len Brown <len.brown@intel.com>


# 88c3281f 29-Mar-2012 Len Brown <len.brown@intel.com>

tools turbostat: reduce measurement overhead due to IPIs

turbostat uses /dev/cpu/*/msr interface to read MSRs.
For modern systems, it reads 10 MSR/CPU. This can
be observed as 10 "Function Call Interrupts"
per CPU per sample added to /proc/interrupts.

This overhead is measurable on large idle systems,
and as Youquan Song pointed out, it can even trick
cpuidle into thinking the system is busy.

Here turbostat re-schedules itself in-turn to each
CPU so that its MSR reads will always be local.
This replaces the 10 "Function Call Interrupts"
with a single "Rescheduling interrupt" per sample
per CPU.

On an idle 32-CPU system, this shifts some residency from
the shallow c1 state to the deeper c7 state:

# ./turbostat.old -s
%c0 GHz TSC %c1 %c3 %c6 %c7 %pc2 %pc3 %pc6 %pc7
0.27 1.29 2.29 0.95 0.02 0.00 98.77 20.23 0.00 77.41 0.00
0.25 1.24 2.29 0.98 0.02 0.00 98.75 20.34 0.03 77.74 0.00
0.27 1.22 2.29 0.54 0.00 0.00 99.18 20.64 0.00 77.70 0.00
0.26 1.22 2.29 1.22 0.00 0.00 98.52 20.22 0.00 77.74 0.00
0.26 1.38 2.29 0.78 0.02 0.00 98.95 20.51 0.05 77.56 0.00
^C
# ./turbostat.new -s
%c0 GHz TSC %c1 %c3 %c6 %c7 %pc2 %pc3 %pc6 %pc7
0.27 1.20 2.29 0.24 0.01 0.00 99.49 20.58 0.00 78.20 0.00
0.27 1.22 2.29 0.25 0.00 0.00 99.48 20.79 0.00 77.85 0.00
0.27 1.20 2.29 0.25 0.02 0.00 99.46 20.71 0.03 77.89 0.00
0.28 1.26 2.29 0.25 0.01 0.00 99.46 20.89 0.02 77.67 0.00
0.27 1.20 2.29 0.24 0.01 0.00 99.48 20.65 0.00 78.04 0.00

cc: Youquan Song <youquan.song@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
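
The technique described above can be sketched roughly as follows: bind the calling thread to the target CPU with sched_setaffinity(), then read the MSR through that CPU's /dev/cpu/N/msr file, so the read is served locally. The function names cpu_migrate and get_msr and the error handling are assumptions for this example. Reading the msr device normally needs root and the msr driver, so get_msr simply reports failure when the device cannot be opened.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Restrict the calling thread to a single CPU. Returns 0 on success. */
int cpu_migrate(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}

/* Read one 64-bit MSR on the given CPU after migrating there. */
int get_msr(int cpu, off_t offset, uint64_t *msr)
{
	char path[32];
	ssize_t n;
	int fd;

	if (cpu_migrate(cpu))
		return -1;
	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	/* The MSR number is used as the file offset. */
	n = pread(fd, msr, sizeof(*msr), offset);
	close(fd);
	return n == (ssize_t)sizeof(*msr) ? 0 : -1;
}
```

Because the affinity mask is inherited across fork(), a tool that later launches a child program would need to widen the mask again before forking.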


# e23da037 06-Feb-2012 Len Brown <len.brown@intel.com>

tools turbostat: add summary option

turbostat -s
cuts down on the amount of output, per user request.

Also tweak some output whitespace and the man page.

Signed-off-by: Len Brown <len.brown@intel.com>


# 553575f1 18-Nov-2011 Len Brown <len.brown@intel.com>

tools turbostat: recognize and run properly on IVB

Signed-off-by: Len Brown <len.brown@intel.com>


# d30c4b7a 31-Jul-2011 Len Brown <len.brown@intel.com>

tools/power turbostat: fit output into 80 columns on snb-ep

Reduce columns for package number to 1.
If you can afford more than 9 packages,
you can also afford a terminal with more than 80 columns:-)

Also shave a column off the package C-states.

Signed-off-by: Len Brown <len.brown@intel.com>


# aeae1e92 03-Jul-2011 Len Brown <len.brown@intel.com>

tools/power turbostat: less verbose debugging

dump only the counters which are active

Signed-off-by: Len Brown <len.brown@intel.com>


# 6eab04a8 08-Apr-2011 Justin P. Mattock <justinmattock@gmail.com>

treewide: remove extra semicolons

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# a829eb4d 10-Feb-2011 Len Brown <len.brown@intel.com>

tools: turbostat: style updates

Follow kernel coding style traditions more closely.
Delete typedef, re-name "per cpu counters" to
simply be counters etc.

This patch changes no functionality.

Suggested-by: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 8209e054 21-Jan-2011 Thomas Renninger <trenn@suse.de>

tools: turbostat: fix bitwise and operand

bug could cause false positive on indicating
presence of invariant TSC or APERF support.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
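
The commit message does not quote the faulty expression, so the sketch below is an assumption about its general shape: a logical AND used where a bitwise AND was intended. With the logical operator, any non-zero register value tests as true, which would produce exactly the kind of false positive described. The function names are hypothetical.

```c
/* Assumed shape of the bug: true whenever both operands are non-zero. */
int test_bit_buggy(unsigned int reg, unsigned int mask)
{
	return reg && mask;
}

/* Intended check: true only when the masked bit is actually set. */
int test_bit(unsigned int reg, unsigned int mask)
{
	return (reg & mask) != 0;
}
```

For instance, with an EDX value of 0x1 and the invariant TSC mask (1 << 8), the buggy form reports the feature as present while the bitwise form correctly reports it as absent.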


# 103a8fea 22-Oct-2010 Len Brown <len.brown@intel.com>

tools: create power/x86/turbostat

turbostat is a Linux tool to observe proper operation
of Intel(R) Turbo Boost Technology.

turbostat displays the actual processor frequency
on x86 processors that include APERF and MPERF MSRs.

Note that turbostat is of limited utility on Linux
kernels 2.6.29 and older, as acpi_cpufreq cleared
APERF/MPERF up through that release.

On Intel Core i3/i5/i7 (Nehalem) and newer processors,
turbostat also displays residency in idle power saving states,
which are necessary for diagnosing any cpuidle issues
that may have an effect on turbo-mode.

See the turbostat.8 man page for example usage.

Signed-off-by: Len Brown <len.brown@intel.com>
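
As a rough sketch of how an actual frequency can be derived from these counters: MPERF advances at a fixed reference rate while the CPU is not idle, and APERF advances at the actual clock rate over the same time, so the ratio of their deltas scales the reference (TSC) frequency. The function name and argument list are assumptions for illustration, not the code added by this commit.

```c
/*
 * Average frequency in GHz while the CPU was not idle, from counter deltas
 * taken over interval_sec. Returns 0 if MPERF did not advance.
 */
double avg_ghz(unsigned long long tsc_delta, unsigned long long aperf_delta,
	       unsigned long long mperf_delta, double interval_sec)
{
	double tsc_ghz;

	if (mperf_delta == 0 || interval_sec <= 0.0)
		return 0.0;
	tsc_ghz = (double)tsc_delta / 1e9 / interval_sec;
	return tsc_ghz * ((double)aperf_delta / (double)mperf_delta);
}
```

An APERF delta larger than the MPERF delta indicates the CPU ran above its reference frequency during the interval, which is what turbo-mode looks like from software.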