History log of /linux-master/tools/power/pm-graph/sleepgraph.py
Revision Date Author Comments
# b85e2dab 15-Nov-2023 David Woodhouse <dwmw@amazon.co.uk>

PM: tools: Fix sleepgraph syntax error

The sleepgraph tool currently fails:

File "/usr/bin/sleepgraph", line 4155
or re.match('psci: CPU(?P<cpu>[0-9]*) killed.*', msg)):
^
SyntaxError: unmatched ')'

Fixes: 34ea427e01ea ("PM: tools: sleepgraph: Recognize "CPU killed" messages")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 34ea427e 15-Mar-2023 Xueqin Luo <luoxueqin@kylinos.cn>

PM: tools: sleepgraph: Recognize "CPU killed" messages

On the arm64 platform with PSCI, the core log of CPU offline is as
follows:

[ 100.431501] CPU1: shutdown
[ 100.454820] psci: CPU1 killed (polled 20 ms)
[ 100.459266] CPU2: shutdown
[ 100.482575] psci: CPU2 killed (polled 20 ms)
[ 100.486057] CPU3: shutdown
[ 100.513974] psci: CPU3 killed (polled 28 ms)
[ 100.518068] CPU4: shutdown
[ 100.541481] psci: CPU4 killed (polled 24 ms)

Prevent sleepgraph from mistakenly treating the "CPU up" message as part
of the suspend flow (because it should be regarded as part of the resume
flow) by making it recognize the "CPU* killed" messages above.

Signed-off-by: Xueqin Luo <luoxueqin@kylinos.cn>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
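
As a rough illustration of the matching described above, the sketch below applies the pattern quoted in the later syntax-error fix to a few of the sample messages. The helper name `killed_cpu` and the assumption that the timestamp has already been stripped from each message are illustrative, not taken from sleepgraph itself.

```python
import re

# Pattern as quoted in the traceback of the follow-up fix; everything else
# here (helper name, sample list) is hypothetical.
CPU_KILLED = re.compile(r'psci: CPU(?P<cpu>[0-9]*) killed.*')

def killed_cpu(msg):
    """Return the CPU number from a "psci: CPUn killed" message, else None."""
    m = CPU_KILLED.match(msg)
    if m and m.group('cpu'):
        return int(m.group('cpu'))
    return None

# Messages with the "[ 100.431501]" timestamp prefix already removed.
samples = [
    'CPU1: shutdown',
    'psci: CPU1 killed (polled 20 ms)',
    'psci: CPU3 killed (polled 28 ms)',
]
killed = [killed_cpu(s) for s in samples]
```

Only the "psci: CPUn killed" lines yield a CPU number; the "CPUn: shutdown" lines are left to whatever other matching the tool already does.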


# 04175527 14-Mar-2023 Todd Brandt <todd.e.brandt@intel.com>

pm-graph: Update to v5.11

install_latest_from_github.sh:
- Added a new script which allows users to install the latest pm-graph
from the upstream github repo. This is useful if the kernel source
version has issues that have already been fixed in github.

sleepgraph.py:
- Updated all the dmesg suspend/resume PM print formats to be able to
process recent timelines using dmesg only.

- Added ethtool output to the log for the system's ethernet device if
ethtool exists. This helps in debugging network issues.

- Made the tool more robustly handle events where mangled dmesg or ftrace
outputs do not include all the requisite data. The tool fails gracefully
instead of creating a garbled timeline.

Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6fa7f537 13-Mar-2023 Todd Brandt <todd.e.brandt@linux.intel.com>

pm-graph: sleepgraph: Avoid crashing on binary data in device names

A regression has occurred in the hid-sensor code where a device
name string has not been initialized to 0, and ends up without
a NULL char and is printed with %s. This includes random binary
data in the device name, which makes its way into the ftrace output
and ends up crashing sleepgraph because it expects the ftrace output
to be ASCII only.

For example: "HID-SENSOR-INT-020b?.39.auto" ends up in ftrace instead
of "HID-SENSOR-INT-020b.39.auto". It causes this crash in sleepgraph:

File "/usr/bin/sleepgraph", line 5579, in executeSuspend
for line in fp:
File "/usr/lib/python3.10/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position
1568: invalid start byte

The issue is present in 6.3-rc1 and is described in full here:
https://bugzilla.kernel.org/show_bug.cgi?id=217169

A separate fix has been submitted to have this issue repaired, but
it has also exposed a larger bug in sleepgraph, since nothing should
make sleepgraph crash. Sleepgraph needs to be able to handle binary
data showing up in ftrace gracefully.

Modify the ftrace processing code to treat it as potentially binary
and to filter out binary data and leave just the ASCII.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217169
Fixes: 98c062e82451 ("HID: hid-sensor-custom: Allow more custom iio sensors")
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
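
A minimal sketch of the approach described above, assuming the ftrace output is read as bytes and each line is decoded with non-ASCII bytes dropped. The helper name `ascii_only` and the in-memory stand-in for the trace file are hypothetical; the actual change in sleepgraph may be structured differently.

```python
import io

def ascii_only(raw):
    """Decode one raw line, silently dropping any byte that is not ASCII."""
    return raw.decode('ascii', 'ignore')

# Stand-in for a trace file opened in binary mode ('rb'); the 0xff byte
# mimics the stray binary data from the uninitialized device name.
fp = io.BytesIO(b'HID-SENSOR-INT-020b\xff.39.auto\nplain ascii line\n')
lines = [ascii_only(raw) for raw in fp]
```

Iterating over a binary file object yields bytes, so no codec is involved until the explicit decode, and that decode cannot raise UnicodeDecodeError with the 'ignore' handler.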


# cf3e0251 30-Jan-2023 Ross Zwisler <zwisler@chromium.org>

PM: tools: use canonical ftrace path

The canonical location for the tracefs filesystem is at /sys/kernel/tracing.

But, from Documentation/trace/ftrace.rst:

Before 4.1, all ftrace tracing control files were within the debugfs
file system, which is typically located at /sys/kernel/debug/tracing.
For backward compatibility, when mounting the debugfs file system,
the tracefs file system will be automatically mounted at:

/sys/kernel/debug/tracing

A few scripts in tools/power still refer to this older debugfs path, so
let's update them to avoid confusion.

Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
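
For scripts that have to run on both older and newer kernels, one possible way to pick the path is sketched below. The fallback to the debugfs location is an assumption added for illustration (the commit itself only switches the scripts to the canonical path), and the function name `tracing_dir` is hypothetical.

```python
import os

TRACEFS = '/sys/kernel/tracing'                # canonical tracefs mount point
DEBUGFS_TRACING = '/sys/kernel/debug/tracing'  # older debugfs location

def tracing_dir(isdir=os.path.isdir):
    """Return the first tracing directory that exists, or None.

    The isdir parameter exists only so the lookup can be exercised
    without a real tracefs mount.
    """
    for path in (TRACEFS, DEBUGFS_TRACING):
        if isdir(path):
            return path
    return None
```

Calling `tracing_dir()` with no arguments checks the running system.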


# 96d4b8e1 01-Dec-2022 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

PM: sleep: Refine error message in try_to_freeze_tasks()

A previous change amended try_to_freeze_tasks() with the "what"
variable pointing to a string describing the group of tasks subject to
the freezing which may be used in the error message in there too, so
make that happen.

Accordingly, update sleepgraph.py to catch the modified error message
as appropriate.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>


# 9bfb0977 20-Oct-2022 Todd Brandt <todd.e.brandt@intel.com>

pm-graph v5.10

sleepgraph:
- add -wifitrace argument for tracing all the way to wifi reconnect
- include more data in ftrace to mark the end of kernel resume
- add async_synchronize_full to the list of funcs to chart
- add thermal zone info to the log data
- include a check for s0ix support (s2idle is the default mem_sleep)
- if s2idle does not support s0ix, remove the SYS%LPI turbostat var
- fix -dev crash when kprobe caller is just an address (not a symbol)
- fix the cpuexec data in -proc to display in resume

sleepgraph.8:
- add -wifitrace documentation

README:
- change links from 01.org to developer.intel.com

Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b3f6c43d 07-Jul-2022 Todd Brandt <todd.e.brandt@intel.com>

pm-graph v5.9

bootgraph:
- fix parsing of /proc/version to be much more flexible
- check kernel version to disallow ftrace on anything older than 4.10

sleepgraph:
- include fix to bugzilla 212761 in case it regresses
- fix for -proc bug: https://github.com/intel/pm-graph/pull/20
- add -debugtiming arg to get timestamps on prints
- allow use of the netfix tool hosted in the github repo
- read s0ix data from pmc_core for better debug
- include more system data in the output log
- do a better job of testing input file usability
- flag more error data from dmesg in the timeline
- pre-parse the trace log to fix any ordering issues
- add new parser to process dmesg only timelines
- remove superfluous sleep(5) in multitest mode

config/custom-timeline-functions.cfg:
- change some names to keep up to date

README:
- new version, small wording changes

Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b7e23e54 18-Mar-2021 Ricardo Ribalda <ribalda@chromium.org>

pm-graph: Fix typo "accesible"

Trivial fix.

Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d23e95c0 10-Nov-2020 Todd Brandt <todd.e.brandt@linux.intel.com>

pm-graph v5.8

- if wakeups occur in s2idle: "freeze time: N (-x ms waking y times) ms"

- change FREEZELOOP and FREEZEWAKE to S2LOOP and S2WAKE for brevity

- returns all sysfs vals to their initial state after testing

- use the dmesg log for debugging until the test is completed,
instrument the executeSuspend process to have a full trace,
if test completes, formal dmesg log overwrites the debug log

- fix CPU_ON and CPU_OFF devices in the timeline, should include [n]

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 70d93298 18-Aug-2020 Peter Zijlstra <peterz@infradead.org>

notifier: Fix broken error handling pattern

The current notifiers have the following error handling pattern all
over the place:

int err, nr;

err = __foo_notifier_call_chain(&chain, val_up, v, -1, &nr);
if (err & NOTIFIER_STOP_MASK)
__foo_notifier_call_chain(&chain, val_down, v, nr-1, NULL)

And aside from the endless repetition thereof, it is broken. Consider
blocking notifiers; both calls take and drop the rwsem, this means
that the notifier list can change in between the two calls, making @nr
meaningless.

Fix this by replacing all the __foo_notifier_call_chain() functions
with foo_notifier_call_chain_robust() that embeds the above pattern,
but ensures it is inside a single lock region.

Note: I switched atomic_notifier_call_chain_robust() to use
the spinlock, since RCU cannot provide the guarantee
required for the recovery.

Note: software_resume() error handling was broken afaict.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20200818135804.325626653@infradead.org


# 68f9b228 20-Jul-2020 Todd Brandt <todd.e.brandt@linux.intel.com>

pm-graph v5.7 - important s2idle fixes

Important fixes:

- in s2idle, use timekeeping_freeze trace mark instead of
machine_suspend to denote entry into s2idle mode.

- in s2idle, use machine_suspend trace mark to create a new virtual
device called "s2idle_enter_<n>x". It denotes an s2idle_enter call
loop of <n> iterations where s2idle was never actually achieved.
It isn't counted as "freeze time" in the header.

- in s2idle, only show multiple freeze times if s2idle went in and
out of resume_noirq. Otherwise multiple freezes are shown with
"waking" time subtracted (waking time is time spent outside s2idle
dealing with wakeups).

- in s2idle summaries, include "FREEZEWAKE" as an issue when at
least 1ms is spent waking from s2idle. A clean run should only
wake for the rtc timer.

- add support for device callbacks with matching names in the same
phase. In rare cases some devices register multiple callbacks from
separate drivers using the same name. Without this fix only one is
shown.

- add kparamsfmt string back to fix bootgraph

General updates:

- when suspend_machine is missing, error says "failed in
suspend_machine"

- extract target count/time and add to summary title if -multi
used

- include any instances of "timeout" in dmesg as issues to be
logged.

- fix ftrace parse to handle any number of flags (instead of
just 4).

- remove sync/async_device string from device detail, remains in
hover.

- when using callgraph (-f) add driver name to callgraph titles.

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
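
To illustrate the "any number of flags" item, the sketch below parses an ftrace line without fixing the width of the flags field. The line layout ("task-pid [cpu] flags timestamp: message"), the regular expression, and the sample lines are assumptions modeled on typical ftrace output, not the expression sleepgraph actually uses.

```python
import re

# Assumed layout: "<task>-<pid> [<cpu>] <flags> <timestamp>: <message>".
# The flags field is matched as any run of non-space characters, so it
# does not matter whether the kernel prints 4 flag characters or more.
FTRACE_LINE = re.compile(
    r'^ *(?P<proc>.*)-(?P<pid>[0-9]+) *\[(?P<cpu>[0-9]+)\] *'
    r'(?P<flags>\S*) *(?P<time>[0-9.]+): *(?P<msg>.*)$'
)

def parse_ftrace_line(line):
    """Return the named fields of one trace line, or None if it does not match."""
    m = FTRACE_LINE.match(line)
    return m.groupdict() if m else None

four = parse_ftrace_line(
    '  sleepgraph-1234  [001] d..1   100.123456: tracing_mark_write: SUSPEND START')
five = parse_ftrace_line(
    '  sleepgraph-1234  [001] d..1.  100.123456: tracing_mark_write: SUSPEND START')
```

Both lines parse to the same process, timestamp, and message; only the captured flags string differs in length.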


# 56555855 29-Apr-2020 Qais Yousef <qais.yousef@arm.com>

cpu/hotplug: Remove disable_nonboot_cpus()

The single user could have called freeze_secondary_cpus() directly.

Since this function was a source of confusion, remove it as it's
just a pointless wrapper.

While at it, rename enable_nonboot_cpus() to thaw_secondary_cpus() to
preserve the naming symmetry.

Done automatically via:

git grep -l enable_nonboot_cpus | xargs sed -i 's/enable_nonboot_cpus/thaw_secondary_cpus/g'

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lkml.kernel.org/r/20200430114004.17477-1-qais.yousef@arm.com


# 2c9a583b 08-Apr-2020 Todd Brandt <todd.e.brandt@linux.intel.com>

pm-graph v5.6

sleepgraph:
- force usage of python3 instead of using system default
- fix bugzilla 204773 (https://bugzilla.kernel.org/show_bug.cgi?id=204773)
- fix issue of platform info not being reset in -multi (logs fill up)
- change -ftop call to "pm_suspend", this is one level below state_store
- add -wificheck command to read out the current wifi device details
- change -wifi behavior to poll /proc/net/wireless for wifi connect
- add wifi reconnect time to timeline, include time in summary column
- add "fail on wifi_resume" to timeline and summary when wifi fails
- add a set of commands to collect data before/after suspend in the log
- add "-cmdinfo" command which prints out all the data collected
- check for cmd info tools at start, print found/missing in green/red
- fix kernel suspend time calculation: tool used to look for start of
pm_suspend_console, but the order has changed. latest kernel starts
with ksys_sync, use this instead
- include time spent in mem/disk in the header (same as freeze/standby)
- ignore turbostat 32-bit capability warnings
- print to result.txt when -skiphtml is used, just say result: pass
- don't exit on SIGTSTP, it's a ctrl-Z and the tool may come back
- -multi argument supports duration as well as count: hours, minutes, seconds
- update the -multi status output to be more informative
- -maxfail sets maximum consecutive fails before a -multi run is aborted
- in -summary, ignore dmesg/ftrace/html files that are 0 size

bootgraph:
- force usage of python3 instead of using system default

README:
- add endurance testing instructions

Makefile:
- remove pycache on uninstall

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 42161483 04-Sep-2019 Todd Brandt <todd.e.brandt@linux.intel.com>

pm-graph: make setVal unbuffered again for python2 and python3

sleepgraph:
- kprobe_events won't set correctly if the data is buffered
- force sysvals.setVal to be unbuffered and use binary mode
- tested in both python2 and python3

Link: https://bugzilla.kernel.org/show_bug.cgi?id=204773
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
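
A minimal python3 sketch of an unbuffered, binary-mode write of the kind described above. The function name `set_val`, its argument order, and the temporary file standing in for a sysfs or tracefs node are illustrative assumptions, not the body of sysvals.setVal.

```python
import os
import tempfile

def set_val(value, path):
    """Write value to path with no userspace buffering; True on success."""
    try:
        # buffering=0 is only allowed in binary mode, hence 'wb' and encode().
        with open(path, 'wb', 0) as fp:
            fp.write(str(value).encode())
    except OSError:
        return False
    return True

# Demonstrate against a temporary file instead of a real kprobe_events node.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
ok = set_val('p:demo_probe demo_func', demo_path)
with open(demo_path, 'rb') as fp:
    written = fp.read()
os.remove(demo_path)
```

With buffering disabled, each write() call goes straight to the kernel as a single write, which matters for control files that interpret every write as one command.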


# 1446794a 12-Aug-2019 Todd Brandt <todd.e.brandt@linux.intel.com>

pm-graph v5.5

Upgrade bootgraph/sleepgraph to be able to run on python2 and python3.
Both now simply require python, the system can choose which to use.

bootgraph python3 update:
- add floor function to handle integer arithmetic
- change argument loop to use next() instead of args.next()
- open dmesg log and popen in binary, use decode(ascii, ignore)
- sort all html data to allow diff between python versions
- change exception handler to use the python3 "as" keyword instead of a comma

sleepgraph python3 update:
- import configparser not ConfigParser (p2 needs python-configparser)
- add floor function to handle integer arithmetic
- change argument loop to use next() instead of args.next()
- handle popen output in binary, use decode(ascii, ignore)
- sort all html/output data to allow diff between python versions
- force gzip open to use text mode, same for file open
- ensure no binary data is written to logs (ascii convert devprops info)
- use codecs library to handle zlib encoding for mcelog data
- remove all uses of python3.7 keyword "async" as members or vars
- assume all FPDT and DMI data is in binary string form

sleepgraph:
- turbostat will be used by default if it's found & the mode is freeze
- a new option "-noturbostat" will disable its use
- fix bug where two callgraphs with the same start time overwrite.
- fix s2idle processing where two suspend/resume_machines occur back2back
- update getexec function to use which first (assuming PATH exists)
- new platforminfo data in log with: lspci, gpe counts, /proc/interrupts
- new data is zipped, b64 encoded, and tacked on the end of ftrace

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
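
Several of the porting items above come down to small idioms that behave the same on python2 and python3. The sketch below shows three of them: the builtin next() on an argument iterator, "except ... as", and decoding command output with the 'ignore' handler. The function names and the simplified option handling are hypothetical; only the option names -o and -noturbostat come from the changelogs.

```python
def parse_args(argv):
    """Tiny argument loop using next(args) rather than args.next()."""
    opts = {}
    args = iter(argv)
    for arg in args:
        if arg == '-o':
            try:
                opts['outdir'] = next(args)
            except StopIteration as e:  # "as" form, not "except X, e"
                raise ValueError('-o requires a value (%s)' % type(e).__name__)
        elif arg == '-noturbostat':
            opts['turbostat'] = False
    return opts

def to_text(raw):
    """Decode popen-style byte output, dropping anything non-ASCII."""
    return raw.decode('ascii', 'ignore')

opts = parse_args(['-o', 'out', '-noturbostat'])
text = to_text(b'Linux version 5.3\xc2\xa0\n')
```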


# 2025cf9e 29-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms and conditions of the gnu general public license
version 2 as published by the free software foundation this program
is distributed in the hope it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 263 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 45dd0a42 14-May-2019 Todd Brandt <todd.e.brandt@linux.intel.com>

Update to pm-graph 5.4

bootgraph:
- dmesg log format has changed, update parser in two places
- fix prints in preparation for upgrade to python3

sleepgraph:
- fix prints in preparation for upgrade to python3
- add new trace events and kprobes to cover freeze more completely
- add new -ftop callgraph trace over suspend_devices_and_enter
- add -wifi option to check if a wifi connection is active
- add -skipkprobe option to suppress unwanted kprobes in dev mode
- add kernel params and sysinfo to the log output
- don't crash if /dev/mem is throwing IO errors, ignore FPDT and DMI
- fix kprobe length calculation when calls are recursive
- add several new kernel issue definitions for USB, ACPI, ATA, etc
- enable turbostat output to be read from stdout instead of from file
- add BIOS call data to the timeline from acpi_ps_execute_method kprobe

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7673896a 14-May-2019 Todd Brandt <todd.e.brandt@linux.intel.com>

Update to pm-graph 5.3

sleepgraph:
- add support for parsing kernel issues from timeline dmesg logs
- with -summary, generate a summary-issues.html for kernel issues found
- with -summary, generate a summary-devices.html for device callback times
- when recreating a timeline, use -o to set the output html filename
- capture mcelog data when hardware errors occur and store in log
- add -turbostat option to capture power data during freeze

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 18d3f8fc 08-Oct-2018 Todd Brandt <todd.e.brandt@linux.intel.com>

PM / tools: sleepgraph and bootgraph: upgrade to v5.2

bootgraph & sleepgraph:
- funnel all prints through the pprint function
- remove superfluous print calls, arrange them in single blocks
- flush stdout on every print, enables log capture on hang

sleepgraph:
- in -summary, if all tests have the same host+kernel+mode, add to title
- update verbose device detail print to include machine suspend/resume
- match tKernSus and tKernRes to pm_prepare/restore_console
- fully support multiple suspend/resumes in a single timeline
- enable various disk modes (disk-suspend, disk-test_resume, etc)
- add warnings when -display (xset) fails

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5484f033 08-Oct-2018 Todd Brandt <todd.e.brandt@linux.intel.com>

PM / tools: sleepgraph: first batch of v5.2 changes

general:
- add battery charge data before and after test
- remove special s0i3 handling
- remove melding of dmesg & ftrace data in old kernels, use one only
- updates to various kprobes in trace (ksys_sync, etc)
- enable pm_debug_messages during the test
- instrument more subsystems with dev functions (phy0)

error handling:
- return codes for tool show the status of the test run
- 0: success, 1: general error (no timeline), 2: fail (suspend aborted)
- monitor output of /sys/power/state, mark as failure if exception occurs
- add signal handler when using -result to catch tool exceptions

display control:
- add -x commands for testing xset with mode settings and status
- allow display setting to on, off, suspend, standby
- add display mode change info to the log, along with a warning on fail

s2idle (freeze):
- remove fixed 10-phase dependency, allow any phase order & any count
- multiple phase occurences show as phase_nameN e.g. suspend_noirq3
- if multiple freezes occur, print multiple time values in header

summary:
- add new columns to summary output: issues, worst suspend/resume devices
- worst device: includes summation of all phases of suspend or resume
- issues: includes WARNING/ERROR/BUG from dmesg log, and other issues
- s2idle: multiple freezes show as FREEZExN in the issues column

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
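
A wrapper script could interpret the return codes listed under "error handling" roughly as follows. The mapping text is taken from the list above; the function name `describe_exit` and the handling of unlisted codes are assumptions.

```python
# Return codes as listed in the changelog above.
RESULTS = {
    0: 'success',
    1: 'general error (no timeline)',
    2: 'fail (suspend aborted)',
}

def describe_exit(code):
    """Translate a sleepgraph return code into a short status string."""
    return RESULTS.get(code, 'unknown return code %d' % code)

statuses = [describe_exit(c) for c in (0, 1, 2)]
```

In a batch run, the code passed to `describe_exit` would typically come from the `returncode` attribute of a finished `subprocess` call.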


# ffbb95aa 24-May-2018 Todd E Brandt <todd.e.brandt@linux.intel.com>

PM / tools: pm-graph: upgrade to v5.1

general changes:
- make python dependent on version2 to enable clearlinux
- upgrade dmesg error/warning extraction to be more detailed
- enable logs generated from -cmd runs to be processed in gzip form
- add notification on power mode entry failure into the timeline
- add -battery option to show if battery is connected and its charge

summary changes (output of -summary):
- add -genhtml option to regenerate missing timelines from logs found
- add min/max/median/avg data to the summary page with links to the data
- add highlight to minimum, maximum, and median tests
- add result column to summary (pass or fail) with red highlight on fail
- add issues column to summary with a list of dmesg err/warn/bugs

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 700abc90 30-Jan-2018 Todd E Brandt <todd.e.brandt@linux.intel.com>

pm-graph: AnalyzeSuspend v5.0

- add -cgskip option to reduce callgraph output size
- add -cgfilter option to focus on a list of devices
- add -result option for exporting batch test results
- removed all phoronix hooks, use -result to enable batch testing
- change -usbtopo to -devinfo, now prints all devices
- add -gzip option to read/write logs in gz format
- add -bufsize option to manually control ftrace buffer size
- add -sync option to run filesystem sync prior to test
- add -display option to enable/disable the display prior to test
- add -rs option to enable/disable runtime suspend on all devices for test
- add installed config files to search path
- add kernel error/warning links into the timeline
- fix callgraph trace to better handle interrupts
- include command string and kernel params in timeline output header

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# a6fbdbb2 30-Jan-2018 Todd E Brandt <todd.e.brandt@linux.intel.com>

pm-graph: config files and installer

- name change: analyze_boot.py to bootgraph.py
- name change: analyze_suspend.py to sleepgraph.py
- added config files for easier sleepgraph usage
- added example.cfg which describes all config options
- added cgskip.txt definition for slimmer callgraphs

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>