History log of /linux-master/drivers/perf/hisilicon/hisi_pcie_pmu.c
Revision Date Author Comments
# 7da37705 23-Feb-2024 Junhao He <hejunhao3@huawei.com>

drivers/perf: hisi_pcie: Merge find_related_event() and get_event_idx()

The function xxx_find_related_event() scan all working events to find
related events. During this process, we also can find the idle counters.
If not found related events, return the first idle counter to simplify
the code.

Signed-off-by: Junhao He <hejunhao3@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240223103359.18669-8-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>


# 2fbf96ed 23-Feb-2024 Junhao He <hejunhao3@huawei.com>

drivers/perf: hisi_pcie: Relax the check on related events

If we use two events with the same filter and related event type
(see the following example), the driver check whether they are related
events and are in the same group, otherwise the function
hisi_pcie_pmu_find_related_event() return -EINVAL, then the 2nd event
cannot count but the 1st event is running, although the PCIe PMU has
other idle counters.

In this case, The perf event scheduler will make the two events to
multiplex a counter, if the user use the formula
(1st event_value / 2nd event_value) to calculate the bandwidth, he/she
won't get the correct value, because they are not counting at the
same period.

This patch tries to fix this by making the related events to use
different idle counters if they are not in the same event group.

And finally, I'm going to say. The related events are best used in the
same group [1]. There are two ways to know if they are related events.
a) By event name, such as the latency events "xxx_latency, xxx_cnt" or
bandwidth events "xxx_flux, xxx_time".
b) By event type, such as "event=0xXXXX, event=0x1XXXX".

Use group to count the related events:
[1] -e "{pmu_name/xxx_latency,port=1/,pmu_name/xxx_cnt,port=1/}"

example:
1st event: hisi_pcie0_core1/event=0x804,port=1
2nd event: hisi_pcie0_core1/event=0x10804,port=1

test cmd:
perf stat -e hisi_pcie0_core1/event=0x804,port=1/ \
-e hisi_pcie0_core1/event=0x10804,port=1/

before patch:
25,281 hisi_pcie0_core1/event=0x804,port=1/ (49.91%)
470,598 hisi_pcie0_core1/event=0x10804,port=1/ (50.09%)

after patch:
24,147 hisi_pcie0_core1/event=0x804,port=1/
474,558 hisi_pcie0_core1/event=0x10804,port=1/

Signed-off-by: Junhao He <hejunhao3@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huwei.com>
Link: https://lore.kernel.org/r/20240223103359.18669-7-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>


# 2f864fee 23-Feb-2024 Junhao He <hejunhao3@huawei.com>

drivers/perf: hisi_pcie: Check the target filter properly

The PMU can monitor traffic of certain target Root Port or downstream
target Endpoint. User can specify the target filter by the "port" or
"bdf" option respectively. The PMU can only monitor the Root Port or
Endpoint on the same PCIe core so the value of "port" or "bdf" should
be valid and will be checked by the driver.

Currently at least and only one of "port" and "bdf" option must be set.
If "port" filter is not set or is set explicitly to zero (default),
driver will regard the user specifies a "bdf" option since "port" option
is a bitmask of the target Root Ports and zero is not a valid
value.

If user not explicitly set "port" or "bdf" filter, the driver uses "bdf"
default value (zero) to set target filter, but driver will skip the
check of bdf=0, although it's a valid value (meaning 0000:000:00.0).
Then the user just gets zero.

Therefore, we need to check if both "port" and "bdf" are invalid, then
return failure and report warning.

Testing:
before the patch:
0 hisi_pcie0_core1/rx_mrd_flux/
0 hisi_pcie0_core1/rx_mrd_flux,port=0/
24,124 hisi_pcie0_core1/rx_mrd_flux,port=1/
0 hisi_pcie0_core1/rx_mrd_flux,bdf=0/
0 hisi_pcie0_core1/rx_mrd_flux,port=0x800/
<not supported> hisi_pcie0_core1/rx_mrd_flux,bdf=1/
24,132 hisi_pcie0_core1/rx_mrd_flux,bdf=0x1700/
<not supported> hisi_pcie0_core1/rx_mrd_flux,port=0x0,bdf=0x0/
<not supported> hisi_pcie0_core1/rx_mrd_flux,port=0x0,bdf=0x1/
24,138 hisi_pcie0_core1/rx_mrd_flux,port=0x0,bdf=0x1700/
24,126 hisi_pcie0_core1/rx_mrd_flux,port=0x1,bdf=0x0/

after the patch:
<not supported> hisi_pcie0_core1/rx_mrd_flux/
<not supported> hisi_pcie0_core1/rx_mrd_flux,port=0/
24,153 hisi_pcie0_core1/rx_mrd_flux,port=1/
0 hisi_pcie0_core1/rx_mrd_flux,port=0x800/
<not supported> hisi_pcie0_core1/rx_mrd_flux,bdf=0/
<not supported> hisi_pcie0_core1/rx_mrd_flux,bdf=1/
24,117 hisi_pcie0_core1/rx_mrd_flux,bdf=0x1700/
<not supported> hisi_pcie0_core1/rx_mrd_flux,port=0x0,bdf=0x0/
<not supported> hisi_pcie0_core1/rx_mrd_flux,port=0x0,bdf=0x1/
24,120 hisi_pcie0_core1/rx_mrd_flux,port=0x0,bdf=0x1700/
24,123 hisi_pcie0_core1/rx_mrd_flux,port=0x1,bdf=0x0/

Signed-off-by: Junhao He <hejunhao3@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240223103359.18669-6-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>


# 00ca69b8 23-Feb-2024 Yicong Yang <yangyicong@hisilicon.com>

drivers/perf: hisi_pcie: Add more events for counting TLP bandwidth

A typical PCIe transaction is consisted of various TLP packets in both
direction. For counting bandwidth only memory read events are exported
currently. Add memory write and completion counting events of both
direction to complete the bandwidth counting.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240223103359.18669-5-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>


# b6693ad6 23-Feb-2024 Yicong Yang <yangyicong@hisilicon.com>

drivers/perf: hisi_pcie: Fix incorrect counting under metric mode

The metric counting shows incorrect results if the events in the
metric group using the same event but different filter options.
This is because we only judge the event code to decide whether
the event in the metric group should share the same hardware
counter, but ignore the settings of the filter.

For example, on a platform of 2 ports 0x1 and 0x2 but only port
0x1 has a downstream PCIe NVME device. The metric counting
shows both ports have the same counts because we misassign these
two events to one same hardware counter:
[root@localhost perf-iostat]# ./perf stat -e '{hisi_pcie0_core1/event=0x0104,port=0x2/,hisi_pcie0_core1/event=0x0104,port=0x1/}'

Performance counter stats for 'system wide':

7907484924 hisi_pcie0_core1/event=0x0104,port=0x2/
7907484924 hisi_pcie0_core1/event=0x0104,port=0x1/

10.153863691 seconds time elapsed

Fix this by using the whole config rather than the event only
to judge whether two events are the same and should share the
same hardware counter. With this patch, the metric counting in
the above case tends to be corrected:

[root@localhost perf-iostat]# ./perf stat -e '{hisi_pcie0_core1/event=0x0104,port=0x2/,hisi_pcie0_core1/event=0x0104,port=0x1/}'

Performance counter stats for 'system wide':

0 hisi_pcie0_core1/event=0x0104,port=0x2/
8123122077 hisi_pcie0_core1/event=0x0104,port=0x1/

10.152875631 seconds time elapsed

Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240223103359.18669-4-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>


# 4d473461 23-Feb-2024 Yicong Yang <yangyicong@hisilicon.com>

drivers/perf: hisi_pcie: Introduce hisi_pcie_pmu_get_event_ctrl_val()

Factor out retrieving of the register value for the
corresponding event from hisi_pcie_config_event_ctrl() into a
new function hisi_pcie_pmu_get_event_ctrl_val() allowing future
reuse.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240223103359.18669-3-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>


# 54a9e47e 23-Feb-2024 Yicong Yang <yangyicong@hisilicon.com>

drivers/perf: hisi_pcie: Rename hisi_pcie_pmu_{config,clear}_filter()

hisi_pcie_pmu_{config,clear}_filter() are config/clear HISI_PCIE_EVENT_CTRL
register which contains not only the filter but also the event code. The
function names are bit misleading. Rename it to
hisi_pcie_pmu_{config,clear}_event_ctrl() to reflects their functions
more accurately.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240223103359.18669-2-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>


# 868f8a70 24-Oct-2023 Yicong Yang <yangyicong@hisilicon.com>

drivers/perf: hisi_pcie: Initialize event->cpu only on success

Initialize the event->cpu only on success. To be more reasonable
and keep consistent with other PMUs.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20231024092954.42297-3-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>


# 6d7d51e8 24-Oct-2023 Yicong Yang <yangyicong@hisilicon.com>

drivers/perf: hisi_pcie: Check the type first in pmu::event_init()

Check whether the event type matches the PMU type firstly in
pmu::event_init() before touching the event. Otherwise we'll
change the events of others and lead to incorrect results.
Since in perf_init_event() we may call every pmu's event_init()
in a certain case, we should not modify the event if it's not
ours.

Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20231024092954.42297-2-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>


# 83a6d80c 15-Aug-2023 Yicong Yang <yangyicong@hisilicon.com>

drivers/perf: hisi: Schedule perf session according to locality

The PCIe PMUs locate on different NUMA node but currently we don't
consider it and likely stack all the sessions on the same CPU:

[root@localhost tmp]# cat /sys/devices/hisi_pcie*/cpumask
0
0
0
0
0
0

This can be optimize a bit to use a local CPU for the PMU.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20230815131010.2147-1-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>


# 7a6a9f1c 08-Jun-2023 Junhao He <hejunhao3@huawei.com>

drivers/perf: hisi: Don't migrate perf to the CPU going to teardown

The driver needs to migrate the perf context if the current using CPU going
to teardown. By the time calling the cpuhp::teardown() callback the
cpu_online_mask() hasn't updated yet and still includes the CPU going to
teardown. In current driver's implementation we may migrate the context
to the teardown CPU and leads to the below calltrace:

...
[ 368.104662][ T932] task:cpuhp/0 state:D stack: 0 pid: 15 ppid: 2 flags:0x00000008
[ 368.113699][ T932] Call trace:
[ 368.116834][ T932] __switch_to+0x7c/0xbc
[ 368.120924][ T932] __schedule+0x338/0x6f0
[ 368.125098][ T932] schedule+0x50/0xe0
[ 368.128926][ T932] schedule_preempt_disabled+0x18/0x24
[ 368.134229][ T932] __mutex_lock.constprop.0+0x1d4/0x5dc
[ 368.139617][ T932] __mutex_lock_slowpath+0x1c/0x30
[ 368.144573][ T932] mutex_lock+0x50/0x60
[ 368.148579][ T932] perf_pmu_migrate_context+0x84/0x2b0
[ 368.153884][ T932] hisi_pcie_pmu_offline_cpu+0x90/0xe0 [hisi_pcie_pmu]
[ 368.160579][ T932] cpuhp_invoke_callback+0x2a0/0x650
[ 368.165707][ T932] cpuhp_thread_fun+0xe4/0x190
[ 368.170316][ T932] smpboot_thread_fn+0x15c/0x1a0
[ 368.175099][ T932] kthread+0x108/0x13c
[ 368.179012][ T932] ret_from_fork+0x10/0x18
...

Use function cpumask_any_but() to find one correct active cpu to fixes
this issue.

Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
Signed-off-by: Junhao He <hejunhao3@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230608114326.27649-1-hejunhao3@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>


# 17d57398 17-Nov-2022 Yicong Yang <yangyicong@hisilicon.com>

drivers/perf: hisi: Add TLP filter support

The PMU support to filter the TLP when counting the bandwidth with below
options:

- only count the TLP headers
- only count the TLP payloads
- count both TLP headers and payloads

In the current driver it's default to count the TLP payloads only, which
will have an implicity side effects that on the traffic only have header
only TLPs, we'll get no data.

Make this user configuration through "len_mode" parameter and make it
default to count both TLP headers and payloads when user not specified.
Also update the documentation for it.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20221117084136.53572-5-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>


# 6b4bb4f3 17-Nov-2022 Yicong Yang <yangyicong@hisilicon.com>

drivers/perf: hisi: Fix some event id for hisi-pcie-pmu

Some event id of hisi-pcie-pmu is incorrect, fix them.

Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20221117084136.53572-2-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>


# 8404b0fb 02-Dec-2021 Qi Liu <liuqi115@huawei.com>

drivers/perf: hisi: Add driver for HiSilicon PCIe PMU

PCIe PMU Root Complex Integrated End Point(RCiEP) device is supported
to sample bandwidth, latency, buffer occupation etc.

Each PMU RCiEP device monitors multiple Root Ports, and each RCiEP is
registered as a PMU in /sys/bus/event_source/devices, so users can
select target PMU, and use filter to do further sets.

Filtering options contains:
event - select the event.
port - select target Root Ports. Information of Root Ports are
shown under sysfs.
bdf - select requester_id of target EP device.
trig_len - set trigger condition for starting event statistics.
trig_mode - set trigger mode. 0 means starting to statistic when bigger
than trigger condition, and 1 means smaller.
thr_len - set threshold for statistics.
thr_mode - set threshold mode. 0 means count when bigger than threshold,
and 1 means smaller.

Acked-by: Krzysztof WilczyƄski <kw@linux.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Reviewed-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Link: https://lore.kernel.org/r/20211202080633.2919-3-liuqi115@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>