History log of /linux-master/tools/perf/util/stat-shadow.c
Revision Date Author Comments
# a59fb796 21-Feb-2024 Ian Rogers <irogers@google.com>

perf metrics: Compute unmerged uncore metrics individually

When merging counts from multiple uncore PMUs the metric is only
computed for the metric leader. When merging/aggregation is disabled,
prior to this patch just the leader's metric would be computed. Fix
this by computing the metric for each PMU.

On a SkylakeX:
Before:
```
$ perf stat -A -M memory_bandwidth_total -a sleep 1

Performance counter stats for 'system wide':

CPU0 82,217 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 9.2 MB/s memory_bandwidth_total
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 0.0 MB/s memory_bandwidth_total
CPU0 61,395 UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_1]
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_1]
CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU0 81,570 UNC_M_CAS_COUNT.RD [uncore_imc_2]
CPU18 113,886 UNC_M_CAS_COUNT.RD [uncore_imc_2]
CPU0 62,330 UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU18 66,942 UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU0 75,489 UNC_M_CAS_COUNT.RD [uncore_imc_3]
CPU18 27,958 UNC_M_CAS_COUNT.RD [uncore_imc_3]
CPU0 55,864 UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU18 38,727 UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_4]
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_4]
CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU0 75,423 UNC_M_CAS_COUNT.RD [uncore_imc_5]
CPU18 104,527 UNC_M_CAS_COUNT.RD [uncore_imc_5]
CPU0 57,596 UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU18 56,777 UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU0 1,003,440,851 ns duration_time

1.003440851 seconds time elapsed
```

After:
```
$ perf stat -A -M memory_bandwidth_total -a sleep 1

Performance counter stats for 'system wide':

CPU0 88,968 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 9.5 MB/s memory_bandwidth_total
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 0.0 MB/s memory_bandwidth_total
CPU0 59,498 UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] # 0.0 MB/s memory_bandwidth_total
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] # 0.0 MB/s memory_bandwidth_total
CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU0 88,635 UNC_M_CAS_COUNT.RD [uncore_imc_2] # 9.5 MB/s memory_bandwidth_total
CPU18 117,975 UNC_M_CAS_COUNT.RD [uncore_imc_2] # 11.5 MB/s memory_bandwidth_total
CPU0 60,829 UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU18 62,105 UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU0 82,238 UNC_M_CAS_COUNT.RD [uncore_imc_3] # 8.7 MB/s memory_bandwidth_total
CPU18 22,906 UNC_M_CAS_COUNT.RD [uncore_imc_3] # 3.6 MB/s memory_bandwidth_total
CPU0 53,959 UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU18 32,990 UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] # 0.0 MB/s memory_bandwidth_total
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] # 0.0 MB/s memory_bandwidth_total
CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU0 83,595 UNC_M_CAS_COUNT.RD [uncore_imc_5] # 8.9 MB/s memory_bandwidth_total
CPU18 110,151 UNC_M_CAS_COUNT.RD [uncore_imc_5] # 10.5 MB/s memory_bandwidth_total
CPU0 56,540 UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU18 53,816 UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU0 1,003,353,416 ns duration_time
```

Signed-off-by: Ian Rogers <irogers@google.com> |
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Kaige Ye <ye@kaige.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240221070754.4163916-2-irogers@google.com


# eee41e6b 21-Feb-2024 Ian Rogers <irogers@google.com>

perf stat: Pass fewer metric arguments

Pass metric_expr and evsel rather than specific variables from the
struct, thereby reducing the number of arguments. This will enable
later fixes.

To reduce the size of the diff, local variables are added to match the
previous parameter names. This isn't done in the case of "name" as
evsel->name is more intention revealing. A whitespace issue is also
addressed.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Kaige Ye <ye@kaige.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240221070754.4163916-1-irogers@google.com


# 6d6be5eb 09-Feb-2024 Ian Rogers <irogers@google.com>

perf metric: Don't remove scale from counts

Counts were switched from the scaled saved value form to the
aggregated count to avoid double accounting. When this happened the
removing of scaling for a count should have been removed, however, it
wasn't and this wasn't observed as it normally doesn't matter because
a counter's scale is 1. A problem was observed with RAPL events that
are scaled.

Fixes: 37cc8ad77cf8 ("perf metric: Directly use counts rather than saved_value")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kaige Ye <ye@kaige.org>
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240209204947.3873294-5-irogers@google.com


# f2567e12 11-Dec-2023 Ian Rogers <irogers@google.com>

perf stat: Fix hard coded LL miss units

Copy-paste error where LL cache misses are reported as l1i.

Fixes: 0a57b910807ad163 ("perf stat: Use counts rather than saved_value")
Suggested-by: Guillaume Endignoux <guillaumee@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231211181242.1721059-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 6a80d794 15-Jun-2023 Kan Liang <kan.liang@linux.intel.com>

perf stat: New metricgroup output for the default mode

In the default mode, the current output of the metricgroup include both
events and metrics, which is not necessary and just makes the output
hard to read. Since different ARCHs (even different generations in the
same ARCH) may use different events. The output also vary on different
platforms.

For a metricgroup, only outputting the value of each metric is good
enough.

Add a new field default_metricgroup in evsel to indicate an event of the
default metricgroup. For those events, printout() should print the
metricgroup name rather than each event.

Add perf_stat__skip_metric_event() to skip the evsel in the Default
metricgroup, if it's not running or not the metric event.

Add print_metricgroup_header_t to pass the functions which print the
display name of each metricgroup in the Default metricgroup. Support all
three output methods.

Factor out perf_stat__print_shadow_stats_metricgroup() to print out each
metrics.

On SPR:

Before:

./perf_old stat sleep 1

Performance counter stats for 'sleep 1':

0.54 msec task-clock:u # 0.001 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
68 page-faults:u # 125.445 K/sec
540,970 cycles:u # 0.998 GHz
556,325 instructions:u # 1.03 insn per cycle
123,602 branches:u # 228.018 M/sec
6,889 branch-misses:u # 5.57% of all branches
3,245,820 TOPDOWN.SLOTS:u # 18.4 % tma_backend_bound
# 17.2 % tma_retiring
# 23.1 % tma_bad_speculation
# 41.4 % tma_frontend_bound
564,859 topdown-retiring:u
1,370,999 topdown-fe-bound:u
603,271 topdown-be-bound:u
744,874 topdown-bad-spec:u
12,661 INT_MISC.UOP_DROPPING:u # 23.357 M/sec

1.001798215 seconds time elapsed

0.000193000 seconds user
0.001700000 seconds sys

After:

$ ./perf stat sleep 1

Performance counter stats for 'sleep 1':

0.51 msec task-clock:u # 0.001 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
68 page-faults:u # 132.683 K/sec
545,228 cycles:u # 1.064 GHz
555,509 instructions:u # 1.02 insn per cycle
123,574 branches:u # 241.120 M/sec
6,957 branch-misses:u # 5.63% of all branches
TopdownL1 # 17.5 % tma_backend_bound
# 22.6 % tma_bad_speculation
# 42.7 % tma_frontend_bound
# 17.1 % tma_retiring
TopdownL2 # 21.8 % tma_branch_mispredicts
# 11.5 % tma_core_bound
# 13.4 % tma_fetch_bandwidth
# 29.3 % tma_fetch_latency
# 2.7 % tma_heavy_operations
# 14.5 % tma_light_operations
# 0.8 % tma_machine_clears
# 6.1 % tma_memory_bound

1.001712086 seconds time elapsed

0.000151000 seconds user
0.001618000 seconds sys

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230616031420.3751973-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 2a939c86 02-May-2023 Ian Rogers <irogers@google.com>

perf metric: Change divide by zero and !support events behavior

Division by zero causes expression parsing to fail and no metric to be
generated. This can mean for short running benchmarks metrics are not
shown. Change the behavior to make the value nan, which gets shown like:

'''
$ perf stat -M TopdownL2 true

Performance counter stats for 'true':

1,031,492 INST_RETIRED.ANY # nan % tma_fetch_bandwidth
# nan % tma_heavy_operations
# nan % tma_light_operations
29,304 CPU_CLK_UNHALTED.REF_XCLK # nan % tma_fetch_latency
# nan % tma_branch_mispredicts
# nan % tma_machine_clears
# nan % tma_core_bound
# nan % tma_memory_bound
2,658,319 IDQ_UOPS_NOT_DELIVERED.CORE
11,167 EXE_ACTIVITY.BOUND_ON_STORES
262,058 EXE_ACTIVITY.1_PORTS_UTIL
<not counted> BR_MISP_RETIRED.ALL_BRANCHES (0.00%)
<not counted> INT_MISC.RECOVERY_CYCLES_ANY (0.00%)
<not counted> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE (0.00%)
<not counted> CPU_CLK_UNHALTED.THREAD (0.00%)
<not counted> UOPS_RETIRED.RETIRE_SLOTS (0.00%)
<not counted> CYCLE_ACTIVITY.STALLS_MEM_ANY (0.00%)
<not counted> UOPS_RETIRED.MACRO_FUSED (0.00%)
<not counted> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE (0.00%)
<not counted> EXE_ACTIVITY.2_PORTS_UTIL (0.00%)
<not counted> CYCLE_ACTIVITY.STALLS_TOTAL (0.00%)
<not counted> MACHINE_CLEARS.COUNT (0.00%)
<not counted> UOPS_ISSUED.ANY (0.00%)

0.002864879 seconds time elapsed

0.003012000 seconds user
0.000000000 seconds sys
'''

When events aren't supported a count of 0 can be confusing and make
metrics look meaningful. Change these to be nan also which, with the
next change, gets shown like:

'''
$ perf stat true
Performance counter stats for 'true':

1.25 msec task-clock:u # 0.387 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
46 page-faults:u # 36.702 K/sec
255,942 cycles:u # 0.204 GHz (88.66%)
123,046 instructions:u # 0.48 insn per cycle
28,301 branches:u # 22.580 M/sec
2,489 branch-misses:u # 8.79% of all branches
4,719 CPU_CLK_UNHALTED.REF_XCLK:u # 3.765 M/sec
# nan % tma_frontend_bound
# nan % tma_retiring
# nan % tma_backend_bound
# nan % tma_bad_speculation
344,855 IDQ_UOPS_NOT_DELIVERED.CORE:u # 275.147 M/sec
<not supported> INT_MISC.RECOVERY_CYCLES_ANY:u
<not counted> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u (0.00%)
<not counted> CPU_CLK_UNHALTED.THREAD:u (0.00%)
<not counted> UOPS_RETIRED.RETIRE_SLOTS:u (0.00%)
<not counted> UOPS_ISSUED.ANY:u (0.00%)

0.003238142 seconds time elapsed

0.000000000 seconds user
0.003434000 seconds sys
'''

Ensure that nan metric values are quoted as nan isn't a valid number
in JSON.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# ce5b8590 11-Mar-2023 Ian Rogers <irogers@google.com>

perf stat: Modify the group test

Currently nr_members is 0 for an event with no group, however, they
are always a leader of their own group. A later change will make that
count 1 because the event is its own leader. Make the find_stat logic
consistent with this, an improvement suggested by Namhyung Kim.

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230312021543.3060328-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# aa0964e3 19-Feb-2023 Ian Rogers <irogers@google.com>

perf stat: Remove saved_value/runtime_stat

As saved_value/runtime_stat are only written to and not read, remove
the associated logic and writes.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Link: https://lore.kernel.org/r/20230219092848.639226-52-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 0a57b910 19-Feb-2023 Ian Rogers <irogers@google.com>

perf stat: Use counts rather than saved_value

Switch the hard coded metrics to use the aggregate value rather than
from saved_value. When computing a metric like IPC the aggregate count
comes from instructions then cycles is looked up and if present IPC
computed. Rather than lookup from the saved_value rbtree, search the
counter's evlist for the desired counter.

A new helper evsel__stat_type is used to both quickly find a metric
function and to identify when a counter is the one being sought. So
that both total and miss counts can be sought, the stat_type enum is
expanded. The ratio functions are rewritten to share a common helper
with the ratios being directly passed rather than computed from an
enum value.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Link: https://lore.kernel.org/r/20230219092848.639226-51-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 37cc8ad7 19-Feb-2023 Ian Rogers <irogers@google.com>

perf metric: Directly use counts rather than saved_value

Bugs with double aggregation have been introduced because of
aggregation of counters and again with saved_value. Remove the generic
metric use-case. Update parse-metric and pmu-events tests to update
aggregate rather than saved_value counts.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Link: https://lore.kernel.org/r/20230219092848.639226-50-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 8945bef3 19-Feb-2023 Ian Rogers <irogers@google.com>

perf stat: Add cpu_aggr_map for loop

Rename variables, add a comment and add a cpu_aggr_map__for_each_idx
to aid the readability of the stat-display code. In particular, try to
make sure aggr_idx is used consistently to differentiate from other
kinds of index.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Link: https://lore.kernel.org/r/20230219092848.639226-49-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# cc26ffaa 19-Feb-2023 Ian Rogers <irogers@google.com>

perf stat: Hide runtime_stat

runtime_stat is only shared for the sake of tests that don't care
about its value. Move the definition into stat-shadow.c and have the
tests also use the global version.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Link: https://lore.kernel.org/r/20230219092848.639226-48-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 758bc8e6 19-Feb-2023 Ian Rogers <irogers@google.com>

perf stat: Move enums from header

The enums are only used in stat-shadow.c, so narrow their scope by
moving to the C file.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Link: https://lore.kernel.org/r/20230219092848.639226-47-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# c23f5cc0 19-Feb-2023 Ian Rogers <irogers@google.com>

perf stat: Use metrics for --smi-cost

Rather than parsing events for --smi-cost, use the json metric group
'smi'.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Link: https://lore.kernel.org/r/20230219092848.639226-45-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# d6964c5b 19-Feb-2023 Ian Rogers <irogers@google.com>

perf stat: Remove hard coded transaction events

The metric group "transaction" is now present for Intel architectures
so the legacy hard coded approach won't be used. Remove the associated
logic.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Link: https://lore.kernel.org/r/20230219092848.639226-44-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 7b86475f 19-Feb-2023 Ian Rogers <irogers@google.com>

perf stat: Remove topdown event special handling

Now the events are computed from json metrics, the hard coded logic
can be removed.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Link: https://lore.kernel.org/r/20230219092848.639226-42-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 1fd09e29 19-Feb-2023 Ian Rogers <irogers@google.com>

perf metric: Add --metric-no-threshold option

Thresholds may need additional events, this can impact things like
sharing groups of events to avoid multiplexing. Add a flag to make the
threshold calculations optional. The threshold will still be computed
if no additional events are necessary.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Link: https://lore.kernel.org/r/20230219092848.639226-39-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# d0a3052f 19-Feb-2023 Ian Rogers <irogers@google.com>

perf metric: Compute and print threshold values

Compute the threshold metric and use it to color the metric value as
red or green. The threshold expression is used to generate the set of
events as the threshold may require additional events. A later patch
make this behavior optional with a --metric-no-threshold flag.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Link: https://lore.kernel.org/r/20230219092848.639226-37-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 37f322cd 08-Feb-2023 Ian Rogers <irogers@google.com>

perf stat: Avoid merging/aggregating metric counts twice

The added perf_stat_merge_counters combines uncore counters. When
metrics are enabled, the counts are merged into a metric_leader via the
stat-shadow saved_value logic. As the leader now is passed an aggregated
count, it leads to all counters being added together twice and counts
appearing approximately doubled in metrics.

This change disables the saved_value merging of counts for evsels that
are merged. It is recommended that later changes remove the saved_value
entirely as the two layers of aggregation in the code is confusing.

Fixes: 942c5593393d9418 ("perf stat: Add perf_stat_merge_counters()")
Reported-by: Perry Taylor <perry.taylor@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230209064447.83733-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 6f8f98ab 26-Jan-2023 Ian Rogers <irogers@google.com>

perf stat: Remove evsel metric_name/expr

Metrics are their own unit and these variables held broken metrics
previously and now just hold the value NULL. Remove code that used
these variables.

Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230126233645.200509-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# bd560973 09-Nov-2022 Ian Rogers <irogers@google.com>

perf expr: Tidy hashmap dependency

hashmap.h comes from libbpf but isn't installed with its
headers. Always use the header file of the code in util. Change the
hashmap.h dependency in expr.h to a forward declaration, add the
necessary header file includes in the C files.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: http://lore.kernel.org/lkml/20221109184914.1357295-12-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# c302378b 09-Nov-2022 Eduard Zingerman <eddyz87@gmail.com>

libbpf: Hashmap interface update to allow both long and void* keys/values

An update for libbpf's hashmap interface from void* -> void* to a
polymorphic one, allowing both long and void* keys and values.

This simplifies many use cases in libbpf as hashmaps there are mostly
integer to integer.

Perf copies hashmap implementation from libbpf and has to be
updated as well.

Changes to libbpf, selftests/bpf and perf are packed as a single
commit to avoid compilation issues with any future bisect.

Polymorphic interface is acheived by hiding hashmap interface
functions behind auxiliary macros that take care of necessary
type casts, for example:

#define hashmap_cast_ptr(p) \
({ \
_Static_assert((p) == NULL || sizeof(*(p)) == sizeof(long),\
#p " pointee should be a long-sized integer or a pointer"); \
(long *)(p); \
})

bool hashmap_find(const struct hashmap *map, long key, long *value);

#define hashmap__find(map, key, value) \
hashmap_find((map), (long)(key), hashmap_cast_ptr(value))

- hashmap__find macro casts key and value parameters to long
and long* respectively
- hashmap_cast_ptr ensures that value pointer points to a memory
of appropriate size.

This hack was suggested by Andrii Nakryiko in [1].
This is a follow up for [2].

[1] https://lore.kernel.org/bpf/CAEf4BzZ8KFneEJxFAaNCCFPGqp20hSpS2aCj76uRk3-qZUH5xg@mail.gmail.com/
[2] https://lore.kernel.org/bpf/af1facf9-7bc8-8a3d-0db4-7b3f333589a2@meta.com/T/#m65b28f1d6d969fcd318b556db6a3ad499a42607d

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221109142611.879983-2-eddyz87@gmail.com


# 01b8957b 30-Sep-2022 Namhyung Kim <namhyung@kernel.org>

perf stat: Don't compare runtime stat for shadow stats

Now it always uses the global rt_stat. Let's get rid of the field from
the saved_value. When the both evsels are NULL, it'd return 0 so remove
the block in the saved_value_cmp.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220930202110.845199-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 87ae87fd 30-Sep-2022 Namhyung Kim <namhyung@kernel.org>

perf stat: Use thread map index for shadow stat

When AGGR_THREAD is active, it aggregates the values for each thread.
Previously it used cpu map index which is invalid for AGGR_THREAD so
it had to use separate runtime stats with index 0.

But it can just use the rt_stat with thread_map_index. Rename the
first_shadow_map_idx() and make it return the thread index.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220930202110.845199-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# dfca2d69 30-Sep-2022 Namhyung Kim <namhyung@kernel.org>

perf stat: Rename saved_value->cpu_map_idx

The cpu_map_idx fields is just to differentiate values from other
entries. It doesn't need to be strictly cpu map index. Actually we can
pass thread map index or aggr map index. So rename the fields first.

No functional change intended.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220930202110.845199-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 8cff7490 03-Oct-2022 Ian Rogers <irogers@google.com>

perf metrics: Don't scale counts going into metrics

Counts are scaled prior to going into saved_value, reverse the scaling
so that metrics don't double scale values.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221004021612.325521-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 1725e9cd 31-Aug-2022 Ian Rogers <irogers@google.com>

perf metrics: Wire up core_wide

Pass state necessary for core_wide into the expression parser. Add
system_wide and user_requested_cpu_list to perf_stat_config to make it
available at display time. evlist isn't used as the
evlist__create_maps, that computes user_requested_cpus, needs the list
of events which is generated by the metric.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220831174926.579643-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 1a6abdde 31-Aug-2022 Ian Rogers <irogers@google.com>

perf expr: Move the scanner_ctx into the parse_ctx

We currently maintain the two independently and copy from one to the
other. This is a burden when additional scanner context values are
necessary, so combine them.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220831174926.579643-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 48648548 24-Aug-2022 Zhengjun Xing <zhengjun.xing@linux.intel.com>

perf stat: Capitalize topdown metrics' names

Capitalize topdown metrics' names to follow the intel SDM.

Before:

# ./perf stat -a sleep 1

Performance counter stats for 'system wide':

228,094.05 msec cpu-clock # 225.026 CPUs utilized
842 context-switches # 3.691 /sec
224 cpu-migrations # 0.982 /sec
70 page-faults # 0.307 /sec
23,164,105 cycles # 0.000 GHz
29,403,446 instructions # 1.27 insn per cycle
5,268,185 branches # 23.097 K/sec
33,239 branch-misses # 0.63% of all branches
136,248,990 slots # 597.337 K/sec
32,976,450 topdown-retiring # 24.2% retiring
4,651,918 topdown-bad-spec # 3.4% bad speculation
26,148,695 topdown-fe-bound # 19.2% frontend bound
72,515,776 topdown-be-bound # 53.2% backend bound
6,008,540 topdown-heavy-ops # 4.4% heavy operations # 19.8% light operations
3,934,049 topdown-br-mispredict # 2.9% branch mispredict # 0.5% machine clears
16,655,439 topdown-fetch-lat # 12.2% fetch latency # 7.0% fetch bandwidth
41,635,972 topdown-mem-bound # 30.5% memory bound # 22.7% Core bound

1.013634593 seconds time elapsed

After:

# ./perf stat -a sleep 1

Performance counter stats for 'system wide':

228,081.94 msec cpu-clock # 225.003 CPUs utilized
824 context-switches # 3.613 /sec
224 cpu-migrations # 0.982 /sec
67 page-faults # 0.294 /sec
22,647,423 cycles # 0.000 GHz
28,870,551 instructions # 1.27 insn per cycle
5,167,099 branches # 22.655 K/sec
32,383 branch-misses # 0.63% of all branches
133,411,074 slots # 584.926 K/sec
32,352,607 topdown-retiring # 24.3% Retiring
4,456,977 topdown-bad-spec # 3.3% Bad Speculation
25,626,487 topdown-fe-bound # 19.2% Frontend Bound
70,955,316 topdown-be-bound # 53.2% Backend Bound
5,834,844 topdown-heavy-ops # 4.4% Heavy Operations # 19.9% Light Operations
3,738,781 topdown-br-mispredict # 2.8% Branch Mispredict # 0.5% Machine Clears
16,286,803 topdown-fetch-lat # 12.2% Fetch Latency # 7.0% Fetch Bandwidth
40,802,069 topdown-mem-bound # 30.6% Memory Bound # 22.6% Core Bound

1.013683125 seconds time elapsed

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220825015458.3252239-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 9aa09230 06-May-2022 Ian Rogers <irogers@google.com>

perf metrics: Support all tool events

Previously duration_time was hard coded, which was ok until commit
b03b89b350034f22 ("perf stat: Add user_time and system_time events")
added additional tool events. Do for all tool events what was previously
done just for duration_time.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220507053410.3798748-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# c735b0a5 19-Apr-2022 Florian Fischer <florian.fischer@muhq.space>

perf stat: Introduce stats for the user and system rusage times

This is preparation for exporting rusage values as tool events.

Add new global stats tracking the values obtained via rusage.

For now only ru_utime and ru_stime are part of the tracked stats.

Both are stored as nanoseconds to be consistent with 'duration_time',
although the finest resolution the struct timeval data in rusage
provides are microseconds.

Signed-off-by: Florian Fischer <florian.fischer@muhq.space>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220420102354.468173-2-florian.fischer@muhq.space
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 5b1af93d 04-Jan-2022 Ian Rogers <irogers@google.com>

perf stat: Swap variable name cpu to index

The use of CPU is error prone, switch to cpu_map_idx.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Vineet Singh <vineet.singh@intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: zhengjun.xing@intel.com
Link: https://lore.kernel.org/r/20220105061351.120843-43-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 9aba0ada 10-Nov-2021 Ian Rogers <irogers@google.com>

perf expr: Add source_count for aggregating events

Events like uncore_imc/cas_count_read/ on Skylake open multiple events
and then aggregate in the metric leader. To determine the average value
per event the number of these events is needed. Add a source_count
function that returns this value by counting the number of events with
the given metric leader. For most events the value is 1 but for
uncore_imc/cas_count_read/ it can yield values like 6.

Add a generic test, but manually tested with a test metric that uses
the function.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul A . Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Wan Jiabing <wanjiabing@vivo.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20211111002109.194172-9-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# e4e29079 07-Nov-2021 Ian Rogers <irogers@google.com>

perf stat: Fix memory leak on error path

strdup() is used to deduplicate, ensure it isn't leaking an already
created string by freeing first.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211107085444.3781604-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# ec5c5b3d 15-Oct-2021 Ian Rogers <irogers@google.com>

perf metric: Encode and use metric-id as qualifier

For a metric like IPC a group of events like {instructions,cycles}:W
would be formed.

If the events names were changed in parsing then the metric expression
parser would fail to find them.

This change makes the event encoding be something like:

{instructions/metric-id=instructions/, cycles/metric-id=cycles/}

and then uses the evsel's stable metric-id value to locate the events.

This fixes the case that an event is restricted to user because of the
paranoia setting:

$ echo 2 > /proc/sys/kernel/perf_event_paranoid
$ perf stat -M IPC /bin/true
Performance counter stats for '/bin/true':

150,298 inst_retired.any:u # 0.77 IPC
187,095 cpu_clk_unhalted.thread:u

0.002042731 seconds time elapsed

0.000000000 seconds user
0.002377000 seconds sys

Adding the metric-id as a qualifier has a complication in that
qualifiers will become embedded in qualifiers.

For example, msr/tsc/ could become msr/tsc,metric-id=msr/tsc// which
will fail parse-events.

To solve this problem the metric is encoded and decoded for the
metric-id with !<num> standing in for an encoded value.

Previously ! wasn't parsed.

With this msr/tsc/ becomes msr/tsc,metric-id=msr!3tsc!3/

The metric expression parser is changed so that @ isn't changed to /,
instead this is done when the ID is encoded for parse events.

metricgroup__add_metric_non_group() and metricgroup__add_metric_weak_group()
need to inject the metric-id qualifier, so to avoid repetition they are
merged into a single metricgroup__build_event_string with error codes
more rigorously checked.

stat-shadow's prepare_metric() uses the metric-id to match the metricgroup
code.

As "metric-id=..." is added to all events, it is adding during testing
with the fake PMU.

This complicates pmu_str_check code as PE_PMU_EVENT_FAKE won't match as
part of a configuration.

The testing fake PMU case is fixed so that if a known qualifier with an
! is parsed then it isn't reported as a fake PMU.

This is sufficient to pass all testing but it and the original mechanism
are somewhat brittle.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Kilroy <andrew.kilroy@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Denys Zagorui <dzagorui@cisco.com>
Cc: Fabian Hemmer <copy@copy.sh>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nicholas Fraser <nfraser@codeweavers.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: ShihCheng Tu <mrtoastcheng@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Wan Jiabing <wanjiabing@vivo.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20211015172132.1162559-17-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# fa831fbb 15-Oct-2021 Ian Rogers <irogers@google.com>

perf metric: Move runtime value to the expr context

The runtime value is needed when recursively parsing metrics, currently
a value of 1 is passed which is incorrect.

Rather than add more arguments to the bison parser, add runtime to the
context.

Fix call sites not to pass a value. The runtime value is defaulted to 0,
which is arbitrary. In some places this replaces a value of 1, which was
also arbitrary.

This shouldn't affect anything other than PPC.

The use of 0 or 1 shouldn't matter as a proper runtime value would be
needed in a case that it did matter.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Kilroy <andrew.kilroy@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Denys Zagorui <dzagorui@cisco.com>
Cc: Fabian Hemmer <copy@copy.sh>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nicholas Fraser <nfraser@codeweavers.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: ShihCheng Tu <mrtoastcheng@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Wan Jiabing <wanjiabing@vivo.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20211015172132.1162559-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 7e06a5e3 23-Sep-2021 Ian Rogers <irogers@google.com>

perf metric: Rename expr__find_other.

A later change will remove the notion of other, rename the function to
expr__find_ids as this is what it populates.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandeep Dasgupta <sdasgup@google.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210923074616.674826-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# cb94a02e 23-Sep-2021 Ian Rogers <irogers@google.com>

perf metric: Restructure struct expr_parse_ctx.

A later change to parsing the ids out (in expr__find_other) will
potentially drop hashmaps and so it is more convenient to move
expr_parse_ctx to have a hashmap pointer rather than a struct value.

As this pointer must be freed, rather than just going out of scope, add
expr__ctx_new and expr__ctx_free to manage expr_parse_ctx memory.
Adjust use of struct expr_parse_ctx accordingly.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandeep Dasgupta <sdasgup@google.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210923074616.674826-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# fba7c866 06-Jul-2021 Jiri Olsa <jolsa@redhat.com>

libperf: Move 'leader' from tools/perf to perf_evsel::leader

Move evsel::leader to perf_evsel::leader, so we can move the group
interface to libperf.

Also add several evsel helpers to ease up the transition:

struct evsel *evsel__leader(struct evsel *evsel);
- get leader evsel

bool evsel__has_leader(struct evsel *evsel, struct evsel *leader);
- true if evsel has leader as leader

bool evsel__is_leader(struct evsel *evsel);
- true if evsel is itw own leader

void evsel__set_leader(struct evsel *evsel, struct evsel *leader);
- set leader for evsel

Committer notes:

Fix this when building with 'make BUILD_BPF_SKEL=1'

tools/perf/util/bpf_counter.c

- if (evsel->leader->core.nr_members > 1) {
+ if (evsel->core.leader->nr_members > 1) {

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# f07952b1 18-Apr-2021 Alexander Antonov <alexander.antonov@linux.intel.com>

perf stat: Basic support for iostat in perf

Add basic flow for a new iostat mode in perf. Mode is intended to
provide four I/O performance metrics per each PCIe root port: Inbound Read,
Inbound Write, Outbound Read, Outbound Write.

The actual code to compute the metrics and attribute it to
root port is in follow-on patches.

Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey V Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210419094147.15909-2-alexander.antonov@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 6859bc0e 15-Mar-2021 Changbin Du <changbin.du@intel.com>

perf stat: Improve readability of shadow stats

This adds function convert_unit_double() and selects appropriate
unit for shadow stats between K/M/G.

$ sudo perf stat -a -- sleep 1

Before: Unit 'M' is selected even the number is very small.

Performance counter stats for 'system wide':

4,003.06 msec cpu-clock # 3.998 CPUs utilized
16,179 context-switches # 0.004 M/sec
161 cpu-migrations # 0.040 K/sec
4,699 page-faults # 0.001 M/sec
6,135,801,925 cycles # 1.533 GHz (83.21%)
5,783,308,491 stalled-cycles-frontend # 94.26% frontend cycles idle (83.21%)
4,543,694,050 stalled-cycles-backend # 74.05% backend cycles idle (66.49%)
4,720,130,587 instructions # 0.77 insn per cycle
# 1.23 stalled cycles per insn (83.28%)
753,848,078 branches # 188.318 M/sec (83.61%)
37,457,747 branch-misses # 4.97% of all branches (83.48%)

1.001283725 seconds time elapsed

After:

$ sudo perf stat -a -- sleep 2

Performance counter stats for 'system wide':

8,005.52 msec cpu-clock # 3.999 CPUs utilized
10,715 context-switches # 1.338 K/sec
785 cpu-migrations # 98.057 /sec
102 page-faults # 12.741 /sec
1,948,202,279 cycles # 0.243 GHz
2,816,470,932 stalled-cycles-frontend # 144.57% frontend cycles idle
2,661,172,207 stalled-cycles-backend # 136.60% backend cycles idle
464,172,105 instructions # 0.24 insn per cycle
# 6.07 stalled cycles per insn
91,567,662 branches # 11.438 M/sec
7,756,054 branch-misses # 8.47% of all branches

2.002040043 seconds time elapsed

v2:
o do not change 'sec' to 'cpu-sec'.
o use convert_unit_double to implement convert_unit.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210315143047.3867-1-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 63e39aa6 02-Feb-2021 Kan Liang <kan.liang@linux.intel.com>

perf stat: Support L2 Topdown events

The TMA method level 2 metrics is supported from the Intel Sapphire
Rapids server, which expose four L2 Topdown metrics events to user
space. There are eight L2 events in total. The other four L2 Topdown
metrics events are calculated from the corresponding L1 and the exposed
L2 events.

Now, the --topdown prints the complete top-down metrics that supported
by the CPU. For the Intel Sapphire Rapids server, there are 4 L1 events
and 8 L2 events displyed in one line.

Add a new option, --td-level, to display the top-down statistics that
equal to or lower than the input level.

The L2 event is marked only when both its L1 parent event and itself
crosse the threshold.

Here is an example:

$ perf stat --topdown --td-level=2 --no-metric-only sleep 1
Topdown accuracy may decrease when measuring long periods.
Please print the result regularly, e.g. -I1000

Performance counter stats for 'sleep 1':

16,734,390 slots
2,100,001 topdown-retiring # 12.6% retiring
2,034,376 topdown-bad-spec # 12.3% bad speculation
4,003,128 topdown-fe-bound # 24.1% frontend bound
328,125 topdown-heavy-ops # 2.0% heavy operations # 10.6% light operations
1,968,751 topdown-br-mispredict # 11.9% branch mispredict # 0.4% machine clears
2,953,127 topdown-fetch-lat # 17.8% fetch latency # 6.3% fetch bandwidth
5,906,255 topdown-mem-bound # 35.6% memory bound # 15.4% core bound

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/1612296553-21962-9-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# a1bf2305 15-Jan-2021 Namhyung Kim <namhyung@kernel.org>

perf stat: Take cgroups into account for shadow stats

As of now it doesn't consider cgroups when collecting shadow stats and
metrics so counter values from different cgroups will be saved in a same
slot. This resulted in incorrect numbers when those cgroups have
different workloads.

For example, let's look at the scenario below: cgroups A and C runs same
workload which burns a cpu while cgroup B runs a light workload.

$ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C sleep 1

Performance counter stats for 'system wide':

3,958,116,522 cycles A
6,722,650,929 instructions A # 2.53 insn per cycle
1,132,741 cycles B
571,743 instructions B # 0.00 insn per cycle
4,007,799,935 cycles C
6,793,181,523 instructions C # 2.56 insn per cycle

1.001050869 seconds time elapsed

When I run 'perf stat' with single workload, it usually shows IPC around
1.7. We can verify it (6,722,650,929.0 / 3,958,116,522 = 1.698) for cgroup A.

But in this case, since cgroups are ignored, cycles are averaged so it
used the lower value for IPC calculation and resulted in around 2.5.

avg cycle: (3958116522 + 1132741 + 4007799935) / 3 = 2655683066
IPC (A) : 6722650929 / 2655683066 = 2.531
IPC (B) : 571743 / 2655683066 = 0.0002
IPC (C) : 6793181523 / 2655683066 = 2.557

We can simply compare cgroup pointers in the evsel and it'll be NULL
when cgroups are not specified. With this patch, I can see correct
numbers like below:

$ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C sleep 1

Performance counter stats for 'system wide':

4,171,051,687 cycles A
7,219,793,922 instructions A # 1.73 insn per cycle
1,051,189 cycles B
583,102 instructions B # 0.55 insn per cycle
4,171,124,710 cycles C
7,192,944,580 instructions C # 1.72 insn per cycle

1.007909814 seconds time elapsed

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210115071139.257042-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 3ff1e718 15-Jan-2021 Namhyung Kim <namhyung@kernel.org>

perf stat: Introduce struct runtime_stat_data

To pass more info to the saved_value in the runtime_stat, add a new
struct runtime_stat_data. Currently it only has 'ctx' field but later
patch will add more.

Note that we intentionally pass 0 as ctx to clock-related events for
compatibility. It was already there in a few places. So move the code
into the saved_value_lookup() explicitly and add a comment.

Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210115071139.257042-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 55c36a9f 11-Sep-2020 Andi Kleen <ak@linux.intel.com>

perf stat: Support new per thread TopDown metrics

Icelake has support for reporting per thread TopDown metrics.

These are reported differently than the previous TopDown support,
each metric is standalone, but scaled to pipeline "slots".

We don't need to do anything special for HyperThreading anymore.
Teach perf stat --topdown to handle these new metrics and
print them in the same way as the previous TopDown metrics.

The restrictions of only being able to report information per core is
gone.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200911144808.27603-4-kan.liang@linux.intel.com
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# ce9c13f3 16-Sep-2020 Qi Liu <liuqi115@huawei.com>

perf stat: Fix the ratio comments of miss-events

'perf stat' displays miss ratio of L1-dcache, L1-icache, dTLB cache,
iTLB cache and LL-cache. Take L1-dcache for example, miss ratio is
caculated as "L1-dcache-load-misses/L1-dcache-loads". So "of all
L1-dcache hits" is unsuitable to describe it, and "of all L1-dcache
accesses" seems better.

The comments of L1-icache, dTLB cache, iTLB cache and LL-cache are
fixed in the same way.

Signed-off-by: Qi Liu <liuqi115@huawei.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: linuxarm@huawei.com
Link: http://lore.kernel.org/lkml/1600253331-10535-1-git-send-email-liuqi115@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 437822bf 14-Sep-2020 Namhyung Kim <namhyung@kernel.org>

perf metric: Release expr_parse_ctx after testing

The test_generic_metric() missed to release entries in the pctx. Asan
reported following leak (and more):

Direct leak of 128 byte(s) in 1 object(s) allocated from:
#0 0x7f4c9396980e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
#1 0x55f7e748cc14 in hashmap_grow (/home/namhyung/project/linux/tools/perf/perf+0x90cc14)
#2 0x55f7e748d497 in hashmap__insert (/home/namhyung/project/linux/tools/perf/perf+0x90d497)
#3 0x55f7e7341667 in hashmap__set /home/namhyung/project/linux/tools/perf/util/hashmap.h:111
#4 0x55f7e7341667 in expr__add_ref util/expr.c:120
#5 0x55f7e7292436 in prepare_metric util/stat-shadow.c:783
#6 0x55f7e729556d in test_generic_metric util/stat-shadow.c:858
#7 0x55f7e712390b in compute_single tests/parse-metric.c:128
#8 0x55f7e712390b in __compute_metric tests/parse-metric.c:180
#9 0x55f7e712446d in compute_metric tests/parse-metric.c:196
#10 0x55f7e712446d in test_dcache_l2 tests/parse-metric.c:295
#11 0x55f7e712446d in test__parse_metric tests/parse-metric.c:355
#12 0x55f7e70be09b in run_test tests/builtin-test.c:410
#13 0x55f7e70be09b in test_and_print tests/builtin-test.c:440
#14 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661
#15 0x55f7e70c101a in cmd_test tests/builtin-test.c:807
#16 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
#17 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
#18 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
#19 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
#20 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: 6d432c4c8aa56 ("perf tools: Add test_generic_metric function")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# fc393839 19-Jul-2020 Jiri Olsa <jolsa@kernel.org>

perf metric: Add referenced metrics to hash data

Adding referenced metrics to the parsing context so they can be resolved
during the metric processing.

Adding expr__add_ref function to store referenced metrics into parse
context.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-11-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 2c46f542 12-Jul-2020 Jiri Olsa <jolsa@kernel.org>

perf metric: Rename expr__add_id() to expr__add_val()

Rename expr__add_id() to expr__add_val() so we can use expr__add_id() to
actually add just the id without any value in following changes.

There's no functional change.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200712132634.138901-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 6d432c4c 02-Jun-2020 Jiri Olsa <jolsa@kernel.org>

perf tools: Add test_generic_metric function

Adding test_generic_metric that prepares and runs given metric over the
data from struct runtime_stat object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200602214741.1218986-12-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 2cfaa853 02-Jun-2020 Jiri Olsa <jolsa@kernel.org>

perf tools: Factor out prepare_metric function

Factoring out prepare_metric function so it can be used in test
interface coming in following changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200602214741.1218986-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 5f09ca5a 24-May-2020 Jiri Olsa <jolsa@kernel.org>

perf stat: Do not pass avg to generic_metric

There's no need to pass the given evsel's count to metric data, because
it will be pushed again within the following metric_events loop.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200524224219.234847-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# ded80bda 15-May-2020 Ian Rogers <irogers@google.com>

perf expr: Migrate expr ids table to a hashmap

Use a hashmap between a char* string and a double* value. While bpf's
hashmap entries are size_t in size, we can't guarantee sizeof(size_t) >=
sizeof(double). Avoid a memory allocation when gathering ids by making
0.0 a special value encoded as NULL.

Original map suggestion by Andi Kleen:

https://lore.kernel.org/lkml/20200224210308.GQ160988@tassilo.jf.intel.com/

and seconded by Jiri Olsa:

https://lore.kernel.org/lkml/20200423112915.GH1136647@krava/

Committer notes:

There are fixes that need to land upstream before we can use libbpf's
headers, for now use our copy unconditionally, since the data structures
at this point are exactly the same, no problem.

When the fixes for libbpf's hashmap land upstream, we can fix this up.

Testing it:

Building with LIBBPF=1, i.e. the default:

$ perf -vv | grep -i bpf
bpf: [ on ] # HAVE_LIBBPF_SUPPORT
$ nm ~/bin/perf | grep -i libbpf_ | wc -l
39
$ nm ~/bin/perf | grep -i hashmap_ | wc -l
17
$

Explicitely building without LIBBPF:

$ perf -vv | grep -i bpf
bpf: [ OFF ] # HAVE_LIBBPF_SUPPORT
$
$ nm ~/bin/perf | grep -i libbpf_ | wc -l
0
$ nm ~/bin/perf | grep -i hashmap_ | wc -l
9
$

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: kp singh <kpsingh@chromium.org>
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20200515221732.44078-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# d4d5ca0b 07-May-2020 Paul A. Clarke <pc@us.ibm.com>

perf stat: Increase perf metric output resolution

Add another digit of precision to the perf metrics output.

Before:

$ /usr/bin/perf stat --metrics run_cpi /bin/ls
[...]
4,345,526 pm_run_cyc # 1.1 run_cpi
3,818,069 pm_run_inst_cmpl
[...]
$ /usr/bin/perf stat --metrics run_cpi --metric-only /bin/ls
[...]
run_cpi
1.1
[...]

After:

$ perf stat --metrics run_cpi /bin/ls
[...]
4,280,882 pm_run_cyc # 1.12 run_cpi
3,817,016 pm_run_inst_cmpl
[...]
$ perf stat --metrics run_cpi --metric-only /bin/ls
[...]
run_cpi
1.06
[...]

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Paul Clarke <pc@us.ibm.com>
LPU-Reference: 1588861087-31280-1-git-send-email-pc@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# c754c382 30-Apr-2020 Arnaldo Carvalho de Melo <acme@redhat.com>

perf evsel: Rename perf_evsel__is_*() to evsel__is*()

As those are 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 1e1a873d 01-Apr-2020 Kajol Jain <kjain@linux.ibm.com>

perf metricgroups: Enhance JSON/metric infrastructure to handle "?"

Patch enhances current metric infrastructure to handle "?" in the metric
expression. The "?" can be use for parameters whose value not known
while creating metric events and which can be replace later at runtime
to the proper value. It also add flexibility to create multiple events
out of single metric event added in JSON file.

Patch adds function 'arch_get_runtimeparam' which is a arch specific
function, returns the count of metric events need to be created. By
default it return 1.

This infrastructure needed for hv_24x7 socket/chip level events.
"hv_24x7" chip level events needs specific chip-id to which the data is
requested. Function 'arch_get_runtimeparam' implemented in header.c
which extract number of sockets from sysfs file "sockets" under
"/sys/devices/hv_24x7/interface/".

With this patch basically we are trying to create as many metric events
as define by runtime_param.

For that one loop is added in function 'metricgroup__add_metric', which
create multiple events at run time depend on return value of
'arch_get_runtimeparam' and merge that event in 'group_list'.

To achieve that we are actually passing this parameter value as part of
`expr__find_other` function and changing "?" present in metric
expression with this value.

As in our JSON file, there gonna be single metric event, and out of
which we are creating multiple events.

To understand which data count belongs to which parameter value,
we also printing param value in generic_metric function.

For example,

command:# ./perf stat -M PowerBUS_Frequency -C 0 -I 1000
1.000101867 9,356,933 hv_24x7/pm_pb_cyc,chip=0/ # 2.3 GHz PowerBUS_Frequency_0
1.000101867 9,366,134 hv_24x7/pm_pb_cyc,chip=1/ # 2.3 GHz PowerBUS_Frequency_1
2.000314878 9,365,868 hv_24x7/pm_pb_cyc,chip=0/ # 2.3 GHz PowerBUS_Frequency_0
2.000314878 9,366,092 hv_24x7/pm_pb_cyc,chip=1/ # 2.3 GHz PowerBUS_Frequency_1

So, here _0 and _1 after PowerBUS_Frequency specify parameter value.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20200401203340.31402-5-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# aecce63e 01-Apr-2020 Jiri Olsa <jolsa@kernel.org>

perf expr: Add expr_ prefix for parse_ctx and parse_id

Adding expr_ prefix for parse_ctx and parse_id, to straighten out the
expr* namespace.

There's no functional change.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20200401203340.31402-2-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 8358f698 31-Mar-2020 Jin Yao <yao.jin@linux.intel.com>

perf stat: Fix no metric header if --per-socket and --metric-only set

We received a report that was no metric header displayed if --per-socket
and --metric-only were both set.

It's hard for script to parse the perf-stat output. This patch fixes this
issue.

Before:

root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket
^C
Performance counter stats for 'system wide':

S0 8 2.6

2.215270071 seconds time elapsed

root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket -I1000
# time socket cpus
1.000411692 S0 8 2.2
2.001547952 S0 8 3.4
3.002446511 S0 8 3.4
4.003346157 S0 8 4.0
5.004245736 S0 8 0.3

After:

root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket
^C
Performance counter stats for 'system wide':

CPI
S0 8 2.1

1.813579830 seconds time elapsed

root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket -I1000
# time socket cpus CPI
1.000415122 S0 8 3.2
2.001630051 S0 8 2.9
3.002612278 S0 8 4.3
4.003523594 S0 8 3.0
5.004504256 S0 8 3.7

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200331180226.25915-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 0f9b1e12 28-Feb-2020 Jiri Olsa <jolsa@kernel.org>

perf expr: Straighten expr__parse()/expr__find_other() interface

Now that we have a flex parser we don't need to update the parsed string
pointer, so the interface can just be passed the pointer to the
expression instead of a pointer to pointer.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20200228093616.67125-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 80cc7bb6 07-Feb-2020 Kim Phillips <kim.phillips@amd.com>

perf stat: Don't report a null stalled cycles per insn metric

For data collected on machines with front end stalled cycles supported,
such as found on modern AMD CPU families, commit 146540fb545b ("perf
stat: Always separate stalled cycles per insn") introduces a new line in
CSV output with a leading comma that upsets some automated scripts.
Scripts have to use "-e ex_ret_instr" to work around this issue, after
upgrading to a version of perf with that commit.

We could add "if (have_frontend_stalled && !config->csv_sep)" to the not
(total && avg) else clause, to emphasize that CSV users are usually
scripts, and are written to do only what is needed, i.e., they wouldn't
typically invoke "perf stat" without specifying an explicit event list.

But - let alone CSV output - why should users now tolerate a constant
0-reporting extra line in regular terminal output?:

BEFORE:

$ sudo perf stat --all-cpus -einstructions,cycles -- sleep 1

Performance counter stats for 'system wide':

181,110,981 instructions # 0.58 insn per cycle
# 0.00 stalled cycles per insn
309,876,469 cycles

1.002202582 seconds time elapsed

The user would not like to see the now permanent:

"0.00 stalled cycles per insn"

line fixture, as it gives no useful information.

So this patch removes the printing of the zeroed stalled cycles line
altogether, almost reverting the very original commit fb4605ba47e7
("perf stat: Check for frontend stalled for metrics"), which seems like
it was written to normalize --metric-only column output of common Intel
machines at the time: modern Intel machines have ceased to support the
genericised frontend stalled metrics AFAICT.

AFTER:

$ sudo perf stat --all-cpus -einstructions,cycles -- sleep 1

Performance counter stats for 'system wide':

244,071,432 instructions # 0.69 insn per cycle
355,353,490 cycles

1.001862516 seconds time elapsed

Output behaviour when stalled cycles is indeed measured is not affected
(BEFORE == AFTER):

$ sudo perf stat --all-cpus -einstructions,cycles,stalled-cycles-frontend -- sleep 1

Performance counter stats for 'system wide':

247,227,799 instructions # 0.63 insn per cycle
# 0.26 stalled cycles per insn
394,745,636 cycles
63,194,485 stalled-cycles-frontend # 16.01% frontend cycles idle

1.002079770 seconds time elapsed

Fixes: 146540fb545b ("perf stat: Always separate stalled cycles per insn")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200207230613.26709-1-kim.phillips@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 6f6473c3 23-Sep-2019 Andi Kleen <ak@linux.intel.com>

perf stat: Fix free memory access / memory leaks in metrics

Make sure to not free the name passed in by the caller, but free all the
allocated ids when parsing expressions.

The loop at the end knows that the first entry shouldn't be freed, so
make sure the caller name is the first entry.

Fixes

% perf stat -M IpB,IpCall,IpTB,IPC,Retiring_SMT,Frontend_Bound_SMT,Kernel_Utilization,CPU_Utilization --metric-only -a -I 1000 sleep 2

valgrind:
1.009943231 ==21527== Invalid read of size 1
==21527== at 0x483CB74: strcmp (vg_replace_strmem.c:849)
==21527== by 0x582CF8: collect_all_aliases (stat-display.c:554)
==21527== by 0x582EB3: collect_data (stat-display.c:577)
==21527== by 0x583A32: print_counter_aggr (stat-display.c:806)
==21527== by 0x584FAD: perf_evlist__print_counters (stat-display.c:1200)
==21527== by 0x45133A: print_counters (builtin-stat.c:655)
==21527== by 0x450629: process_interval (builtin-stat.c:353)
==21527== by 0x450FBD: __run_perf_stat (builtin-stat.c:564)
==21527== by 0x451285: run_perf_stat (builtin-stat.c:636)
==21527== by 0x454619: cmd_stat (builtin-stat.c:1966)
==21527== by 0x4D557D: run_builtin (perf.c:310)
==21527== by 0x4D57EA: handle_internal_command (perf.c:362)
==21527== Address 0x12826cd0 is 0 bytes inside a block of size 25 free'd
==21527== at 0x4839A0C: free (vg_replace_malloc.c:540)
==21527== by 0x627041: __zfree (zalloc.c:13)
==21527== by 0x57F66A: generic_metric (stat-shadow.c:814)
==21527== by 0x580B21: perf_stat__print_shadow_stats (stat-shadow.c:1057)
==21527== by 0x58418E: print_metric_headers (stat-display.c:943)
==21527== by 0x5844BC: print_interval (stat-display.c:1004)
==21527== by 0x584DEB: perf_evlist__print_counters (stat-display.c:1172)
==21527== by 0x45133A: print_counters (builtin-stat.c:655)
==21527== by 0x450629: process_interval (builtin-stat.c:353)
==21527== by 0x450FBD: __run_perf_stat (builtin-stat.c:564)
==21527== by 0x451285: run_perf_stat (builtin-stat.c:636)
==21527== by 0x454619: cmd_stat (builtin-stat.c:1966)
==21527== Block was alloc'd at
==21527== at 0x483880B: malloc (vg_replace_malloc.c:309)
==21527== by 0x51677DE: strdup (in /usr/lib64/libc-2.29.so)
==21527== by 0x506457: parse_events_name (parse-events.c:1754)
==21527== by 0x5550BB: parse_events_parse (parse-events.y:214)
==21527== by 0x50694D: parse_events__scanner (parse-events.c:1887)
==21527== by 0x506AEF: parse_events (parse-events.c:1927)
==21527== by 0x521D8B: metricgroup__parse_groups (metricgroup.c:527)
==21527== by 0x45156F: parse_metric_groups (builtin-stat.c:721)
==21527== by 0x6228A9: get_value (parse-options.c:243)
==21527== by 0x62363F: parse_short_opt (parse-options.c:348)
==21527== by 0x62363F: parse_options_step (parse-options.c:536)
==21527== by 0x62363F: parse_options_subcommand (parse-options.c:651)
==21527== by 0x453C1D: cmd_stat (builtin-stat.c:1718)
==21527== by 0x4D557D: run_builtin (perf.c:310)

and also a leak report.

Committer testing:

Before:

# perf stat -M IpB,IpCall,IpTB,IPC,Retiring_SMT,Frontend_Bound_SMT,Kernel_Utilization,CPU_Utilization --metric-only -a -I 1000 sleep 2
# time CPU_Utilization
1.000470810 free(): double free detected in tcache 2
Aborted (core dumped)
#

After:

# perf stat -M IpB,IpCall,IpTB,IPC,Retiring_SMT,Frontend_Bound_SMT,Kernel_Utilization,CPU_Utilization --metric-only -a -I 1000 sleep 2
# time CPU_Utilization
1.000494752 0.1
2.001105112 0.1
#

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lore.kernel.org/lkml/20190923233339.25326-3-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# f01642e4 27-Aug-2019 Jin Yao <yao.jin@linux.intel.com>

perf metricgroup: Support multiple events for metricgroup

Some uncore metrics don't work as expected. For example, on
cascadelakex:

root@lkp-csl-2sp2:~# perf stat -M UNC_M_PMM_BANDWIDTH.TOTAL -a -- sleep 1

Performance counter stats for 'system wide':

1841092 unc_m_pmm_rpq_inserts
3680816 unc_m_pmm_wpq_inserts

1.001775055 seconds time elapsed

root@lkp-csl-2sp2:~# perf stat -M UNC_M_PMM_READ_LATENCY -a -- sleep 1

Performance counter stats for 'system wide':

860649746 unc_m_pmm_rpq_occupancy.all
1840557 unc_m_pmm_rpq_inserts
12790627455 unc_m_clockticks

1.001773348 seconds time elapsed

No metrics 'UNC_M_PMM_BANDWIDTH.TOTAL' or 'UNC_M_PMM_READ_LATENCY' are
reported.

The issue is, the case of an alias expanding to mulitple events is not
supported, typically the uncore events. (see comments in
find_evsel_group()).

For UNC_M_PMM_BANDWIDTH.TOTAL in above example, the expanded event group
is '{unc_m_pmm_rpq_inserts,unc_m_pmm_wpq_inserts}:W', but the actual
events passed to find_evsel_group are:

unc_m_pmm_rpq_inserts
unc_m_pmm_rpq_inserts
unc_m_pmm_rpq_inserts
unc_m_pmm_rpq_inserts
unc_m_pmm_rpq_inserts
unc_m_pmm_rpq_inserts
unc_m_pmm_wpq_inserts
unc_m_pmm_wpq_inserts
unc_m_pmm_wpq_inserts
unc_m_pmm_wpq_inserts
unc_m_pmm_wpq_inserts
unc_m_pmm_wpq_inserts

For this multiple events case, it's not supported well.

This patch introduces a new field 'metric_leader' in struct evsel. The
first event is considered as a metric leader. For the rest of same
events, they point to the first event via it's metric_leader field in
struct evsel.

This design is for adding the counting results of all same events to the
first event in group (the metric_leader).

With this patch,

root@lkp-csl-2sp2:~# perf stat -M UNC_M_PMM_BANDWIDTH.TOTAL -a -- sleep 1

Performance counter stats for 'system wide':

1842108 unc_m_pmm_rpq_inserts # 337.2 MB/sec UNC_M_PMM_BANDWIDTH.TOTAL
3682209 unc_m_pmm_wpq_inserts

1.001819706 seconds time elapsed

root@lkp-csl-2sp2:~# perf stat -M UNC_M_PMM_READ_LATENCY -a -- sleep 1

Performance counter stats for 'system wide':

861970685 unc_m_pmm_rpq_occupancy.all # 219.4 ns UNC_M_PMM_READ_LATENCY
1842772 unc_m_pmm_rpq_inserts
12790196356 unc_m_clockticks

1.001749103 seconds time elapsed

Now we can see the correct metrics 'UNC_M_PMM_BANDWIDTH.TOTAL' and
'UNC_M_PMM_READ_LATENCY'.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20190828055932.8269-5-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 287f2649 27-Aug-2019 Jin Yao <yao.jin@linux.intel.com>

perf metricgroup: Scale the metric result

Some metrics define the scale unit, such as

{
"BriefDescription": "Intel Optane DC persistent memory read latency (ns). Derived from unc_m_pmm_rpq_occupancy.all",
"Counter": "0,1,2,3",
"EventCode": "0xE0",
"EventName": "UNC_M_PMM_READ_LATENCY",
"MetricExpr": "UNC_M_PMM_RPQ_OCCUPANCY.ALL / UNC_M_PMM_RPQ_INSERTS / UNC_M_CLOCKTICKS",
"MetricName": "UNC_M_PMM_READ_LATENCY",
"PerPkg": "1",
"ScaleUnit": "6000000000ns",
"UMask": "0x1",
"Unit": "iMC"
},

For above example, the ratio should be,

ratio = (UNC_M_PMM_RPQ_OCCUPANCY.ALL / UNC_M_PMM_RPQ_INSERTS / UNC_M_CLOCKTICKS) * 6000000000

But in current code, the ratio is not scaled ( * 6000000000)

With this patch, the ratio is scaled and the unit (ns) is printed.

For example,
# 219.4 ns UNC_M_PMM_READ_LATENCY

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20190828055932.8269-4-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 1fc632ce 21-Jul-2019 Jiri Olsa <jolsa@kernel.org>

libperf: Move perf_event_attr field from perf's evsel to libperf's perf_evsel

Move the perf_event_attr struct fron 'struct evsel' to 'struct perf_evsel'.

Committer notes:

Fixed up these:

tools/perf/arch/arm/util/auxtrace.c
tools/perf/arch/arm/util/cs-etm.c
tools/perf/arch/arm64/util/arm-spe.c
tools/perf/arch/s390/util/auxtrace.c
tools/perf/util/cs-etm.c

Also

cc1: warnings being treated as errors
tests/sample-parsing.c: In function 'do_test':
tests/sample-parsing.c:162: error: missing initializer
tests/sample-parsing.c:162: error: (near initialization for 'evsel.core.cpus')

struct evsel evsel = {
.needs_swap = false,
- .core.attr = {
- .sample_type = sample_type,
- .read_format = read_format,
+ .core = {
+ . attr = {
+ .sample_type = sample_type,
+ .read_format = read_format,
+ },

[perfbuilder@a70e4eeb5549 /]$ gcc --version |& head -1
gcc (GCC) 4.4.7

Also we don't need to include perf_event.h in
tools/perf/lib/include/perf/evsel.h, forward declaring 'struct
perf_event_attr' is enough. And this even fixes the build in some
systems where things are used somewhere down the include path from
perf_event.h without defining __always_inline.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-43-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 63503dba 21-Jul-2019 Jiri Olsa <jolsa@kernel.org>

perf evlist: Rename struct perf_evlist to struct evlist

Rename struct perf_evlist to struct evlist, so we don't have a name
clash when we add struct perf_evlist in libperf.

Committer notes:

Added fixes to build on arm64, from Jiri and from me
(tools/perf/util/cs-etm.c)

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 32dcd021 21-Jul-2019 Jiri Olsa <jolsa@kernel.org>

perf evsel: Rename struct perf_evsel to struct evsel

Rename struct perf_evsel to struct evsel, so we don't have a name clash
when we add struct perf_evsel in libperf.

Committer notes:

Added fixes for arm64, provided by Jiri.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 146540fb 17-May-2019 Cong Wang <xiyou.wangcong@gmail.com>

perf stat: Always separate stalled cycles per insn

The "stalled cycles per insn" is appended to "instructions" when the CPU
has this hardware counter directly. We should always make it a separate
line, which also aligns to the output when we hit the "if (total &&
avg)" branch.

Before:

$ sudo perf stat --all-cpus --field-separator , --log-fd 1 -einstructions,cycles -- sleep 1
4565048704,,instructions,64114578096,100.00,1.34,insn per cycle,,
3396325133,,cycles,64146628546,100.00,,

After:

$ sudo ./tools/perf/perf stat --all-cpus --field-separator , --log-fd 1 -einstructions,cycles -- sleep 1
6721924,,instructions,24026790339,100.00,0.22,insn per cycle
,,,,,0.00,stalled cycles per insn
30939953,,cycles,24025512526,100.00,,

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20190517221039.8975-1-xiyou.wangcong@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# d8f9da24 03-Jul-2019 Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Use zfree() where applicable

In places where the equivalent was already being done, i.e.:

free(a);
a = NULL;

And in placs where struct members are being freed so that if we have
some erroneous reference to its struct, then accesses to freed members
will result in segfaults, which we can detect faster than use after free
to areas that may still have something seemingly valid.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-jatyoofo5boc1bsvoig6bb6i@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# e3a94273 24-Jun-2019 Andi Kleen <ak@linux.intel.com>

perf stat: Fix metrics with --no-merge

Since Fixes: 8c5421c016a4 ("perf pmu: Display pmu name when printing
unmerged events in stat") using --no-merge adds the PMU name to the
evsel name.

This breaks the metric value lookup because the parser doesn't know
about this.

Remove the extra postfixes for the metric evaluation.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Fixes: 8c5421c016a4 ("perf pmu: Display pmu name when printing unmerged events in stat")
Link: http://lkml.kernel.org/r/20190624193711.35241-5-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 145c407c 24-Jun-2019 Andi Kleen <ak@linux.intel.com>

perf stat: Make metric event lookup more robust

After setting up metric groups through the event parser, the metricgroup
code looks them up again in the event list.

Make sure we only look up events that haven't been used by some other
metric. The data structures currently cannot handle more than one metric
per event. This avoids problems with multiple events partially
overlapping.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: http://lkml.kernel.org/r/20190624193711.35241-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# db5742b6 04-Jun-2019 Kan Liang <kan.liang@linux.intel.com>

perf stat: Support per-die aggregation

It is useful to aggregate counts per die. E.g. Uncore becomes die-scope
on Xeon Cascade Lake-AP.

Introduce a new option "--per-die" to support per-die aggregation.

The global id for each core has been changed to socket + die id + core
id. The global id for each die is socket + die id.

Add die information for per-core aggregation. The output of per-core
aggregation will be changed from "S0-C0" to "S0-D0-C0". Any scripts
which rely on the output format of per-core aggregation probably be
broken.

For 'perf stat record/report', there is no die information when
processing the old perf.data. The per-die result will be the same as
per-socket.

Committer notes:

Renamed 'die' variable to 'die_id' to fix the build in some systems:

CC /tmp/build/perf/builtin-script.o
cc1: warnings being treated as errors
builtin-stat.c: In function 'perf_env__get_die':
builtin-stat.c:963: error: declaration of 'die' shadows a global declaration
util/util.h:19: error: shadowed declaration is here
mv: cannot stat `/tmp/build/perf/.builtin-stat.o.tmp': No such file or directory

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/n/tip-bsnhx7vgsuu6ei307mw60mbj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# ca227029 06-Dec-2018 Davidlohr Bueso <dave@stgolabs.net>

perf util: Use cached rbtree for rblists

At the cost of an extra pointer, we can avoid the O(logN) cost of
finding the first element in the tree (smallest node), which is
something required for any of the strlist or intlist traversals
(XXX_for_each_entry()). There are a number of users in perf of these
(particularly strlists), including probes, and buildid.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181206191819.30182-5-dave@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 57ddf091 15-Nov-2018 Ravi Bangoria <ravi.bangoria@linux.ibm.com>

perf stat: Fix shadow stats for clock events

Commit 0aa802a79469 ("perf stat: Get rid of extra clock display
function") introduced scale and unit for clock events. Thus,
perf_stat__update_shadow_stats() now saves scaled values of clock events
in msecs, instead of original nsecs. But while calculating values of
shadow stats we still consider clock event values in nsecs. This results
in a wrong shadow stat values. Ex,

# ./perf stat -e task-clock,cycles ls
<SNIP>
2.60 msec task-clock:u # 0.877 CPUs utilized
2,430,564 cycles:u # 1215282.000 GHz

Fix this by saving original nsec values for clock events in
perf_stat__update_shadow_stats(). After patch:

# ./perf stat -e task-clock,cycles ls
<SNIP>
3.14 msec task-clock:u # 0.839 CPUs utilized
3,094,528 cycles:u # 0.985 GHz

Suggested-by: Jiri Olsa <jolsa@redhat.com>
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: yuzhoujian@didichuxing.com
Fixes: 0aa802a79469 ("perf stat: Get rid of extra clock display function")
Link: http://lkml.kernel.org/r/20181116042843.24067-1-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# eb08d006 15-Nov-2018 Ravi Bangoria <ravi.bangoria@linux.ibm.com>

perf stat: Use perf_evsel__is_clocki() for clock events

We already have function to check if a given event is either
SW_CPU_CLOCK or SW_TASK_CLOCK. Utilize it.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: yuzhoujian@didichuxing.com
Link: http://lkml.kernel.org/r/20181115095533.16930-1-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 6ca9a082 30-Aug-2018 Jiri Olsa <jolsa@kernel.org>

perf stat: Pass a 'struct perf_stat_config' argument to global print functions

Add 'struct perf_stat_config' argument to the global print functions, so
that these functions can be used out of the 'perf stat' command code.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180830063252.23729-20-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 0aa802a7 20-Jul-2018 Jiri Olsa <jolsa@kernel.org>

perf stat: Get rid of extra clock display function

There's no reason to have separate function to display clock events.
It's only purpose was to convert the nanosecond value into microseconds.
We do that now in generic code, if the unit and scale values are
properly set, which this patch do for clock events.

The output differs in the unit field being displayed in its columns
rather than having it added as a suffix of the event name. Plus the
value is rounded into 2 decimal numbers as for any other event.

Before:

# perf stat -e cpu-clock,task-clock -C 0 sleep 3

Performance counter stats for 'CPU(s) 0':

3001.123137 cpu-clock (msec) # 1.000 CPUs utilized
3001.133250 task-clock (msec) # 1.000 CPUs utilized

3.001159813 seconds time elapsed

Now:

# perf stat -e cpu-clock,task-clock -C 0 sleep 3

Performance counter stats for 'CPU(s) 0':

3,001.05 msec cpu-clock # 1.000 CPUs utilized
3,001.05 msec task-clock # 1.000 CPUs utilized

3.001077794 seconds time elapsed

There's a small difference in csv output, as we now output the unit
field, which was empty before. It's in the proper spot, so there's no
compatibility issue.

Before:

# perf stat -e cpu-clock,task-clock -C 0 -x, sleep 3
3001.065177,,cpu-clock,3001064187,100.00,1.000,CPUs utilized
3001.077085,,task-clock,3001077085,100.00,1.000,CPUs utilized

# perf stat -e cpu-clock,task-clock -C 0 -x, sleep 3
3000.80,msec,cpu-clock,3000799026,100.00,1.000,CPUs utilized
3000.80,msec,task-clock,3000799550,100.00,1.000,CPUs utilized

Add perf_evsel__is_clock to replace nsec_counter.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180720110036.32251-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 6a1e2c5c 05-Dec-2017 Jin Yao <yao.jin@linux.intel.com>

perf stat: Remove a set of shadow stats static variables

In previous patches, we have reconstructed the code and let it not
access the static variables directly.

This patch removes these static variables.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512482591-4646-7-git-send-email-yao.jin@linux.intel.com
[ Rename 'stat' variables to 'st' to build on centos:{5,6} and others where it shadows a global declaration ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# e0128b30 05-Dec-2017 Jin Yao <yao.jin@linux.intel.com>

perf stat: Print per-thread shadow stats

The function perf_stat__print_shadow_stats() is called to print the
shadow stats on a set of static variables.

But the static variables are the limitations to support
per-thread shadow stats.

This patch lets the perf_stat__print_shadow_stats() support
to print the shadow stats from a input parameter 'st'.

It will not directly get value from static variable. Instead,
it now uses runtime_stat_avg() and runtime_stat_n() to get and
compute the values.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512482591-4646-6-git-send-email-yao.jin@linux.intel.com
[ Rename 'stat' variables to 'st' to build on centos:{5,6} and others where it shadows a global declaration ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 1fcd0394 05-Dec-2017 Jin Yao <yao.jin@linux.intel.com>

perf stat: Update per-thread shadow stats

The functions perf_stat__update_shadow_stats() is called to update the
shadow stats on a set of static variables.

But the static variables are the limitations to be extended to support
per-thread shadow stats.

This patch lets the perf_stat__update_shadow_stats() support to update
the shadow stats on a input parameter 'st' and uses
update_runtime_stat() to update the stats. It will not directly update
the static variables as before.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512482591-4646-5-git-send-email-yao.jin@linux.intel.com
[ Rename 'stat' variables to 'st' to build on centos:{5,6} and others where it shadows a global declaration ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 8efb2df1 05-Dec-2017 Jin Yao <yao.jin@linux.intel.com>

perf stat: Create the runtime_stat init/exit function

It mainly initializes and releases the rblist which is defined in struct
runtime_stat.

For the original rblist 'runtime_saved_values', it's still kept there
for keeping the patch bisectable.

The rblist 'runtime_saved_values' will be removed in later patch at
switching time.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512482591-4646-4-git-send-email-yao.jin@linux.intel.com
[ Rename 'stat' variables to 'st' to build on centos:{5,6} and others where it shadows a global declaration ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 49cd456a 05-Dec-2017 Jin Yao <yao.jin@linux.intel.com>

perf stat: Extend rbtree to support per-thread shadow stats

Previously the rbtree was used to link generic metrics.

This patches adds new ctx/type/stat into rbtree keys because we will use
this rbtree to maintain shadow metrics to replace original a couple of
static arrays for supporting per-thread shadow stats.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512482591-4646-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# e5fcc2ab 05-Dec-2017 Jin Yao <yao.jin@linux.intel.com>

perf stat: Define a structure for per-thread shadow stats

Perf has a set of static variables to record the runtime shadow metrics
stats.

While if we want to record the runtime shadow stats for per-thread, it
will be the limitation. This patch creates a structure and the next
patches will use this structure to update the runtime shadow stats for
per-thread.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512482591-4646-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# b984aff7 01-Dec-2017 Jin Yao <yao.jin@linux.intel.com>

perf stat: Add rbtree node_delete op

In current stat-shadow.c, the rbtree deleting is ignored.

The patch adds the implementation to node_delete method of rblist.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512125856-22056-5-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 54830dd0 23-Jan-2017 Jiri Olsa <jolsa@kernel.org>

perf stat: Move the shadow stats scale computation in perf_stat__update_shadow_stats

Move the shadow stats scale computation to the
perf_stat__update_shadow_stats() function, so it's centralized and we
don't forget to do it. It also saves few lines of code.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-htg7mmyxv6pcrf57qyo6msid@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# fd48aad9 31-Aug-2017 Andi Kleen <ak@linux.intel.com>

perf stat: Support duration_time for metrics

Some of the metrics formulas (like GFLOPs) need to know how long the
measurement period is. Support an internal event called duration_time,
which reports time in second. It maps to the dummy event, but is special
cased for statistics to report the walltime duration.

So far it is not printed, but only used internally for metrics.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-10-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 4e1a0963 31-Aug-2017 Andi Kleen <ak@linux.intel.com>

perf stat: Don't use ctx for saved values lookup

We don't need to use ctx to look up events for saved values. The
context is already part of the evsel pointer, which is the primary key.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-9-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# b18f3e36 31-Aug-2017 Andi Kleen <ak@linux.intel.com>

perf stat: Support JSON metrics in perf stat

Add generic support for standalone metrics specified in JSON files to
perf stat. A metric is a formula that uses multiple events to compute a
higher level result (e.g. IPC).

Previously metrics were always tied to an event and automatically
enabled with that event. But now change it that we can have standalone
metrics. They are in the same JSON data structure as events, but don't
have an event name.

We also allow to organize the metrics in metric groups, which allows a
short cut to select several related metrics at once.

Add a new -M / --metrics option to perf stat that adds the metrics or
metric groups specified.

Add the core code to manage and parse the metric groups. They are
collected from the JSON data structures into a separate rblist. When
computing shadow values look for metrics in that list. Then they are
computed using the existing saved values infrastructure in stat-shadow.c

The actual JSON metrics are in a separate pull request.

% perf stat -M Summary --metric-only -a sleep 1

Performance counter stats for 'system wide':

Instructions CLKS CPU_Utilization GFLOPs SMT_2T_Utilization Kernel_Utilization
317614222.0 1392930775.0 0.0 0.0 0.2 0.1

1.001497549 seconds time elapsed

% perf stat -M GFLOPs flops

Performance counter stats for 'flops':

3,999,541,471 fp_comp_ops_exe.sse_scalar_single # 1.2 GFLOPs (66.65%)
14 fp_comp_ops_exe.sse_scalar_double (66.65%)
0 fp_comp_ops_exe.sse_packed_double (66.67%)
0 fp_comp_ops_exe.sse_packed_single (66.70%)
0 simd_fp_256.packed_double (66.70%)
0 simd_fp_256.packed_single (66.67%)
0 duration_time

3.238372845 seconds time elapsed

v2: Add missing header file
v3: Move find_map to pmu.c

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-7-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 4ed962eb 31-Aug-2017 Andi Kleen <ak@linux.intel.com>

perf stat: Print generic metric header even for failed expressions

Print the generic metric header even when the expression evaluation
failed. Otherwise an expression that fails on the first collections due
to division by zero may suddenly reappear later without an header.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-5-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# bba49af8 31-Aug-2017 Andi Kleen <ak@linux.intel.com>

perf stat: Factor out generic metric printing

The 'perf stat' shadow metric printing already supports generic metrics.
Factor out the code doing that into a separate function that can be
re-used in a later patch.

No behavior changes.

v2: Fix indentation

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 5e97665f 24-Jul-2017 Andi Kleen <ak@linux.intel.com>

perf stat: Fix saved values rbtree lookup

The stat shadow saved values rbtree is indexed by a pointer. Fix the
comparison function:

- We cannot return a pointer delta as an int because that loses bits on
64bit.

- Doing pointer arithmetic on the struct pointer only works if the
objects are spaced by the multiple of the object size, which is not
guaranteed for individual malloc'ed object

Replace it with a proper comparison.

This fixes various problems with values not being found.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170724234015.5165-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# daefd0bc 26-May-2017 Kan Liang <Kan.liang@intel.com>

perf stat: Add support to measure SMI cost

Implementing a new --smi-cost mode in perf stat to measure SMI cost.

During the measurement, the /sys/device/cpu/freeze_on_smi will be set.

The measurement can be done with one counter (unhalted core cycles), and
two free running MSR counters (IA32_APERF and SMI_COUNT).

In practice, the percentages of SMI core cycles should be more useful
than absolute value. So the output will be the percentage of SMI core
cycles and SMI#. metric_only will be set by default.

SMI cycles% = (aperf - unhalted core cycles) / aperf

Here is an example output.

Performance counter stats for 'sudo echo ':

SMI cycles% SMI#
0.1% 1

0.010858678 seconds time elapsed

Users who wants to get the actual value can apply additional
--no-metric-only.

Signed-off-by: Kan Liang <Kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Elliott <elliott@hpe.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1495825538-5230-3-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 96284814 20-Mar-2017 Andi Kleen <ak@linux.intel.com>

perf pmu: Add support for MetricName JSON attribute

Add support for a new JSON event attribute to name MetricExpr for better
output in perf stat.

If the event has no MetricName it uses the normal event name instead to
describe the metric.

Before

% perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
time unc_p_freq_max_os_cycles
1.000149775 15.7
2.000344807 19.3
3.000502544 16.7
4.000640656 6.6
5.000779955 9.9

After

% perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
time freq_max_os_cycles %
1.000149775 15.7
2.000344807 19.3
3.000502544 16.7
4.000640656 6.6
5.000779955 9.9

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170320201711.14142-13-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 37932c18 20-Mar-2017 Andi Kleen <ak@linux.intel.com>

perf stat: Output JSON MetricExpr metric

Add generic infrastructure to perf stat to output ratios for
"MetricExpr" entries in the event lists. Many events are more useful as
ratios than in raw form, typically some count in relation to total
ticks.

Transfer the MetricExpr information from the alias to the evsel.

We mark the events that need to be collected for MetricExpr, and also
link the events using them with a pointer. The code is careful to always
prefer the right event in the same group to minimize multiplexing
errors. At the moment only a single relation is supported.

Then add a rblist to the stat shadow code that remembers stats based on
the cpu and context.

Then finally update and retrieve and print these values similarly to the
existing hardcoded perf metrics. We use the simple expression parser
added earlier to evaluate the expression.

Normally we just output the result without further commentary, but for
--metric-only this would lead to empty columns. So for this case use the
original event as description.

There is no attempt to automatically add the MetricExpr event, if it is
missing, however we suggest it to the user, because the user tool
doesn't have enough information to reliably construct a group that is
guaranteed to schedule. So we leave that to the user.

% perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}'
1.000147889 800,085,181 unc_p_clockticks
1.000147889 93,126,241 unc_p_freq_max_os_cycles # 11.6
2.000448381 800,218,217 unc_p_clockticks
2.000448381 142,516,095 unc_p_freq_max_os_cycles # 17.8
3.000639852 800,243,057 unc_p_clockticks
3.000639852 162,292,689 unc_p_freq_max_os_cycles # 20.3

% perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
# time freq_max_os_cycles %
1.000127077 0.9
2.000301436 0.7
3.000456379 0.0

v2: Change from DivideBy to MetricExpr
v3: Use expr__ prefix. Support more than one other event.
v4: Update description
v5: Only print warning message once for multiple PMUs.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170320201711.14142-11-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 239bd47f 24-May-2016 Andi Kleen <ak@linux.intel.com>

perf stat: Add computation of TopDown formulas

Implement the TopDown formulas in 'perf stat'. The topdown basic metrics
reported by the kernel are collected, and the formulas are computed and
output as normal metrics.

See the kernel commit exporting the events for details on the used
metrics.

Committer note:

Output example:

# perf stat --topdown -a usleep 1

Performance counter stats for 'system wide':

retiring bad speculation frontend bound backend bound
S0-C0 2 23.8% 11.6% 28.3% 36.3%
S0-C1 2 16.2% 15.7% 36.5% 31.6%

0.000579956 seconds time elapsed
#

v2: Always print all metrics, only use thresholds for coloring.
v3: Mark retiring over threshold green, not red.
v4: Only print one decimal digit
Fix color printing of one metric
v5: Avoid printing -0.0
v6: Remove extra frontend event lookup

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1464119559-17203-2-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# daf4f478 13-May-2016 Namhyung Kim <namhyung@kernel.org>

perf stat: Update runtime using cpu-clock event

Currently only the task-clock event updates the runtime_nsec so it
cannot show the metric when using cpu-clock events. However cpu clock
works basically same as task-clock, so no need to not update the runtime
IMHO.

Before:

# perf stat -a -e cpu-clock,context-switches,page-faults,cycles sleep 0.1

Performance counter stats for 'system wide':

1217.759506 cpu-clock (msec)
93 context-switches
61 page-faults
18,958,022 cycles

0.101393794 seconds time elapsed

After:

Performance counter stats for 'system wide':

1220.471884 cpu-clock (msec) # 12.013 CPUs utilized
118 context-switches # 0.097 K/sec
59 page-faults # 0.048 K/sec
17,941,247 cycles # 0.015 GHz

0.101594777 seconds time elapsed

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1463119263-5569-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# b0404be8 13-May-2016 Namhyung Kim <namhyung@kernel.org>

perf stat: Fix indentation of stalled backend cycle

The commit 140aeadc1fb5 ("perf stat: Abstract stat metrics printing")
changed how shadow metrics are printed, but it missed to update the
width of the stalled backend cycles event to 7.2% like others. This
resulted in misaligned output like below:

Performance counter stats for 'pwd':

0.638313 task-clock (msec) # 0.567 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
54 page-faults # 0.085 M/sec
885,600 cycles # 1.387 GHz
558,438 stalled-cycles-frontend # 63.06% frontend cycles idle
431,355 stalled-cycles-backend # 48.71% backend cycles idle
674,956 instructions # 0.76 insn per cycle
# 0.83 stalled cycles per insn
130,380 branches # 204.257 M/sec
<not counted> branch-misses

0.001125426 seconds time elapsed

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixes: 140aeadc1fb5 ("perf stat: Abstract stat metrics printing")
Link: http://lkml.kernel.org/r/1463119263-5569-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# b8f8eb84 22-Mar-2016 Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Remove misplaced __maybe_unused

All over the tree.

Cc: David Ahern <dsahern@gmail.com>
cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-8nzhnokxyp8y4v7gf0j00oyb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# fb4605ba 01-Mar-2016 Andi Kleen <ak@linux.intel.com>

perf stat: Check for frontend stalled for metrics

Add an extra check for frontend stalled in the metrics. This avoids an
extra column for the --metric-only case when the CPU does not support
frontend stalled.

v2: Add separate init function

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1456858672-21594-8-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 44d49a60 29-Feb-2016 Andi Kleen <ak@linux.intel.com>

perf stat: Support metrics in --per-core/socket mode

Enable metrics printing in --per-core / --per-socket mode. We need to
save the shadow metrics in a unique place. Always use the first CPU in
the aggregation. Then use the same CPU to retrieve the shadow value
later.

Example output:

% perf stat --per-core -a ./BC1s

Performance counter stats for 'system wide':

S0-C0 2 2966.020381 task-clock (msec) # 2.004 CPUs utilized (100.00%)
S0-C0 2 49 context-switches # 0.017 K/sec (100.00%)
S0-C0 2 4 cpu-migrations # 0.001 K/sec (100.00%)
S0-C0 2 467 page-faults # 0.157 K/sec
S0-C0 2 4,599,061,773 cycles # 1.551 GHz (100.00%)
S0-C0 2 9,755,886,883 instructions # 2.12 insn per cycle (100.00%)
S0-C0 2 1,906,272,125 branches # 642.704 M/sec (100.00%)
S0-C0 2 81,180,867 branch-misses # 4.26% of all branches
S0-C1 2 2965.995373 task-clock (msec) # 2.003 CPUs utilized (100.00%)
S0-C1 2 62 context-switches # 0.021 K/sec (100.00%)
S0-C1 2 8 cpu-migrations # 0.003 K/sec (100.00%)
S0-C1 2 281 page-faults # 0.095 K/sec
S0-C1 2 6,347,290 cycles # 0.002 GHz (100.00%)
S0-C1 2 4,654,156 instructions # 0.73 insn per cycle (100.00%)
S0-C1 2 947,121 branches # 0.319 M/sec (100.00%)
S0-C1 2 37,322 branch-misses # 3.94% of all branches

1.480409747 seconds time elapsed

v2: Rebase to older patches
v3: Document shadow cpus. Fix aggr_get_id argument. Fix -A shadows (Jiri)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1456785386-19481-4-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 92a61f64 29-Feb-2016 Andi Kleen <ak@linux.intel.com>

perf stat: Implement CSV metrics output

Now support CSV output for metrics. With the new output callbacks this
is relatively straight forward by creating new callbacks.

This allows to easily plot metrics from CSV files.

The new line callback needs to know the number of fields to skip them
correctly

Example output before:

% perf stat -x, true
0.200687,,task-clock,200687,100.00
0,,context-switches,200687,100.00
0,,cpu-migrations,200687,100.00
40,,page-faults,200687,100.00
730871,,cycles,203601,100.00
551056,,stalled-cycles-frontend,203601,100.00
<not supported>,,stalled-cycles-backend,0,100.00
385523,,instructions,203601,100.00
78028,,branches,203601,100.00
3946,,branch-misses,203601,100.00

After:

% perf stat -x, true
.502457,,task-clock,502457,100.00,0.485,CPUs utilized
0,,context-switches,502457,100.00,0.000,K/sec
0,,cpu-migrations,502457,100.00,0.000,K/sec
45,,page-faults,502457,100.00,0.090,M/sec
644692,,cycles,509102,100.00,1.283,GHz
423470,,stalled-cycles-frontend,509102,100.00,65.69,frontend cycles idle
<not supported>,,stalled-cycles-backend,0,100.00,,,,
492701,,instructions,509102,100.00,0.76,insn per cycle
,,,,,0.86,stalled cycles per insn
97767,,branches,509102,100.00,194.578,M/sec
4788,,branch-misses,509102,100.00,4.90,of all branches

or easier readable

$ perf stat -x, -o x.csv true
$ column -s, -t x.csv
0.490635 task-clock 490635 100.00 0.489 CPUs utilized
0 context-switches 490635 100.00 0.000 K/sec
0 cpu-migrations 490635 100.00 0.000 K/sec
45 page-faults 490635 100.00 0.092 M/sec
629080 cycles 497698 100.00 1.282 GHz
409498 stalled-cycles-frontend 497698 100.00 65.09 frontend cycles idle
<not supported> stalled-cycles-backend 0 100.00
491424 instructions 497698 100.00 0.78 insn per cycle
0.83 stalled cycles per insn
97278 branches 497698 100.00 198.270 M/sec
4569 branch-misses 497698 100.00 4.70 of all branches

Two new fields are added: metric value and metric name.

v2: Split out function argument changes
v3: Reenable metrics for real.
v4: Fix wrong hunk from refactoring.
v5: Remove extra "noise" printing (Jiri), but add it to the not counted case.
Print empty metrics for not counted.
v6: Avoid outputting metric on empty format.
v7: Print metric at the end
v8: Remove extra run, ena fields
v9: Avoid extra new line for unsupported counters

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1456785386-19481-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 140aeadc 30-Jan-2016 Andi Kleen <ak@linux.intel.com>

perf stat: Abstract stat metrics printing

Abstract the printing of shadow metrics. Instead of every metric calling
fprintf directly and taking care of indentation, use two call backs: one
to print metrics and another to start a new line.

This will allow adding metrics to CSV mode and also using them for other
purposes.

The computation of padding is now done in the central callback, instead
of every metric doing it manually. This makes it easier to add new
metrics.

v2: Refactor functions, printout now does more. Move
shadow printing. Improve fallback callbacks. Don't
use void * callback data.
v3: Remove unnecessary hunk. Add typedef for new_line
v4: Remove unnecessary hunk. Don't print metrics for CSV/interval
mode yet. Move printout change to separate patch.
v5: Fix bisect bugs. Avoid bogus frontend cycles printing.
Fix indentation in different aggregation modes.
v6: Delay newline handling

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1454173616-17710-2-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 4579ecc8 02-Nov-2015 Andi Kleen <ak@linux.intel.com>

perf stat: Move sw clock metrics printout to stat-shadow

The sw clock metrics printing was missed in the earlier move to
stat-shadow of all the other metric printouts. Move it too.

v2: Fix metrics printing in this version to make bisect safe.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1446515428-7450-2-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 54976285 27-Jul-2015 Andi Kleen <ak@linux.intel.com>

perf stat: Fix transaction lenght metrics

The transaction length metrics in perf stat -T broke recently.

It would not match the metric correctly and always print K/sec.

This was caused by a incorrect update of the cycles_in_tx statistics.

Update the correct variable.

Also the check for zero division was reversed, which resulted in K/sec
being printed for no transactions. Fix this also up.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1438039491-22091-1-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# f87027b9 03-Jun-2015 Jiri Olsa <jolsa@kernel.org>

perf stat: Move shadow stat counters into separate object

Separating shadow counters code into separate object as a cleanup, but
mainly for upcomming changes, so could use it from script command
context.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1433341559-31848-10-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>