History log of /linux-master/tools/perf/bench/numa.c
Revision Date Author Comments
# 337fa2db 30-Mar-2023 Andreas Herrmann <aherrmann@suse.de>

perf bench numa: Fix type of loop iterator in do_work, it should be 'long'

'j' is of type int and start/end are of type 'long'. Thus 'j' might become
negative and cause segfault in access_data(). Fix it by using 'long' for
'j' as well.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Andreas Herrmann <aherrmann@suse.de>
Link: https://lore.kernel.org/r/20230330074202.14052-1-aherrmann@suse.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# a527c2c1 18-Oct-2022 James Clark <james.clark@arm.com>

perf tools: Make quiet mode consistent between tools

Use the global quiet variable everywhere so that all tools hide warnings
in quiet mode and update the documentation to reflect this.

'perf probe' claimed that errors are not printed in quiet mode but I
don't see this so remove it from the docs.

Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221018094137.783081-3-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# a64d3af5 26-Aug-2022 Ian Rogers <irogers@google.com>

perf bench: Update use of pthread mutex/cond

Switch to the use of mutex wrappers that provide better error checking.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: André Almeida <andrealmeid@igalia.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Colin Ian King <colin.king@intel.com>
Cc: Dario Petrillo <dario.pk1@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Fangrui Song <maskray@google.com>
Cc: Hewenliang <hewenliang4@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jason Wang <wangborong@cdjrlc.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Pavithra Gurushankar <gpavithrasha@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Remi Bernon <rbernon@codeweavers.com>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Weiguo Li <liwg06@foxmail.com>
Cc: Wenyu Liu <liuwenyu7@huawei.com>
Cc: William Cohen <wcohen@redhat.com>
Cc: Zechuan Chen <chenzechuan1@huawei.com>
Cc: bpf@vger.kernel.org
Cc: llvm@lists.linux.dev
Cc: yaowenbin <yaowenbin1@huawei.com>
Link: https://lore.kernel.org/r/20220826164242.43412-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# f8ac1c47 20-May-2022 Thomas Richter <tmricht@linux.ibm.com>

perf bench numa: Address compiler error on s390

The compilation on s390 results in this error:

# make DEBUG=y bench/numa.o
...
bench/numa.c: In function ‘__bench_numa’:
bench/numa.c:1749:81: error: ‘%d’ directive output may be truncated
writing between 1 and 11 bytes into a region of size between
10 and 20 [-Werror=format-truncation=]
1749 | snprintf(tname, sizeof(tname), "process%d:thread%d", p, t);
^~
...
bench/numa.c:1749:64: note: directive argument in the range
[-2147483647, 2147483646]
...
#

The maximum length of the %d replacement is 11 characters because of the
negative sign. Therefore extend the array by two more characters.

Output after:

# make DEBUG=y bench/numa.o > /dev/null 2>&1; ll bench/numa.o
-rw-r--r-- 1 root root 418320 May 19 09:11 bench/numa.o
#

Fixes: 3aff8ba0a4c9c919 ("perf bench numa: Avoid possible truncation when using snprintf()")
Suggested-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20220520081158.2990006-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 183d4f2d 28-Apr-2022 Ian Rogers <irogers@google.com>

perf bench: Fix two numa NDEBUG warnings

BUG_ON is a no-op if NDEBUG is defined, otherwise it is an assert.
Compiling with NDEBUG yields:

bench/numa.c: In function ‘bind_to_cpu’:
bench/numa.c:314:1: error: control reaches end of non-void function [-Werror=return-type]
314 | }
| ^
bench/numa.c: In function ‘bind_to_node’:
bench/numa.c:367:1: error: control reaches end of non-void function [-Werror=return-type]
367 | }
| ^

Add return statements to cover this case.

Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220428202912.1056444-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# f58faed7 12-Apr-2022 Athira Rajeev <atrajeev@linux.vnet.ibm.com>

perf bench: Fix numa bench to fix usage of affinity for machines with #CPUs > 1K

The 'perf bench numa' testcase fails on systems with more than 1K CPUs.

Testcase: perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp 1

Snippet of code:

<<>>
perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
Aborted (core dumped)
<<>>

bind_to_node() uses "sched_getaffinity" to save the original cpumask and
this call is returning EINVAL ((invalid argument).

This happens because the default mask size in glibc is 1024. To
overcome this 1024 CPUs mask size limitation of cpu_set_t, change the
mask size using the CPU_*_S macros ie, use CPU_ALLOC to allocate
cpumask, CPU_ALLOC_SIZE for size.

Apart from fixing this for "orig_mask", apply same logic to "mask" as
well which is used to setaffinity so that mask size is large enough to
represent number of possible CPU's in the system.

sched_getaffinity is used in one more place in perf numa bench. It is in
"bind_to_cpu" function. Apply the same logic there also. Though
currently no failure is reported from there, it is ideal to change
getaffinity to work with such system configurations having CPU's more
than default mask size supported by glibc.

Also fix "sched_setaffinity" to use mask size which is large enough to
represent number of possible CPU's in the system.

Fixed all places where "bind_cpumask" which is part of "struct
thread_data" is used such that bind_cpumask works in all configuration.

Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220412164059.42654-3-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 8cb7a188 12-Apr-2022 Athira Rajeev <atrajeev@linux.vnet.ibm.com>

perf bench: Fix numa testcase to check if CPU used to bind task is online

Perf numa bench test fails with error:

Testcase:

./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk

Failure snippet:

<<>>
Running 'numa/mem' benchmark:

# Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"

perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed.
<<>>

The Testcases uses CPU's 0 and 8. In function "parse_setup_cpu_list",
There is check to see if cpu number is greater than max cpu's possible
in the system ie via "if (bind_cpu_0 >= g->p.nr_cpus || bind_cpu_1 >=
g->p.nr_cpus) {".

But it could happen that system has say 48 CPU's, but only number of
online CPU's is 0-7. Other CPU's are offlined. Since "g->p.nr_cpus" is
48, so function will go ahead and set bit for CPU 8 also in cpumask (
td->bind_cpumask).

bind_to_cpumask function is called to set affinity using
sched_setaffinity and the cpumask. Since the CPU8 is not present, set
affinity will fail here with EINVAL.

Fix this issue by adding a check to make sure that, CPU's provided in
the input argument values are online before proceeding further and skip
the test. For this, include new helper function "is_cpu_online" in
"tools/perf/util/header.c".

Since "BIT(x)" definition will get included from header.h, remove
that from bench/numa.c

Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220412164059.42654-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 4d39c89f 23-Mar-2021 Ingo Molnar <mingo@kernel.org>

perf tools: Fix various typos in comments

Fix ~124 single-word typos and a few spelling errors in the perf tooling code,
accumulated over the years.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210321113734.GA248990@gmail.com
Link: http://lore.kernel.org/lkml/20210323160915.GA61903@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 394e4306 25-Feb-2021 Athira Rajeev <atrajeev@linux.vnet.ibm.com>

perf bench numa: Fix the condition checks for max number of NUMA nodes

In systems having higher node numbers available like node
255, perf numa bench will fail with SIGABORT.

<<>>
perf: bench/numa.c:1416: init: Assertion `!(g->p.nr_nodes > 64 || g->p.nr_nodes < 0)' failed.
Aborted (core dumped)
<<>>

Snippet from 'numactl -H' below on a powerpc system where the highest
node number available is 255:

available: 6 nodes (0,8,252-255)
node 0 cpus: <cpu-list>
node 0 size: 519587 MB
node 0 free: 516659 MB
node 8 cpus: <cpu-list>
node 8 size: 523607 MB
node 8 free: 486757 MB
node 252 cpus:
node 252 size: 0 MB
node 252 free: 0 MB
node 253 cpus:
node 253 size: 0 MB
node 253 free: 0 MB
node 254 cpus:
node 254 size: 0 MB
node 254 free: 0 MB
node 255 cpus:
node 255 size: 0 MB
node 255 free: 0 MB
node distances:
node 0 8 252 253 254 255

Note: <cpu-list> expands to actual cpu list in the original output.
These nodes 252-255 are to represent the memory on GPUs and are valid
nodes.

The perf numa bench init code has a condition check to see if the number
of NUMA nodes (nr_nodes) exceeds MAX_NR_NODES. The value of MAX_NR_NODES
defined in perf code is 64. And the 'nr_nodes' is the value from
numa_max_node() which represents the highest node number available in the
system. In some systems where we could have NUMA node 255, this condition
check fails and results in SIGABORT.

The numa benchmark uses static value of MAX_NR_NODES in the code to
represent size of two NUMA node arrays and node bitmask used for setting
memory policy. Patch adds a fix to dynamically allocate size for the
two arrays and bitmask value based on the node numbers available in the
system. With the fix, perf numa benchmark will work with node configuration
on any system and thus removes the static MAX_NR_NODES value.

Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/1614271802-1503-1-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# f9299385 12-Oct-2020 Ian Rogers <irogers@google.com>

perf bench: Use condition variables in numa.

The existing approach to synchronization between threads in the numa
benchmark is unbalanced mutexes.

This synchronization causes thread sanitizer to warn of locks being
taken twice on a thread without an unlock, as well as unlocks with no
corresponding locks.

This change replaces the synchronization with more regular condition
variables.

While this fixes one class of thread sanitizer warnings, there still
remain warnings of data races due to threads reading and writing shared
memory without any atomics.

Committer testing:

Basic run on a non-NUMA machine.

# perf bench numa

# List of available benchmarks for collection 'numa':

mem: Benchmark for NUMA workloads
all: Run all NUMA benchmarks

# perf bench numa all
# Running numa/mem benchmark...

# Running main, "perf bench numa numa-mem"
#
# Running test on: Linux five 5.8.12-200.fc32.x86_64 #1 SMP Mon Sep 28 12:17:31 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
#

# Running RAM-bw-local, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp 1 --no-data_rand_walk"
20.076 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.073 secs average thread-runtime
0.190 % difference between max/avg runtime
241.828 GB data processed, per thread
241.828 GB data processed, total
0.083 nsecs/byte/thread runtime
12.045 GB/sec/thread speed
12.045 GB/sec total speed

# Running RAM-bw-local-NOTHP, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp 1 --no-data_rand_walk --thp -1"
20.045 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.014 secs average thread-runtime
0.111 % difference between max/avg runtime
234.304 GB data processed, per thread
234.304 GB data processed, total
0.086 nsecs/byte/thread runtime
11.689 GB/sec/thread speed
11.689 GB/sec total speed

# Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 1 -s 20 -zZq --thp 1 --no-data_rand_walk"

Test not applicable, system has only 1 nodes.

# Running RAM-bw-local-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 0x2 -s 20 -zZq --thp 1 --no-data_rand_walk"
20.138 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.121 secs average thread-runtime
0.342 % difference between max/avg runtime
135.961 GB data processed, per thread
271.922 GB data processed, total
0.148 nsecs/byte/thread runtime
6.752 GB/sec/thread speed
13.503 GB/sec total speed

# Running RAM-bw-remote-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 1x2 -s 20 -zZq --thp 1 --no-data_rand_walk"

Test not applicable, system has only 1 nodes.

# Running RAM-bw-cross, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"

Test not applicable, system has only 1 nodes.

# Running 1x3-convergence, "perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp 1"
0.747 secs latency to NUMA-converge
0.747 secs slowest (max) thread-runtime
0.000 secs fastest (min) thread-runtime
0.714 secs average thread-runtime
50.000 % difference between max/avg runtime
3.228 GB data processed, per thread
9.683 GB data processed, total
0.231 nsecs/byte/thread runtime
4.321 GB/sec/thread speed
12.964 GB/sec total speed

# Running 1x4-convergence, "perf bench numa mem -p 1 -t 4 -P 512 -s 100 -zZ0qcm --thp 1"
1.127 secs latency to NUMA-converge
1.127 secs slowest (max) thread-runtime
1.000 secs fastest (min) thread-runtime
1.089 secs average thread-runtime
5.624 % difference between max/avg runtime
3.765 GB data processed, per thread
15.062 GB data processed, total
0.299 nsecs/byte/thread runtime
3.342 GB/sec/thread speed
13.368 GB/sec total speed

# Running 1x6-convergence, "perf bench numa mem -p 1 -t 6 -P 1020 -s 100 -zZ0qcm --thp 1"
1.003 secs latency to NUMA-converge
1.003 secs slowest (max) thread-runtime
0.000 secs fastest (min) thread-runtime
0.889 secs average thread-runtime
50.000 % difference between max/avg runtime
2.141 GB data processed, per thread
12.847 GB data processed, total
0.469 nsecs/byte/thread runtime
2.134 GB/sec/thread speed
12.805 GB/sec total speed

# Running 2x3-convergence, "perf bench numa mem -p 2 -t 3 -P 1020 -s 100 -zZ0qcm --thp 1"
1.814 secs latency to NUMA-converge
1.814 secs slowest (max) thread-runtime
1.000 secs fastest (min) thread-runtime
1.716 secs average thread-runtime
22.440 % difference between max/avg runtime
3.747 GB data processed, per thread
22.483 GB data processed, total
0.484 nsecs/byte/thread runtime
2.065 GB/sec/thread speed
12.393 GB/sec total speed

# Running 3x3-convergence, "perf bench numa mem -p 3 -t 3 -P 1020 -s 100 -zZ0qcm --thp 1"
2.065 secs latency to NUMA-converge
2.065 secs slowest (max) thread-runtime
1.000 secs fastest (min) thread-runtime
1.947 secs average thread-runtime
25.788 % difference between max/avg runtime
2.855 GB data processed, per thread
25.694 GB data processed, total
0.723 nsecs/byte/thread runtime
1.382 GB/sec/thread speed
12.442 GB/sec total speed

# Running 4x4-convergence, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp 1"
1.912 secs latency to NUMA-converge
1.912 secs slowest (max) thread-runtime
1.000 secs fastest (min) thread-runtime
1.775 secs average thread-runtime
23.852 % difference between max/avg runtime
1.479 GB data processed, per thread
23.668 GB data processed, total
1.293 nsecs/byte/thread runtime
0.774 GB/sec/thread speed
12.378 GB/sec total speed

# Running 4x4-convergence-NOTHP, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp 1 --thp -1"
1.783 secs latency to NUMA-converge
1.783 secs slowest (max) thread-runtime
1.000 secs fastest (min) thread-runtime
1.633 secs average thread-runtime
21.960 % difference between max/avg runtime
1.345 GB data processed, per thread
21.517 GB data processed, total
1.326 nsecs/byte/thread runtime
0.754 GB/sec/thread speed
12.067 GB/sec total speed

# Running 4x6-convergence, "perf bench numa mem -p 4 -t 6 -P 1020 -s 100 -zZ0qcm --thp 1"
5.396 secs latency to NUMA-converge
5.396 secs slowest (max) thread-runtime
4.000 secs fastest (min) thread-runtime
4.928 secs average thread-runtime
12.937 % difference between max/avg runtime
2.721 GB data processed, per thread
65.306 GB data processed, total
1.983 nsecs/byte/thread runtime
0.504 GB/sec/thread speed
12.102 GB/sec total speed

# Running 4x8-convergence, "perf bench numa mem -p 4 -t 8 -P 512 -s 100 -zZ0qcm --thp 1"
3.121 secs latency to NUMA-converge
3.121 secs slowest (max) thread-runtime
2.000 secs fastest (min) thread-runtime
2.836 secs average thread-runtime
17.962 % difference between max/avg runtime
1.194 GB data processed, per thread
38.192 GB data processed, total
2.615 nsecs/byte/thread runtime
0.382 GB/sec/thread speed
12.236 GB/sec total speed

# Running 8x4-convergence, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp 1"
4.302 secs latency to NUMA-converge
4.302 secs slowest (max) thread-runtime
3.000 secs fastest (min) thread-runtime
4.045 secs average thread-runtime
15.133 % difference between max/avg runtime
1.631 GB data processed, per thread
52.178 GB data processed, total
2.638 nsecs/byte/thread runtime
0.379 GB/sec/thread speed
12.128 GB/sec total speed

# Running 8x4-convergence-NOTHP, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp 1 --thp -1"
4.418 secs latency to NUMA-converge
4.418 secs slowest (max) thread-runtime
3.000 secs fastest (min) thread-runtime
4.104 secs average thread-runtime
16.045 % difference between max/avg runtime
1.664 GB data processed, per thread
53.254 GB data processed, total
2.655 nsecs/byte/thread runtime
0.377 GB/sec/thread speed
12.055 GB/sec total speed

# Running 3x1-convergence, "perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZ0qcm --thp 1"
0.973 secs latency to NUMA-converge
0.973 secs slowest (max) thread-runtime
0.000 secs fastest (min) thread-runtime
0.955 secs average thread-runtime
50.000 % difference between max/avg runtime
4.124 GB data processed, per thread
12.372 GB data processed, total
0.236 nsecs/byte/thread runtime
4.238 GB/sec/thread speed
12.715 GB/sec total speed

# Running 4x1-convergence, "perf bench numa mem -p 4 -t 1 -P 512 -s 100 -zZ0qcm --thp 1"
0.820 secs latency to NUMA-converge
0.820 secs slowest (max) thread-runtime
0.000 secs fastest (min) thread-runtime
0.808 secs average thread-runtime
50.000 % difference between max/avg runtime
2.555 GB data processed, per thread
10.220 GB data processed, total
0.321 nsecs/byte/thread runtime
3.117 GB/sec/thread speed
12.468 GB/sec total speed

# Running 8x1-convergence, "perf bench numa mem -p 8 -t 1 -P 512 -s 100 -zZ0qcm --thp 1"
0.667 secs latency to NUMA-converge
0.667 secs slowest (max) thread-runtime
0.000 secs fastest (min) thread-runtime
0.607 secs average thread-runtime
50.000 % difference between max/avg runtime
1.009 GB data processed, per thread
8.069 GB data processed, total
0.661 nsecs/byte/thread runtime
1.512 GB/sec/thread speed
12.095 GB/sec total speed

# Running 16x1-convergence, "perf bench numa mem -p 16 -t 1 -P 256 -s 100 -zZ0qcm --thp 1"
1.546 secs latency to NUMA-converge
1.546 secs slowest (max) thread-runtime
1.000 secs fastest (min) thread-runtime
1.485 secs average thread-runtime
17.664 % difference between max/avg runtime
1.162 GB data processed, per thread
18.594 GB data processed, total
1.331 nsecs/byte/thread runtime
0.752 GB/sec/thread speed
12.025 GB/sec total speed

# Running 32x1-convergence, "perf bench numa mem -p 32 -t 1 -P 128 -s 100 -zZ0qcm --thp 1"
0.812 secs latency to NUMA-converge
0.812 secs slowest (max) thread-runtime
0.000 secs fastest (min) thread-runtime
0.739 secs average thread-runtime
50.000 % difference between max/avg runtime
0.309 GB data processed, per thread
9.874 GB data processed, total
2.630 nsecs/byte/thread runtime
0.380 GB/sec/thread speed
12.166 GB/sec total speed

# Running 2x1-bw-process, "perf bench numa mem -p 2 -t 1 -P 1024 -s 20 -zZ0q --thp 1"
20.044 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.020 secs average thread-runtime
0.109 % difference between max/avg runtime
125.750 GB data processed, per thread
251.501 GB data processed, total
0.159 nsecs/byte/thread runtime
6.274 GB/sec/thread speed
12.548 GB/sec total speed

# Running 3x1-bw-process, "perf bench numa mem -p 3 -t 1 -P 1024 -s 20 -zZ0q --thp 1"
20.148 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.090 secs average thread-runtime
0.367 % difference between max/avg runtime
85.267 GB data processed, per thread
255.800 GB data processed, total
0.236 nsecs/byte/thread runtime
4.232 GB/sec/thread speed
12.696 GB/sec total speed

# Running 4x1-bw-process, "perf bench numa mem -p 4 -t 1 -P 1024 -s 20 -zZ0q --thp 1"
20.169 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.100 secs average thread-runtime
0.419 % difference between max/avg runtime
63.144 GB data processed, per thread
252.576 GB data processed, total
0.319 nsecs/byte/thread runtime
3.131 GB/sec/thread speed
12.523 GB/sec total speed

# Running 8x1-bw-process, "perf bench numa mem -p 8 -t 1 -P 512 -s 20 -zZ0q --thp 1"
20.175 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.107 secs average thread-runtime
0.433 % difference between max/avg runtime
31.267 GB data processed, per thread
250.133 GB data processed, total
0.645 nsecs/byte/thread runtime
1.550 GB/sec/thread speed
12.398 GB/sec total speed

# Running 8x1-bw-process-NOTHP, "perf bench numa mem -p 8 -t 1 -P 512 -s 20 -zZ0q --thp 1 --thp -1"
20.216 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.113 secs average thread-runtime
0.535 % difference between max/avg runtime
30.998 GB data processed, per thread
247.981 GB data processed, total
0.652 nsecs/byte/thread runtime
1.533 GB/sec/thread speed
12.266 GB/sec total speed

# Running 16x1-bw-process, "perf bench numa mem -p 16 -t 1 -P 256 -s 20 -zZ0q --thp 1"
20.234 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.174 secs average thread-runtime
0.577 % difference between max/avg runtime
15.377 GB data processed, per thread
246.039 GB data processed, total
1.316 nsecs/byte/thread runtime
0.760 GB/sec/thread speed
12.160 GB/sec total speed

# Running 1x4-bw-thread, "perf bench numa mem -p 1 -t 4 -T 256 -s 20 -zZ0q --thp 1"
20.040 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.028 secs average thread-runtime
0.099 % difference between max/avg runtime
66.832 GB data processed, per thread
267.328 GB data processed, total
0.300 nsecs/byte/thread runtime
3.335 GB/sec/thread speed
13.340 GB/sec total speed

# Running 1x8-bw-thread, "perf bench numa mem -p 1 -t 8 -T 256 -s 20 -zZ0q --thp 1"
20.064 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.034 secs average thread-runtime
0.160 % difference between max/avg runtime
32.911 GB data processed, per thread
263.286 GB data processed, total
0.610 nsecs/byte/thread runtime
1.640 GB/sec/thread speed
13.122 GB/sec total speed

# Running 1x16-bw-thread, "perf bench numa mem -p 1 -t 16 -T 128 -s 20 -zZ0q --thp 1"
20.092 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.052 secs average thread-runtime
0.230 % difference between max/avg runtime
16.131 GB data processed, per thread
258.088 GB data processed, total
1.246 nsecs/byte/thread runtime
0.803 GB/sec/thread speed
12.845 GB/sec total speed

# Running 1x32-bw-thread, "perf bench numa mem -p 1 -t 32 -T 64 -s 20 -zZ0q --thp 1"
20.099 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.063 secs average thread-runtime
0.247 % difference between max/avg runtime
7.962 GB data processed, per thread
254.773 GB data processed, total
2.525 nsecs/byte/thread runtime
0.396 GB/sec/thread speed
12.676 GB/sec total speed

# Running 2x3-bw-process, "perf bench numa mem -p 2 -t 3 -P 512 -s 20 -zZ0q --thp 1"
20.150 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.120 secs average thread-runtime
0.372 % difference between max/avg runtime
44.827 GB data processed, per thread
268.960 GB data processed, total
0.450 nsecs/byte/thread runtime
2.225 GB/sec/thread speed
13.348 GB/sec total speed

# Running 4x4-bw-process, "perf bench numa mem -p 4 -t 4 -P 512 -s 20 -zZ0q --thp 1"
20.258 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.168 secs average thread-runtime
0.636 % difference between max/avg runtime
17.079 GB data processed, per thread
273.263 GB data processed, total
1.186 nsecs/byte/thread runtime
0.843 GB/sec/thread speed
13.489 GB/sec total speed

# Running 4x6-bw-process, "perf bench numa mem -p 4 -t 6 -P 512 -s 20 -zZ0q --thp 1"
20.559 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.382 secs average thread-runtime
1.359 % difference between max/avg runtime
10.758 GB data processed, per thread
258.201 GB data processed, total
1.911 nsecs/byte/thread runtime
0.523 GB/sec/thread speed
12.559 GB/sec total speed

# Running 4x8-bw-process, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp 1"
20.744 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.516 secs average thread-runtime
1.792 % difference between max/avg runtime
8.069 GB data processed, per thread
258.201 GB data processed, total
2.571 nsecs/byte/thread runtime
0.389 GB/sec/thread speed
12.447 GB/sec total speed

# Running 4x8-bw-process-NOTHP, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp 1 --thp -1"
20.855 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.561 secs average thread-runtime
2.050 % difference between max/avg runtime
8.069 GB data processed, per thread
258.201 GB data processed, total
2.585 nsecs/byte/thread runtime
0.387 GB/sec/thread speed
12.381 GB/sec total speed

# Running 3x3-bw-process, "perf bench numa mem -p 3 -t 3 -P 512 -s 20 -zZ0q --thp 1"
20.134 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.077 secs average thread-runtime
0.333 % difference between max/avg runtime
28.091 GB data processed, per thread
252.822 GB data processed, total
0.717 nsecs/byte/thread runtime
1.395 GB/sec/thread speed
12.557 GB/sec total speed

# Running 5x5-bw-process, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp 1"
20.588 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.375 secs average thread-runtime
1.427 % difference between max/avg runtime
10.177 GB data processed, per thread
254.436 GB data processed, total
2.023 nsecs/byte/thread runtime
0.494 GB/sec/thread speed
12.359 GB/sec total speed

# Running 2x16-bw-process, "perf bench numa mem -p 2 -t 16 -P 512 -s 20 -zZ0q --thp 1"
20.657 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.429 secs average thread-runtime
1.589 % difference between max/avg runtime
8.170 GB data processed, per thread
261.429 GB data processed, total
2.528 nsecs/byte/thread runtime
0.395 GB/sec/thread speed
12.656 GB/sec total speed

# Running 1x32-bw-process, "perf bench numa mem -p 1 -t 32 -P 2048 -s 20 -zZ0q --thp 1"
22.981 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
21.996 secs average thread-runtime
6.486 % difference between max/avg runtime
8.863 GB data processed, per thread
283.606 GB data processed, total
2.593 nsecs/byte/thread runtime
0.386 GB/sec/thread speed
12.341 GB/sec total speed

# Running numa02-bw, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp 1"
20.047 secs slowest (max) thread-runtime
19.000 secs fastest (min) thread-runtime
20.026 secs average thread-runtime
2.611 % difference between max/avg runtime
8.441 GB data processed, per thread
270.111 GB data processed, total
2.375 nsecs/byte/thread runtime
0.421 GB/sec/thread speed
13.474 GB/sec total speed

# Running numa02-bw-NOTHP, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp 1 --thp -1"
20.088 secs slowest (max) thread-runtime
19.000 secs fastest (min) thread-runtime
20.025 secs average thread-runtime
2.709 % difference between max/avg runtime
8.411 GB data processed, per thread
269.142 GB data processed, total
2.388 nsecs/byte/thread runtime
0.419 GB/sec/thread speed
13.398 GB/sec total speed

# Running numa01-bw-thread, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp 1"
20.293 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.175 secs average thread-runtime
0.721 % difference between max/avg runtime
7.918 GB data processed, per thread
253.374 GB data processed, total
2.563 nsecs/byte/thread runtime
0.390 GB/sec/thread speed
12.486 GB/sec total speed

# Running numa01-bw-thread-NOTHP, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp 1 --thp -1"
20.411 secs slowest (max) thread-runtime
20.000 secs fastest (min) thread-runtime
20.226 secs average thread-runtime
1.006 % difference between max/avg runtime
7.931 GB data processed, per thread
253.778 GB data processed, total
2.574 nsecs/byte/thread runtime
0.389 GB/sec/thread speed
12.434 GB/sec total speed

#

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20201012161611.366482-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# a508d061 14-Aug-2020 Peng Fan <fanpeng@loongson.cn>

perf bench numa: Remove dead code in parse_nodes_opt()

In the function parse_nodes_opt(), the statement "return 0;" is dead
code, remove it.

Signed-off-by: Peng Fan <fanpeng@loongson.cn>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/1597401894-27549-1-git-send-email-fanpeng@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 2db13a9b 13-Aug-2020 Alexander Gordeev <agordeev@linux.ibm.com>

perf bench numa: Use numa_node_to_cpus() to bind tasks to nodes

It is currently assumed that each node contains at most nr_cpus/nr_nodes
CPUs and nodes' CPU ranges do not overlap.

That assumption is generally incorrect as there are archs where a CPU
number does not depend on to its node number.

This update removes the described assumption by simply calling
numa_node_to_cpus() interface and using the returned mask for binding
CPUs to nodes.

Also, variable types and names made consistent in functions using
cpumask.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Balamuruhan S <bala24@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Link: http://lore.kernel.org/lkml/20200813113247.GA2014@oc3871087118.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 509f68e3 13-Aug-2020 Alexander Gordeev <agordeev@linux.ibm.com>

perf bench numa: Fix cpumask memory leak in node_has_cpus()

Couple numa_allocate_cpumask() and numa_free_cpumask() functions

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Balamuruhan S <bala24@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Link: http://lore.kernel.org/lkml/20200813113041.GA1685@oc3871087118.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 85372c69 10-Aug-2020 Alexander Gordeev <agordeev@linux.ibm.com>

perf bench numa: Fix benchmark names

Standard benchmark names let users know the tests specifics. For
example "2x1-bw-process" name tells that two processes one thread each
are run and the RAM bandwidth is measured.

Several benchmarks names do not correspond to their actual running
configuration. Fix that and also some whitespace and comment
inconsistencies.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/6b6f2084f132ee8e9203dc7c32f9deb209b87a68.1597004831.git.agordeev@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 72d69c2a 10-Aug-2020 Alexander Gordeev <agordeev@linux.ibm.com>

perf bench numa: Fix number of processes in "2x3-convergence" test

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/d949f5f48e17fc816f3beecf8479f1b2480345e4.1597004831.git.agordeev@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 8fcbeae4 03-Sep-2019 Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Remove needless builtin.h include directives

Now that builtin.h isn't included by any other header, we can check
where it is really needed, i.e. we can remove it and be sure that it
isn't being obtained indirectly.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-mn7jheex85iw9qo6tlv26hb2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 0ac25fd0 29-Aug-2019 Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Remove perf.h from source files not needing it

With the movement of lots of stuff out of perf.h to other headers we
ended up not needing it in lots of places, remove it from those places.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-c718m0sxxwp73lp9d8vpihb4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 6bbfe4e6 01-Aug-2019 Jiri Olsa <jolsa@kernel.org>

perf bench numa: Fix cpu0 binding

Michael reported an issue with perf bench numa failing with binding to
cpu0 with '-0' option.

# perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZcm0 --thp 1 -M 1 -ddd
# Running 'numa/mem' benchmark:

# Running main, "perf bench numa numa-mem -p 3 -t 1 -P 512 -s 100 -zZcm0 --thp 1 -M 1 -ddd"
binding to node 0, mask: 0000000000000001 => -1
perf: bench/numa.c:356: bind_to_memnode: Assertion `!(ret)' failed.
Aborted (core dumped)

This happens when the cpu0 is not part of node0, which is the benchmark
assumption and we can see that's not the case for some powerpc servers.

Using correct node for cpu0 binding.

Reported-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20190801142642.28004-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 7f7c536f 04-Jul-2019 Arnaldo Carvalho de Melo <acme@redhat.com>

tools lib: Adopt zalloc()/zfree() from tools/perf

Eroding a bit more the tools/perf/util/util.h hodpodge header.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-natazosyn9rwjka25tvcnyi0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# bf561d3c 25-Apr-2019 Arnaldo Carvalho de Melo <acme@redhat.com>

perf bench numa: Add define for RUSAGE_THREAD if not present

While cross building perf to the ARC architecture on a fedora 30 host,
we were failing with:

CC /tmp/build/perf/bench/numa.o
bench/numa.c: In function ‘worker_thread’:
bench/numa.c:1261:12: error: ‘RUSAGE_THREAD’ undeclared (first use in this function); did you mean ‘SIGEV_THREAD’?
getrusage(RUSAGE_THREAD, &rusage);
^~~~~~~~~~~~~
SIGEV_THREAD
bench/numa.c:1261:12: note: each undeclared identifier is reported only once for each function it appears in

[perfbuilder@60d5802468f6 perf]$ /arc_gnu_2019.03-rc1_prebuilt_uclibc_le_archs_linux_install/bin/arc-linux-gcc --version | head -1
arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
[perfbuilder@60d5802468f6 perf]$

Trying to reproduce a report by Vineet, I noticed that, with just
cross-built zlib and numactl libraries, I ended up with the above
failure.

So, since RUSAGE_THREAD is available as a define, check for that and
numactl libraries, I ended up with the above failure.

So, since RUSAGE_THREAD is available as a define in the system headers,
check if it is defined in the 'perf bench numa' sources and define it if
not.

Now it builds and I have to figure out if the problem reported by Vineet
only takes place if we have libelf or some other library available.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linux-snps-arc@lists.infradead.org
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Link: https://lkml.kernel.org/n/tip-2wb4r1gir9xrevbpq7qp0amk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 7c9eefe8 05-Mar-2019 Stephen Rothwell <sfr@canb.auug.org.au>

tools/: replace open encodings for NUMA_NO_NODE

This replaces all open encodings in tools with NUMA_NO_NODE. Also
linux/numa.h is now needed for the perf build.

[sfr@canb.auug.org.au: fix for replace open encodings for NUMA_NO_NODE]
Link: http://lkml.kernel.org/r/20190108131141.730e9c4f@canb.auug.org.au
Link: http://lkml.kernel.org/r/1545127933-10711-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: David Hildenbrand <david@redhat.com>
Cc: Doug Ledford <dledford@redhat.com> [drivers/infiniband]
Cc: Hans Verkuil <hverkuil@xs4all.nl>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe]
Cc: Jens Axboe <axboe@kernel.dk> [mtip32xx]
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Vinod Koul <vkoul@kernel.org> [dmaengine.c]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 98310707 20-Jun-2018 Jiri Olsa <jolsa@kernel.org>

perf bench: Fix numa report output code

Currently we can hit following assert when running numa bench:

$ perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZ0cm --thp 1
perf: bench/numa.c:1577: __bench_numa: Assertion `!(!(((wait_stat) & 0x7f) == 0))' failed.

The assertion is correct, because we hit the SIGFPE in following line:

Thread 2.2 "thread 0/0" received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7fffd28c6700 (LWP 11750)]
0x000.. in worker_thread (__tdata=0x7.. ) at bench/numa.c:1257
1257 td->speed_gbs = bytes_done / (td->runtime_ns / NSEC_PER_SEC) / 1e9;

We don't check if the runtime is actually bigger than 1 second,
and thus this might end up with zero division within FPU.

Adding the check to prevent this.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180620094036.17278-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 2abb80da 25-Apr-2018 Yisheng Xie <xieyisheng1@huawei.com>

perf bench numa: Fix typo in options

'R' means access the data via reads instead of writes, fix this typo.

Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1524644707-11030-1-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 321a7c35 22-Nov-2017 Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>

perf bench numa: Fixup discontiguous/sparse numa nodes

Certain systems are designed to have sparse/discontiguous nodes. On
such systems, 'perf bench numa' hangs, shows wrong number of nodes and
shows values for non-existent nodes. Handle this by only taking nodes
that are exposed by kernel to userspace.

Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1edbcd353c009e109e93d78f2f46381930c340fe.1511368645.git.sathnaga@linux.vnet.ibm.com
Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 0353631a 15-Jun-2017 Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Use __maybe_unused consistently

Instead of defining __unused or redefining __maybe_unused.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-4eleto5pih31jw1q4dypm9pf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# fd20e811 17-Apr-2017 Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Including missing inttypes.h header

Needed to use the PRI[xu](32,64) formatting macros.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-wkbho8kaw24q67dd11q0j39f@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 877a7a11 17-Apr-2017 Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Add include <linux/kernel.h> where ARRAY_SIZE() is used

To pave the way for further cleanups where linux/kernel.h may stop being
included in some header.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-qqxan6tfsl6qx3l0v3nwgjvk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# b0ad8ea6 27-Mar-2017 Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Remove unused 'prefix' from builtin functions

We got it from the git sources but never used it for anything, with the
place where this would be somehow used remaining:

static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
{
prefix = NULL;
if (p->option & RUN_SETUP)
prefix = NULL; /* setup_perf_directory(); */

Ditch it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-uw5swz05vol0qpr32c5lpvus@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 001916b9 05-Mar-2017 Jiri Olsa <jolsa@kernel.org>

perf bench numa: Add more comment for -c option

Adding more commentary for -c/--show_convergence option, to explain how
the convergence is defined.

Before:
-c, --show_convergence
show convergence details

Now:
-c, --show_convergence
convergence is reached when each process \
(all its threads) is running on a single NUMA node.

Suggested--by: Jiri Hladky <jhladky@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Hladky <jhladky@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1488732011-27384-1-git-send-email-jolsa@kernel.org
[ Rephrased a bit based on a IRC conversation with Jiri ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 6aa4d826 14-Feb-2017 Arnaldo Carvalho de Melo <acme@redhat.com>

perf bench numa: Make sure dprintf() is not defined

When building with clang we get this error:

bench/numa.c:46:9: error: 'dprintf' macro redefined [-Werror,-Wmacro-redefined]
#define dprintf(x...) do { if (g && g->p.show_details >= 1) printf(x); } while (0)
^
/usr/include/bits/stdio2.h:145:12: note: previous definition is here
# define dprintf(fd, ...) \
^
CC /tmp/build/perf/tests/parse-no-sample-id-all.o
1 error generated.

So, make sure it is undefined before using that name.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jakub Jelen <jjelen@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-f654o2svtrutamvxt7igwz74@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 3aff8ba0 09-Feb-2017 Arnaldo Carvalho de Melo <acme@redhat.com>

perf bench numa: Avoid possible truncation when using snprintf()

Addressing this warning from gcc 7:

CC /tmp/build/perf/bench/numa.o
bench/numa.c: In function '__bench_numa':
bench/numa.c:1582:42: error: '%d' directive output may be truncated writing between 1 and 10 bytes into a region of size between 8 and 17 [-Werror=format-truncation=]
snprintf(tname, 32, "process%d:thread%d", p, t);
^~
bench/numa.c:1582:25: note: directive argument in the range [0, 2147483647]
snprintf(tname, 32, "process%d:thread%d", p, t);
^~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/stdio.h:939:0,
from bench/../util/util.h:47,
from bench/../builtin.h:4,
from bench/numa.c:11:
/usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' output between 17 and 35 bytes into a destination of size 32
return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-twa37vsfqcie5gwpqwnjuuz9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# a8ad8329 08-Aug-2016 Arnaldo Carvalho de Melo <acme@redhat.com>

perf bench numa: Use NSEC_PER_U?SEC

Following kernel practices, using linux/time64.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-7vnv15263y50qku76p4w5xk6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 8a158589 05-Jul-2016 Arnaldo Carvalho de Melo <acme@redhat.com>

perf bench: Add missing pthread.h include for CPU_*() macros

Cc: David Ahern <dsahern@gmail.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-48qbfv7tqs8n8ey74lbyfjtq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 3c52b658 18-Mar-2016 Jakub Jelen <jakuje@gmail.com>

perf bench numa: Fix assertion for nodes bitfield

Comparing bits and bytes in numa benchmark assertion

I hit the issue on two socket Power8 machine presenting its numa nodes
as 0,1,16,17 (according to numactl). Therefore I got error (and hang of
parent process):

perf: bench/numa.c:296: bind_to_memnode: Assertion `!(g->p.nr_nodes > (int)sizeof(nodemask))' failed.

This is obviously false positive. We can fit all the 18 nodes into
bitfield of 8 bytes (long on 64b architecture).

Signed-off-by: Jakub Jelen <jakuje@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jakub Jelen <jjelen@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: trivial@kernel.org
Link: http://lkml.kernel.org/r/1458388687-24421-1-git-send-email-jakuje@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 4b6ab94e 15-Dec-2015 Josh Poimboeuf <jpoimboe@redhat.com>

perf subcmd: Create subcmd library

Move the subcommand-related files from perf to a new library named
libsubcmd.a.

Since we're moving files anyway, go ahead and rename 'exec_cmd.*' to
'exec-cmd.*' to be consistent with the naming of all the other files.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/c0a838d4c878ab17fee50998811612b2281355c1.1450193761.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# b0d22e52 19-Oct-2015 Ingo Molnar <mingo@kernel.org>

perf bench: Harmonize all the -l/--nr_loops options

We have three benchmarking subsystems that specify some sort of 'number
of loops' parameter - but all of them do it inconsistently:

numa: -l/--nr_loops
sched messaging: -l/--loops
mem memset/memcpy: -i/--iterations

Harmonize them to -l/--nr_loops by picking the numa variant - which is
also the most likely one to have existing scripting which we don't want
to break.

Plus improve the parameter help texts to indicate the default value for
the nr_loops variable to keep users from guessing ...

Also propagate the naming to internal variables.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-13-git-send-email-mingo@kernel.org
[ Let the harmonisation reach the perf-bench man page as well ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 2d8e405a 17-May-2015 Arnaldo Carvalho de Melo <acme@redhat.com>

perf bench numa: Share sched_getcpu() __weak def with cloexec.c

We really should move the sched_getcpu() to some more suitable place,
but this one-liner fixes this build problem on ancient distros like
RHEL5.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vinson Lee <vlee@twitter.com>
Link: http://lkml.kernel.org/n/tip-5yqg4p11f9uii6yremz3r35v@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# b64aa553 16-Apr-2015 Petr Holasek <pholasek@redhat.com>

perf bench numa: Show more stats of particular threads in verbose mode

In verbose mode perf bench numa shows also GB/s speed, system and user cpu
time for each particular thread. Using of getrusage() can provide much more
per process or per thread stats in future.

Signed-off-by: Petr Holasek <pholasek@redhat.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1429198699-25039-3-git-send-email-pholasek@redhat.com
[ Rename 'usage' variable to not shadow util.h's usage() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 1d90a685 16-Apr-2015 Petr Holasek <pholasek@redhat.com>

perf bench numa: Fix immediate meeting of convergence condition

This patch fixes the race in the beginning of benchmark run when some
threads hasn't got assigned curr_cpu yet so they don't occur in
nodes-of-process stats and benchmark concludes that all remaining
threads are converged already.

The race can be reproduced with small amount of threads and some bigger
amount of shared process memory, e.g. one process, two threads and 5GB
of process memory.

Signed-off-by: Petr Holasek <pholasek@redhat.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1429198699-25039-4-git-send-email-pholasek@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 24f1ced1 16-Apr-2015 Petr Holasek <pholasek@redhat.com>

perf bench numa: Fixes of --quiet argument

Corrected description and fixed function of --quiet argument.

Signed-off-by: Petr Holasek <pholasek@redhat.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1429198699-25039-2-git-send-email-pholasek@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 40ba93e3 27-Mar-2014 Ramkumar Ramachandra <artagnon@gmail.com>

perf bench: Set more defaults in the 'numa' suite

Currently,

$ perf bench numa mem

errors out with usage information. To make this more user-friendly, let
us provide a minimum set of default values required for a test
run. As an added bonus,

$ perf bench all

now goes all the way to completion.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1395964219-22173-2-git-send-email-artagnon@gmail.com
Signed-off-by: Jiri Olsa <jolsa@redhat.com>


# 0fae799e 13-Mar-2014 Arnaldo Carvalho de Melo <acme@redhat.com>

perf bench numa: Make no args mean 'run all tests'

If we call just:

perf bench numa mem

it will present the same output as:

perf bench numa mem -h

i.e. ask for instructions about what to run.

While that is kinda ok, using 'run all tests' as the default, i.e.
making 'no parms' be equivalent to:

perf bench numa mem -a

Will allow:

perf bench numa all

to actually do what is asked: i.e. run all the 'bench' tests, instead of
responding to that by asking what to do.

That, in turn, allows:

perf bench all

to actually complete, for the same reasons.

And after that, the tests that come after that, and that at some point
hit a NULL deref, will run, allowing me to reproduce a recently reported
problem.

That when you have the needed numa libraries, which wasn't the case for
the reporter, making me a bit confused after trying to reproduce his
report.

So make no parms mean -a.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Patrick Palka <patrick@parcs.ath.cx>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-x7h0ghx4pef4n0brywg21krk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 32bf5bd1 22-Sep-2013 Wei Yang <weiyang@linux.vnet.ibm.com>

perf bench: Fix two warnings

There are two warnings in bench/numa, when building this on 32-bit
machine.

The warning output is attached:

bench/numa.c:1113:20: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
bench/numa.c:1161:6: error: format ‘%lx’ expects argument of t'long unsigned int’, but argument 5 has type ‘u64’ [-Werror=format]

This patch fixes these two warnings.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Link: http://lkml.kernel.org/r/1379839764-9245-1-git-send-email-weiyang@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 2100f778 18-Oct-2013 Adrian Hunter <adrian.hunter@intel.com>

perf tools: Fix bench/numa.c for 32-bit build

bench/numa.c: In function 'worker_thread':
bench/numa.c:1123:20: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
bench/numa.c:1171:6: error: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'u64' [-Werror=format]
cc1: all warnings being treated as errors

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1382099356-4918-13-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# b81a48ea 03-Oct-2013 Petr Holasek <pholasek@redhat.com>

perf bench: Fix failing assertions in numa bench

Patch adds more subtle handling of -C and -N parameters in
parse_{cpu,node}_setup_list() functions when there isn't enough NUMA
nodes or CPUs present. Instead of assertion and terminating benchmark,
partial test is skipped with error message and perf will continue to the
next one.

Fixed problem can be easily reproduced on machine with only one NUMA
node:

# Running numa/mem benchmark...

# Running main, "perf bench numa mem -a"

...

# Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 1 -s
perf: bench/numa.c:622: parse_setup_node_list: Assertion `!(bind_node_0 < 0 ||
bind_node_0 >= g->p.nr_nodes)' failed.
Aborted

Signed-off-by: Petr Holasek <pholasek@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Petr Benas <pbenas@redhat.com>
Link: http://lkml.kernel.org/r/1380821325-4017-1-git-send-email-pholasek@redhat.com
Signed-off-by: Petr Benas <pbenas@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 1c13f3c9 06-Dec-2012 Ingo Molnar <mingo@kernel.org>

perf: Add 'perf bench numa mem' NUMA performance measurement suite

Add a suite of NUMA performance benchmarks.

The goal was simulate the behavior and access patterns of real NUMA
workloads, via a wide range of parameters, so this tool goes well
beyond simple bzero() measurements that most NUMA micro-benchmarks use:

- It processes the data and creates a chain of data dependencies,
like a real workload would. Neither the compiler, nor the
kernel (via KSM and other optimizations) nor the CPU can
eliminate parts of the workload.

- It randomizes the initial state and also randomizes the target
addresses of the processing - it's not a simple forward scan
of addresses.

- It provides flexible options to set process, thread and memory
relationship information: -G sets "global" memory shared between
all test processes, -P sets "process" memory shared by all
threads of a process and -T sets "thread" private memory.

- There's a NUMA convergence monitoring and convergence latency
measurement option via -c and -m.

- Micro-sleeps and synchronization can be injected to provoke lock
contention and scheduling, via the -u and -S options. This simulates
IO and contention.

- The -x option instructs the workload to 'perturb' itself artificially
every N seconds, by moving to the first and last CPU of the system
periodically. This way the stability of convergence equilibrium and
the number of steps taken for the scheduler to reach equilibrium again
can be measured.

- The amount of work can be specified via the -l loop count, and/or
via a -s seconds-timeout value.

- CPU and node memory binding options, to test hard binding scenarios.
THP can be turned on and off via madvise() calls.

- Live reporting of convergence progress in an 'at glance' output format.
Printing of convergence and deconvergence events.

The 'perf bench numa mem -a' option will start an array of about 30
individual tests that will each output such measurements:

# Running 5x5-bw-thread, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp 1"
5x5-bw-thread, 20.276, secs, runtime-max/thread
5x5-bw-thread, 20.004, secs, runtime-min/thread
5x5-bw-thread, 20.155, secs, runtime-avg/thread
5x5-bw-thread, 0.671, %, spread-runtime/thread
5x5-bw-thread, 21.153, GB, data/thread
5x5-bw-thread, 528.818, GB, data-total
5x5-bw-thread, 0.959, nsecs, runtime/byte/thread
5x5-bw-thread, 1.043, GB/sec, thread-speed
5x5-bw-thread, 26.081, GB/sec, total-speed

See the help text and the code for more details.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>