History log of /linux-master/tools/perf/bench/find-bit-bench.c
Revision Date Author Comments
# d1babea9 30-Mar-2023 Ian Rogers <irogers@google.com>

perf bench: Avoid NDEBUG warning

With NDEBUG set the asserts are compiled out. This yields
"unused-but-set-variable" variables. Move these variables behind
NDEBUG to avoid the warning.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230330183827.1412303-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 49bd97c2 18-Nov-2022 Sean Christopherson <seanjc@google.com>

perf tools: Use dedicated non-atomic clear/set bit helpers

Use the dedicated non-atomic helpers for {clear,set}_bit() and their
test variants, i.e. the double-underscore versions. Depsite being
defined in atomic.h, and despite the kernel versions being atomic in the
kernel, tools' {clear,set}_bit() helpers aren't actually atomic. Move
to the double-underscore versions so that the versions that are expected
to be atomic (for kernel developers) can be made atomic without
affecting users that don't want atomic operations.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: alexandru elisei <alexandru.elisei@arm.com>
Cc: kvm@vger.kernel.org
Cc: kvmarm@lists.cs.columbia.edu
Cc: kvmarm@lists.linux.dev
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20221119013450.2643007-6-seanjc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 75d7ba32 18-Nov-2022 Sean Christopherson <seanjc@google.com>

perf tools: Use dedicated non-atomic clear/set bit helpers

Use the dedicated non-atomic helpers for {clear,set}_bit() and their
test variants, i.e. the double-underscore versions. Depsite being
defined in atomic.h, and despite the kernel versions being atomic in the
kernel, tools' {clear,set}_bit() helpers aren't actually atomic. Move
to the double-underscore versions so that the versions that are expected
to be atomic (for kernel developers) can be made atomic without affecting
users that don't want atomic operations.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Message-Id: <20221119013450.2643007-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 7fc5b571 07-Sep-2021 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

tools: rename bitmap_alloc() to bitmap_zalloc()

Rename bitmap_alloc() to bitmap_zalloc() in tools to follow the bitmap API
in the kernel.

No functional changes intended.

Link: https://lkml.kernel.org/r/20210814211713.180533-14-yury.norov@gmail.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Suggested-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Lobakin <alobakin@pm.me>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f9f95068 12-Aug-2020 Colin Ian King <colin.king@canonical.com>

perf bench: Fix a couple of spelling mistakes in options text

There are a couple of spelling mistakes in the text. Fix these.

Signed-off-by: Colin King <colin.king@canonical.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Link: http://lore.kernel.org/lkml/20200812064647.200132-1-colin.king@canonical.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 7c43b0c1 29-Jul-2020 Ian Rogers <irogers@google.com>

perf bench: Add benchmark of find_next_bit

for_each_set_bit, or similar functions like for_each_cpu, may be hot
within the kernel. If many bits were set then one could imagine on Intel
a "bt" instruction with every bit may be faster than the function call
and word length find_next_bit logic. Add a benchmark to measure this.

This benchmark on AMD rome and Intel skylakex shows "bt" is not a good
option except for very small bitmaps.

Committer testing:

# perf bench
Usage:
perf bench [<common options>] <collection> <benchmark> [<options>]

# List of all available benchmark collections:

sched: Scheduler and IPC benchmarks
syscall: System call benchmarks
mem: Memory access benchmarks
numa: NUMA scheduling and MM benchmarks
futex: Futex stressing benchmarks
epoll: Epoll stressing benchmarks
internals: Perf-internals benchmarks
all: All benchmarks

# perf bench mem

# List of available benchmarks for collection 'mem':

memcpy: Benchmark for memcpy() functions
memset: Benchmark for memset() functions
find_bit: Benchmark for find_bit() functions
all: Run all memory access benchmarks

# perf bench mem find_bit
# Running 'mem/find_bit' benchmark:
100000 operations 1 bits set of 1 bits
Average for_each_set_bit took: 730.200 usec (+- 6.468 usec)
Average test_bit loop took: 366.200 usec (+- 4.652 usec)
100000 operations 1 bits set of 2 bits
Average for_each_set_bit took: 781.000 usec (+- 24.247 usec)
Average test_bit loop took: 550.200 usec (+- 4.152 usec)
100000 operations 2 bits set of 2 bits
Average for_each_set_bit took: 1113.400 usec (+- 112.340 usec)
Average test_bit loop took: 1098.500 usec (+- 182.834 usec)
100000 operations 1 bits set of 4 bits
Average for_each_set_bit took: 843.800 usec (+- 8.772 usec)
Average test_bit loop took: 948.800 usec (+- 10.278 usec)
100000 operations 2 bits set of 4 bits
Average for_each_set_bit took: 1185.800 usec (+- 114.345 usec)
Average test_bit loop took: 1473.200 usec (+- 175.498 usec)
100000 operations 4 bits set of 4 bits
Average for_each_set_bit took: 1769.667 usec (+- 233.177 usec)
Average test_bit loop took: 1864.933 usec (+- 187.470 usec)
100000 operations 1 bits set of 8 bits
Average for_each_set_bit took: 898.000 usec (+- 21.755 usec)
Average test_bit loop took: 1768.400 usec (+- 23.672 usec)
100000 operations 2 bits set of 8 bits
Average for_each_set_bit took: 1244.900 usec (+- 116.396 usec)
Average test_bit loop took: 2201.800 usec (+- 145.398 usec)
100000 operations 4 bits set of 8 bits
Average for_each_set_bit took: 1822.533 usec (+- 231.554 usec)
Average test_bit loop took: 2569.467 usec (+- 168.453 usec)
100000 operations 8 bits set of 8 bits
Average for_each_set_bit took: 2845.100 usec (+- 441.365 usec)
Average test_bit loop took: 3023.300 usec (+- 219.575 usec)
100000 operations 1 bits set of 16 bits
Average for_each_set_bit took: 923.400 usec (+- 17.560 usec)
Average test_bit loop took: 3240.000 usec (+- 16.492 usec)
100000 operations 2 bits set of 16 bits
Average for_each_set_bit took: 1264.300 usec (+- 114.034 usec)
Average test_bit loop took: 3714.400 usec (+- 158.898 usec)
100000 operations 4 bits set of 16 bits
Average for_each_set_bit took: 1817.867 usec (+- 222.199 usec)
Average test_bit loop took: 4015.333 usec (+- 154.162 usec)
100000 operations 8 bits set of 16 bits
Average for_each_set_bit took: 2826.350 usec (+- 433.457 usec)
Average test_bit loop took: 4460.350 usec (+- 210.762 usec)
100000 operations 16 bits set of 16 bits
Average for_each_set_bit took: 4615.600 usec (+- 809.350 usec)
Average test_bit loop took: 5129.960 usec (+- 320.821 usec)
100000 operations 1 bits set of 32 bits
Average for_each_set_bit took: 904.400 usec (+- 14.250 usec)
Average test_bit loop took: 6194.000 usec (+- 29.254 usec)
100000 operations 2 bits set of 32 bits
Average for_each_set_bit took: 1252.700 usec (+- 116.432 usec)
Average test_bit loop took: 6652.400 usec (+- 154.352 usec)
100000 operations 4 bits set of 32 bits
Average for_each_set_bit took: 1824.200 usec (+- 229.133 usec)
Average test_bit loop took: 6961.733 usec (+- 154.682 usec)
100000 operations 8 bits set of 32 bits
Average for_each_set_bit took: 2823.950 usec (+- 432.296 usec)
Average test_bit loop took: 7351.900 usec (+- 193.626 usec)
100000 operations 16 bits set of 32 bits
Average for_each_set_bit took: 4552.560 usec (+- 785.141 usec)
Average test_bit loop took: 7998.360 usec (+- 305.629 usec)
100000 operations 32 bits set of 32 bits
Average for_each_set_bit took: 7557.067 usec (+- 1407.702 usec)
Average test_bit loop took: 9072.400 usec (+- 513.209 usec)
100000 operations 1 bits set of 64 bits
Average for_each_set_bit took: 896.800 usec (+- 14.389 usec)
Average test_bit loop took: 11927.200 usec (+- 68.862 usec)
100000 operations 2 bits set of 64 bits
Average for_each_set_bit took: 1230.400 usec (+- 111.731 usec)
Average test_bit loop took: 12478.600 usec (+- 189.382 usec)
100000 operations 4 bits set of 64 bits
Average for_each_set_bit took: 1844.733 usec (+- 244.826 usec)
Average test_bit loop took: 12911.467 usec (+- 206.246 usec)
100000 operations 8 bits set of 64 bits
Average for_each_set_bit took: 2779.300 usec (+- 413.612 usec)
Average test_bit loop took: 13372.650 usec (+- 239.623 usec)
100000 operations 16 bits set of 64 bits
Average for_each_set_bit took: 4423.920 usec (+- 748.240 usec)
Average test_bit loop took: 13995.800 usec (+- 318.427 usec)
100000 operations 32 bits set of 64 bits
Average for_each_set_bit took: 7580.600 usec (+- 1462.407 usec)
Average test_bit loop took: 15063.067 usec (+- 516.477 usec)
100000 operations 64 bits set of 64 bits
Average for_each_set_bit took: 13391.514 usec (+- 2765.371 usec)
Average test_bit loop took: 16974.914 usec (+- 916.936 usec)
100000 operations 1 bits set of 128 bits
Average for_each_set_bit took: 1153.800 usec (+- 124.245 usec)
Average test_bit loop took: 26959.000 usec (+- 714.047 usec)
100000 operations 2 bits set of 128 bits
Average for_each_set_bit took: 1445.200 usec (+- 113.587 usec)
Average test_bit loop took: 25798.800 usec (+- 512.908 usec)
100000 operations 4 bits set of 128 bits
Average for_each_set_bit took: 1990.933 usec (+- 219.362 usec)
Average test_bit loop took: 25589.400 usec (+- 348.288 usec)
100000 operations 8 bits set of 128 bits
Average for_each_set_bit took: 2963.000 usec (+- 419.487 usec)
Average test_bit loop took: 25690.050 usec (+- 262.025 usec)
100000 operations 16 bits set of 128 bits
Average for_each_set_bit took: 4585.200 usec (+- 741.734 usec)
Average test_bit loop took: 26125.040 usec (+- 274.127 usec)
100000 operations 32 bits set of 128 bits
Average for_each_set_bit took: 7626.200 usec (+- 1404.950 usec)
Average test_bit loop took: 27038.867 usec (+- 442.554 usec)
100000 operations 64 bits set of 128 bits
Average for_each_set_bit took: 13343.371 usec (+- 2686.460 usec)
Average test_bit loop took: 28936.543 usec (+- 883.257 usec)
100000 operations 128 bits set of 128 bits
Average for_each_set_bit took: 23442.950 usec (+- 4880.541 usec)
Average test_bit loop took: 32484.125 usec (+- 1691.931 usec)
100000 operations 1 bits set of 256 bits
Average for_each_set_bit took: 1183.000 usec (+- 32.073 usec)
Average test_bit loop took: 50114.600 usec (+- 198.880 usec)
100000 operations 2 bits set of 256 bits
Average for_each_set_bit took: 1550.000 usec (+- 124.550 usec)
Average test_bit loop took: 50334.200 usec (+- 128.425 usec)
100000 operations 4 bits set of 256 bits
Average for_each_set_bit took: 2164.333 usec (+- 246.359 usec)
Average test_bit loop took: 49959.867 usec (+- 188.035 usec)
100000 operations 8 bits set of 256 bits
Average for_each_set_bit took: 3211.200 usec (+- 454.829 usec)
Average test_bit loop took: 50140.850 usec (+- 176.046 usec)
100000 operations 16 bits set of 256 bits
Average for_each_set_bit took: 5181.640 usec (+- 882.726 usec)
Average test_bit loop took: 51003.160 usec (+- 419.601 usec)
100000 operations 32 bits set of 256 bits
Average for_each_set_bit took: 8369.333 usec (+- 1513.150 usec)
Average test_bit loop took: 52096.700 usec (+- 573.022 usec)
100000 operations 64 bits set of 256 bits
Average for_each_set_bit took: 13866.857 usec (+- 2649.393 usec)
Average test_bit loop took: 53989.600 usec (+- 938.808 usec)
100000 operations 128 bits set of 256 bits
Average for_each_set_bit took: 23588.350 usec (+- 4724.222 usec)
Average test_bit loop took: 57300.625 usec (+- 1625.962 usec)
100000 operations 256 bits set of 256 bits
Average for_each_set_bit took: 42752.200 usec (+- 9202.084 usec)
Average test_bit loop took: 64426.933 usec (+- 3402.326 usec)
100000 operations 1 bits set of 512 bits
Average for_each_set_bit took: 1632.000 usec (+- 229.954 usec)
Average test_bit loop took: 98090.000 usec (+- 1120.435 usec)
100000 operations 2 bits set of 512 bits
Average for_each_set_bit took: 1937.700 usec (+- 148.902 usec)
Average test_bit loop took: 100364.100 usec (+- 1433.219 usec)
100000 operations 4 bits set of 512 bits
Average for_each_set_bit took: 2528.000 usec (+- 243.654 usec)
Average test_bit loop took: 99932.067 usec (+- 955.868 usec)
100000 operations 8 bits set of 512 bits
Average for_each_set_bit took: 3734.100 usec (+- 512.359 usec)
Average test_bit loop took: 98944.750 usec (+- 812.070 usec)
100000 operations 16 bits set of 512 bits
Average for_each_set_bit took: 5551.400 usec (+- 846.605 usec)
Average test_bit loop took: 98691.600 usec (+- 654.753 usec)
100000 operations 32 bits set of 512 bits
Average for_each_set_bit took: 8594.500 usec (+- 1446.072 usec)
Average test_bit loop took: 99176.867 usec (+- 579.990 usec)
100000 operations 64 bits set of 512 bits
Average for_each_set_bit took: 13840.743 usec (+- 2527.055 usec)
Average test_bit loop took: 100758.743 usec (+- 833.865 usec)
100000 operations 128 bits set of 512 bits
Average for_each_set_bit took: 23185.925 usec (+- 4532.910 usec)
Average test_bit loop took: 103786.700 usec (+- 1475.276 usec)
100000 operations 256 bits set of 512 bits
Average for_each_set_bit took: 40322.400 usec (+- 8341.802 usec)
Average test_bit loop took: 109433.378 usec (+- 2742.615 usec)
100000 operations 512 bits set of 512 bits
Average for_each_set_bit took: 71804.540 usec (+- 15436.546 usec)
Average test_bit loop took: 120255.440 usec (+- 5252.777 usec)
100000 operations 1 bits set of 1024 bits
Average for_each_set_bit took: 1859.600 usec (+- 27.969 usec)
Average test_bit loop took: 187676.000 usec (+- 1337.770 usec)
100000 operations 2 bits set of 1024 bits
Average for_each_set_bit took: 2273.600 usec (+- 139.420 usec)
Average test_bit loop took: 188176.000 usec (+- 684.357 usec)
100000 operations 4 bits set of 1024 bits
Average for_each_set_bit took: 2940.400 usec (+- 268.213 usec)
Average test_bit loop took: 189172.600 usec (+- 593.295 usec)
100000 operations 8 bits set of 1024 bits
Average for_each_set_bit took: 4224.200 usec (+- 547.933 usec)
Average test_bit loop took: 190257.250 usec (+- 621.021 usec)
100000 operations 16 bits set of 1024 bits
Average for_each_set_bit took: 6090.560 usec (+- 877.975 usec)
Average test_bit loop took: 190143.880 usec (+- 503.753 usec)
100000 operations 32 bits set of 1024 bits
Average for_each_set_bit took: 9178.800 usec (+- 1475.136 usec)
Average test_bit loop took: 190757.100 usec (+- 494.757 usec)
100000 operations 64 bits set of 1024 bits
Average for_each_set_bit took: 14441.457 usec (+- 2545.497 usec)
Average test_bit loop took: 192299.486 usec (+- 795.251 usec)
100000 operations 128 bits set of 1024 bits
Average for_each_set_bit took: 23623.825 usec (+- 4481.182 usec)
Average test_bit loop took: 194885.550 usec (+- 1300.817 usec)
100000 operations 256 bits set of 1024 bits
Average for_each_set_bit took: 40194.956 usec (+- 8109.056 usec)
Average test_bit loop took: 200259.311 usec (+- 2566.085 usec)
100000 operations 512 bits set of 1024 bits
Average for_each_set_bit took: 70983.560 usec (+- 15074.982 usec)
Average test_bit loop took: 210527.460 usec (+- 4968.980 usec)
100000 operations 1024 bits set of 1024 bits
Average for_each_set_bit took: 136530.345 usec (+- 31584.400 usec)
Average test_bit loop took: 233329.691 usec (+- 10814.036 usec)
100000 operations 1 bits set of 2048 bits
Average for_each_set_bit took: 3077.600 usec (+- 76.376 usec)
Average test_bit loop took: 402154.400 usec (+- 518.571 usec)
100000 operations 2 bits set of 2048 bits
Average for_each_set_bit took: 3508.600 usec (+- 148.350 usec)
Average test_bit loop took: 403814.500 usec (+- 1133.027 usec)
100000 operations 4 bits set of 2048 bits
Average for_each_set_bit took: 4219.333 usec (+- 285.844 usec)
Average test_bit loop took: 404312.533 usec (+- 985.751 usec)
100000 operations 8 bits set of 2048 bits
Average for_each_set_bit took: 5670.550 usec (+- 615.238 usec)
Average test_bit loop took: 405321.800 usec (+- 1038.487 usec)
100000 operations 16 bits set of 2048 bits
Average for_each_set_bit took: 7785.080 usec (+- 992.522 usec)
Average test_bit loop took: 406746.160 usec (+- 1015.478 usec)
100000 operations 32 bits set of 2048 bits
Average for_each_set_bit took: 11163.800 usec (+- 1627.320 usec)
Average test_bit loop took: 406124.267 usec (+- 898.785 usec)
100000 operations 64 bits set of 2048 bits
Average for_each_set_bit took: 16964.629 usec (+- 2806.130 usec)
Average test_bit loop took: 406618.514 usec (+- 798.356 usec)
100000 operations 128 bits set of 2048 bits
Average for_each_set_bit took: 27219.625 usec (+- 4988.458 usec)
Average test_bit loop took: 410149.325 usec (+- 1705.641 usec)
100000 operations 256 bits set of 2048 bits
Average for_each_set_bit took: 45138.578 usec (+- 8831.021 usec)
Average test_bit loop took: 415462.467 usec (+- 2725.418 usec)
100000 operations 512 bits set of 2048 bits
Average for_each_set_bit took: 77450.540 usec (+- 15962.238 usec)
Average test_bit loop took: 426089.180 usec (+- 5171.788 usec)
100000 operations 1024 bits set of 2048 bits
Average for_each_set_bit took: 138023.636 usec (+- 29826.959 usec)
Average test_bit loop took: 446346.636 usec (+- 9904.417 usec)
100000 operations 2048 bits set of 2048 bits
Average for_each_set_bit took: 251072.600 usec (+- 55947.692 usec)
Average test_bit loop took: 484855.983 usec (+- 18970.431 usec)
#

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200729220034.1337168-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>