#
bd2cdf26 |
|
06-Feb-2024 |
Yang Jihong <yangjihong1@huawei.com> |
perf sched: Move curr_pid and cpu_last_switched initialization to perf_sched__{lat|map|replay}() The curr_pid and cpu_last_switched are used only for the 'perf sched replay/latency/map'. Put their initialization in perf_sched__{lat|map|replay () to reduce unnecessary actions in other commands. Simple functional testing: # perf sched record perf bench sched messaging # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 0.209 [sec] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 16.456 MB perf.data (147907 samples) ] # perf sched lat ------------------------------------------------------------------------------------------------------------------------------------------- Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Max delay start | Max delay end | ------------------------------------------------------------------------------------------------------------------------------------------- sched-messaging:(401) | 2990.699 ms | 38705 | avg: 0.661 ms | max: 67.046 ms | max start: 456532.624830 s | max end: 456532.691876 s qemu-system-x86:(7) | 179.764 ms | 2191 | avg: 0.152 ms | max: 21.857 ms | max start: 456532.576434 s | max end: 456532.598291 s sshd:48125 | 0.522 ms | 2 | avg: 0.037 ms | max: 0.046 ms | max start: 456532.514610 s | max end: 456532.514656 s <SNIP> ksoftirqd/11:82 | 0.063 ms | 1 | avg: 0.005 ms | max: 0.005 ms | max start: 456532.769366 s | max end: 456532.769371 s kworker/9:0-mm_:34624 | 0.233 ms | 20 | avg: 0.004 ms | max: 0.007 ms | max start: 456532.690804 s | max end: 456532.690812 s migration/13:93 | 0.000 ms | 1 | avg: 0.004 ms | max: 0.004 ms | max start: 456532.512669 s | max end: 456532.512674 s ----------------------------------------------------------------------------------------------------------------- TOTAL: | 3180.750 ms | 41368 | --------------------------------------------------- # echo $? 0 # perf sched map *A0 456532.510141 secs A0 => migration/0:15 *. 456532.510171 secs . => swapper:0 . *B0 456532.510261 secs B0 => migration/1:21 . *. 456532.510279 secs <SNIP> L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . . . . 456532.785979 secs L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . . . 456532.786054 secs L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . . 456532.786127 secs L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . 456532.786197 secs L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 456532.786270 secs # echo $? 0 # perf sched replay run measurement overhead: 108 nsecs sleep measurement overhead: 66473 nsecs the run test took 1000002 nsecs the sleep test took 1082686 nsecs nr_run_events: 49334 nr_sleep_events: 50054 nr_wakeup_events: 34701 target-less wakeups: 165 multi-target wakeups: 766 task 0 ( swapper: 0), nr_events: 15419 task 1 ( swapper: 1), nr_events: 1 task 2 ( swapper: 2), nr_events: 1 <SNIP> task 715 ( sched-messaging: 110248), nr_events: 1438 task 716 ( sched-messaging: 110249), nr_events: 512 task 717 ( sched-messaging: 110250), nr_events: 500 task 718 ( sched-messaging: 110251), nr_events: 537 task 719 ( sched-messaging: 110252), nr_events: 823 ------------------------------------------------------------ #1 : 1325.288, ravg: 1325.29, cpu: 7823.35 / 7823.35 #2 : 1363.606, ravg: 1329.12, cpu: 7655.53 / 7806.56 #3 : 1349.494, ravg: 1331.16, cpu: 7544.80 / 7780.39 #4 : 1311.488, ravg: 1329.19, cpu: 7495.13 / 7751.86 #5 : 1309.902, ravg: 1327.26, cpu: 7266.65 / 7703.34 #6 : 1309.535, ravg: 1325.49, cpu: 7843.86 / 7717.39 #7 : 1316.482, ravg: 1324.59, cpu: 7854.41 / 7731.09 #8 : 1366.604, ravg: 1328.79, cpu: 7955.81 / 7753.57 #9 : 1326.286, ravg: 1328.54, cpu: 7466.86 / 7724.90 #10 : 1356.653, ravg: 1331.35, cpu: 7566.60 / 7709.07 # echo $? 0 Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240206083228.172607-5-yangjihong1@huawei.com
|
#
5e895278 |
|
06-Feb-2024 |
Yang Jihong <yangjihong1@huawei.com> |
perf sched: Move curr_thread initialization to perf_sched__map() The curr_thread is used only for the 'perf sched map'. Put initialization in perf_sched__map() to reduce unnecessary actions in other commands. Simple functional testing: # perf sched record perf bench sched messaging # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 0.197 [sec] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 15.526 MB perf.data (140095 samples) ] # perf sched map *A0 451264.532445 secs A0 => migration/0:15 *. 451264.532468 secs . => swapper:0 . *B0 451264.532537 secs B0 => migration/1:21 . *. 451264.532560 secs . . *C0 451264.532644 secs C0 => migration/2:27 . . *. 451264.532668 secs . . . *D0 451264.532753 secs D0 => migration/3:33 . . . *. 451264.532778 secs . . . . *E0 451264.532861 secs E0 => migration/4:39 . . . . *. 451264.532886 secs . . . . . *F0 451264.532973 secs F0 => migration/5:45 <SNIP> A7 A7 A7 A7 A7 *A7 . . . . . . . . . . 451264.790785 secs A7 A7 A7 A7 A7 A7 *A7 . . . . . . . . . 451264.790858 secs A7 A7 A7 A7 A7 A7 A7 *A7 . . . . . . . . 451264.790934 secs A7 A7 A7 A7 A7 A7 A7 A7 *A7 . . . . . . . 451264.791004 secs A7 A7 A7 A7 A7 A7 A7 A7 A7 *A7 . . . . . . 451264.791075 secs A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 *A7 . . . . . 451264.791143 secs A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 *A7 . . . . 451264.791232 secs A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 *A7 . . . 451264.791336 secs A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 *A7 . . 451264.791407 secs A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 *A7 . 451264.791484 secs A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 *A7 451264.791553 secs # echo $? 0 Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240206083228.172607-4-yangjihong1@huawei.com
|
#
ef76a5af |
|
06-Feb-2024 |
Yang Jihong <yangjihong1@huawei.com> |
perf sched: Fix memory leak in perf_sched__map() perf_sched__map() needs to free memory of map_cpus, color_pids and color_cpus in normal path and rollback allocated memory in error path. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240206083228.172607-3-yangjihong1@huawei.com
|
#
c6907863 |
|
06-Feb-2024 |
Yang Jihong <yangjihong1@huawei.com> |
perf sched: Move start_work_mutex and work_done_wait_mutex initialization to perf_sched__replay() The start_work_mutex and work_done_wait_mutex are used only for the 'perf sched replay'. Put their initialization in perf_sched__replay () to reduce unnecessary actions in other commands. Simple functional testing: # perf sched record perf bench sched messaging # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 0.197 [sec] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 14.952 MB perf.data (134165 samples) ] # perf sched replay run measurement overhead: 108 nsecs sleep measurement overhead: 65658 nsecs the run test took 999991 nsecs the sleep test took 1079324 nsecs nr_run_events: 42378 nr_sleep_events: 43102 nr_wakeup_events: 31852 target-less wakeups: 17 multi-target wakeups: 712 task 0 ( swapper: 0), nr_events: 10451 task 1 ( swapper: 1), nr_events: 3 task 2 ( swapper: 2), nr_events: 1 <SNIP> task 717 ( sched-messaging: 74483), nr_events: 152 task 718 ( sched-messaging: 74484), nr_events: 1944 task 719 ( sched-messaging: 74485), nr_events: 73 task 720 ( sched-messaging: 74486), nr_events: 163 task 721 ( sched-messaging: 74487), nr_events: 942 task 722 ( sched-messaging: 74488), nr_events: 78 task 723 ( sched-messaging: 74489), nr_events: 1090 ------------------------------------------------------------ #1 : 1366.507, ravg: 1366.51, cpu: 7682.70 / 7682.70 #2 : 1410.072, ravg: 1370.86, cpu: 7723.88 / 7686.82 #3 : 1396.296, ravg: 1373.41, cpu: 7568.20 / 7674.96 #4 : 1381.019, ravg: 1374.17, cpu: 7531.81 / 7660.64 #5 : 1393.826, ravg: 1376.13, cpu: 7725.25 / 7667.11 #6 : 1401.581, ravg: 1378.68, cpu: 7594.82 / 7659.88 #7 : 1381.337, ravg: 1378.94, cpu: 7371.22 / 7631.01 #8 : 1373.842, ravg: 1378.43, cpu: 7894.92 / 7657.40 #9 : 1364.697, ravg: 1377.06, cpu: 7324.91 / 7624.15 #10 : 1363.613, ravg: 1375.72, cpu: 7209.55 / 7582.69 # echo $? 0 Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240206083228.172607-2-yangjihong1@huawei.com
|
#
68f87f24 |
|
22-Jan-2024 |
Ze Gao <zegao2021@gmail.com> |
perf sched: Commit to evsel__taskstate() to parse task state info Now that we have evsel__taskstate() which no longer relies on the hardcoded task state string and has good backward compatibility, we have a good reason to use it. Note TASK_STATE_TO_CHAR_STR and task bitmasks are useless now so we remove them for good. And now we pass the state info back and forth in a symbolic char which explains itself well instead. Signed-off-by: Ze Gao <zegao@tencent.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20240123022425.1611483-1-zegao@tencent.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
#
ccc606a7 |
|
22-Jan-2024 |
Ze Gao <zegao2021@gmail.com> |
perf sched: Sync state char array with the kernel Update state char array to match the latest kernel definitions and remove unused state mapping macros. Note this is the preparing patch for get rid of the way to parse process state from raw bitmask value. Instead we are going to parse it from the recorded tracepoint print format, and this change marks why we're doing it. Signed-off-by: Ze Gao <zegao@tencent.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20240122070859.1394479-3-zegao@tencent.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
#
78c32f4c |
|
24-Oct-2023 |
Ian Rogers <irogers@google.com> |
libperf rc_check: Add RC_CHK_EQUAL Comparing pointers with reference count checking is tricky to avoid a SEGV. Add a convenience macro to simplify and use. Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
#
232418a0 |
|
26-May-2023 |
Ian Rogers <irogers@google.com> |
perf sched: Avoid large stack allocations Commit 5ded57ac1bdb ("perf inject: Remove static variables") moved static variables to local, however, in this case 3 MAX_CPUS (4096) sized arrays were moved onto the stack making the stack frame quite large. Avoid the stack usage by dynamically allocating the arrays. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230527034324.2597593-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
8ab12a20 |
|
08-Jun-2023 |
Ian Rogers <irogers@google.com> |
perf callchain: Use pthread keys for tls callchain_cursor Pthread keys are more portable than __thread and allow the association of a destructor with the key. Use the destructor to clean up TLS callchain cursors to aid understanding memory leaks. Committer notes: Had to fixup a series of unconverted places and also check for the return of get_tls_callchain_cursor() as it may fail and return NULL. In that unlikely case we now either print something to a file, if the caller was expecting to print a callchain, or return an error code to state that resolving the callchain isn't possible. In some cases this was made easier because thread__resolve_callchain() already can fail for other reasons, so this new one (cursor == NULL) can be added and the callers don't have to explicitely check for this new condition. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-25-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
f6005caf |
|
08-Jun-2023 |
Ian Rogers <irogers@google.com> |
perf thread: Add reference count checking Modify struct declaration and accessor functions for the reference count checkers additional layer of indirection. Make sure pid_cmp in builtin-sched.c uses the underlying/original struct in pointer arithmetic, and not the temporary get/put indirection. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
0dd5041c |
|
08-Jun-2023 |
Ian Rogers <irogers@google.com> |
perf addr_location: Add init/exit/copy functions struct addr_location holds references to multiple reference counted objects. Add init/exit functions to make maintenance of those more consistent with the rest of the code and to try to avoid leaks. Modification of thread reference counts isn't included in this change. Committer notes: I needed to initialize result to sample->ip to make sure is set to something, fixing a compile time error, mostly keeping the previous logic as build_alloc_func_list() already does debugging/error prints about what went wrong if it takes the 'goto out'. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
ee84a303 |
|
08-Jun-2023 |
Ian Rogers <irogers@google.com> |
perf thread: Add accessor functions for thread Using accessors will make it easier to add reference count checking in later patches. Committer notes: thread->nsinfo wasn't wrapped as it is used together with nsinfo__zput(), where does a trick to set the field with a refcount being dropped to NULL, and that doesn't work well with using thread__nsinfo(thread), that loses the &thread->nsinfo pointer. When refcount checking is added to 'struct thread', later in this series, nsinfo__zput(RC_CHK_ACCESS(thread)->nsinfo) will be used to check the thread pointer. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
40826c45 |
|
08-Jun-2023 |
Ian Rogers <irogers@google.com> |
perf thread: Remove notion of dead threads The dead thread list is best effort. Threads live on it until the reference count hits zero and they are removed. With correct reference counting this should never happen. It is, however, part of the 'perf sched' output that is now removed. If this is an issue we should implement tracking of dead threads in a robust not best-effort way. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
4453deac |
|
28-Mar-2023 |
Chunxin Zang <zangchunxin@lixiang.com> |
perf sched: Fix sched latency analysis incorrection when using 'sched:sched_wakeup' 'perf sched latency' is incorrect to get process schedule latency when it used 'sched:sched_wakeup' to analysis perf.data. Because 'perf record' prefers to use 'sched:sched_waking' to 'sched:sched_wakeup' since commit d566a9c2d482 ("perf sched: Prefer sched_waking event when it exists"). It's very reasonable to evaluate process schedule latency. Similarly, update sched latency/map/replay to use sched_waking events. Signed-off-by: Chunxin Zang <zangchunxin@lixiang.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230328060038.2346935-1-zangchunxin@lixiang.com Signed-off-by: Jerry Zhou <zhouchunhua@lixiang.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
f12ad272 |
|
10-Apr-2023 |
Ian Rogers <irogers@google.com> |
perf util: Move input_name to util 'input_name' is the name of the input perf.data file, it is used by data convert and ui code. Move it to util to make it more consistent with other global state. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chengdong Li <chengdongli@tencent.com> Cc: Denis Nikitin <denik@chromium.org> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin Liška <mliska@suse.cz> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raul Silvera <rsilvera@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230410162511.3055900-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
49bd97c2 |
|
18-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
perf tools: Use dedicated non-atomic clear/set bit helpers Use the dedicated non-atomic helpers for {clear,set}_bit() and their test variants, i.e. the double-underscore versions. Depsite being defined in atomic.h, and despite the kernel versions being atomic in the kernel, tools' {clear,set}_bit() helpers aren't actually atomic. Move to the double-underscore versions so that the versions that are expected to be atomic (for kernel developers) can be made atomic without affecting users that don't want atomic operations. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: James Morse <james.morse@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sean Christopherson <seanjc@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Yury Norov <yury.norov@gmail.com> Cc: alexandru elisei <alexandru.elisei@arm.com> Cc: kvm@vger.kernel.org Cc: kvmarm@lists.cs.columbia.edu Cc: kvmarm@lists.linux.dev Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20221119013450.2643007-6-seanjc@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
75d7ba32 |
|
18-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
perf tools: Use dedicated non-atomic clear/set bit helpers Use the dedicated non-atomic helpers for {clear,set}_bit() and their test variants, i.e. the double-underscore versions. Depsite being defined in atomic.h, and despite the kernel versions being atomic in the kernel, tools' {clear,set}_bit() helpers aren't actually atomic. Move to the double-underscore versions so that the versions that are expected to be atomic (for kernel developers) can be made atomic without affecting users that don't want atomic operations. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Message-Id: <20221119013450.2643007-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
165da802 |
|
08-Sep-2022 |
Namhyung Kim <namhyung@kernel.org> |
perf sched: Factor out destroy_tasks() Add destroy_tasks() as a counterpart of create_tasks() and put the thread safety notations there. After join, it destroys semaphores too. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220908225448.4105056-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
59c26660 |
|
26-Aug-2022 |
Ian Rogers <irogers@google.com> |
perf sched: Fixes for thread safety analysis Add annotations to describe lock behavior. Add unlocks so that mutexes aren't conditionally held on exit from perf_sched__replay. Add an exit variable so that thread_func can terminate, rather than leaving the threads blocked on mutexes. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Truong <alexandre.truong@arm.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andres Freund <andres@anarazel.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: André Almeida <andrealmeid@igalia.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Colin Ian King <colin.king@intel.com> Cc: Dario Petrillo <dario.pk1@gmail.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Dave Marchevsky <davemarchevsky@fb.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Fangrui Song <maskray@google.com> Cc: Hewenliang <hewenliang4@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jason Wang <wangborong@cdjrlc.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin Liška <mliska@suse.cz> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Pavithra Gurushankar <gpavithrasha@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <quentin@isovalent.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Remi Bernon <rbernon@codeweavers.com> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tom Rix <trix@redhat.com> Cc: Weiguo Li <liwg06@foxmail.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: William Cohen <wcohen@redhat.com> Cc: Zechuan Chen <chenzechuan1@huawei.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Cc: yaowenbin <yaowenbin1@huawei.com> Link: https://lore.kernel.org/r/20220826164242.43412-17-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
0bd14ac2 |
|
26-Aug-2022 |
Ian Rogers <irogers@google.com> |
perf sched: Update use of pthread mutex Switch to the use of mutex wrappers that provide better error checking. Update cmd_sched so that we always explicitly destroy the mutexes. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Truong <alexandre.truong@arm.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andres Freund <andres@anarazel.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: André Almeida <andrealmeid@igalia.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Colin Ian King <colin.king@intel.com> Cc: Dario Petrillo <dario.pk1@gmail.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Dave Marchevsky <davemarchevsky@fb.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Fangrui Song <maskray@google.com> Cc: Hewenliang <hewenliang4@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jason Wang <wangborong@cdjrlc.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin Liška <mliska@suse.cz> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Pavithra Gurushankar <gpavithrasha@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <quentin@isovalent.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Remi Bernon <rbernon@codeweavers.com> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tom Rix <trix@redhat.com> Cc: Weiguo Li <liwg06@foxmail.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: William Cohen <wcohen@redhat.com> Cc: Zechuan Chen <chenzechuan1@huawei.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Cc: yaowenbin <yaowenbin1@huawei.com> Link: https://lore.kernel.org/r/20220826164242.43412-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
d72e5cf3 |
|
24-Aug-2022 |
Ian Rogers <irogers@google.com> |
perf sched: Fix memory leaks in __cmd_record detected with -fsanitize=address An array of strings is passed to cmd_record but not freed. As cmd_record modifies the array, add another array as a copy that can be mutated allowing the original array contents to all be freed. Detected with -fsanitize=address. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220824145733.409005-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
628881ee |
|
08-Aug-2022 |
Yang Jihong <yangjihong1@huawei.com> |
perf sched latency: Fix subcommand matching error perf sched latency use strncmp to match subcommands which matching does not meet expectation. Before: # perf sched lat1234 >/dev/null # echo $? 0 # Solution: Use strstarts to match subcommand. After: # perf sched lat1234 Usage: perf sched [<options>] {record|latency|map|replay|script|timehist} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -i, --input <file> input file name -v, --verbose be more verbose (show symbol address, etc) # echo $? 129 # # perf sched lat >/dev/null # echo $? 0 # Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220808092408.107399-3-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
ae0f4eb3 |
|
25-Mar-2022 |
Wei Li <liwei391@huawei.com> |
perf tools: Enhance the matching of sub-commands abbreviations We support short command 'rec*' for 'record' and 'rep*' for 'report' in lots of sub-commands, but the matching is not quite strict currnetly. It may be puzzling sometime, like we mis-type a 'recport' to report but it will perform 'record' in fact without any message. To fix this, add a check to ensure that the short cmd is valid prefix of the real command. Committer testing: [root@quaco ~]# perf c2c re sleep 1 Usage: perf c2c {record|report} -v, --verbose be more verbose (show counter open errors, etc) # perf c2c rec sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.038 MB perf.data (16 samples) ] # perf c2c recport sleep 1 Usage: perf c2c {record|report} -v, --verbose be more verbose (show counter open errors, etc) # perf c2c record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.038 MB perf.data (15 samples) ] # perf c2c records sleep 1 Usage: perf c2c {record|report} -v, --verbose be more verbose (show counter open errors, etc) # Signed-off-by: Wei Li <liwei391@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Rui Xiang <rui.xiang@huawei.com> Link: http://lore.kernel.org/lkml/20220325092032.2956161-1-liwei391@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
6d18804b |
|
04-Jan-2022 |
Ian Rogers <irogers@google.com> |
perf cpumap: Give CPUs their own type A common problem is confusing CPU map indices with the CPU, by wrapping the CPU with a struct then this is avoided. This approach is similar to atomic_t. Committer notes: To make it build with BUILD_BPF_SKEL=1 these files needed the conversions to 'struct perf_cpu' usage: tools/perf/util/bpf_counter.c tools/perf/util/bpf_counter_cgroup.c tools/perf/util/bpf_ftrace.c Also perf_env__get_cpu() was removed back in "perf cpumap: Switch cpu_map__build_map to cpu function". Additionally these needed to be fixed for the ARM builds to complete: tools/perf/arch/arm/util/cs-etm.c tools/perf/arch/arm64/util/pmu.c Suggested-by: John Garry <john.garry@huawei.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-49-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
dfc66bef |
|
04-Jan-2022 |
Ian Rogers <irogers@google.com> |
perf cpumap: Move 'has' function to libperf Make the cpu map argument const for consistency with the rest of the API. Modify cpu_map__idx accordingly. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-21-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
7cc72553 |
|
18-Oct-2021 |
James Clark <james.clark@arm.com> |
perf tools: Check vmlinux/kallsyms arguments in all tools Only perf report checked the validity of these arguments so apply the same check to all tools that read them for consistency. Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Denis Nikitin <denik@chromium.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20211018134844.2627174-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
2681bd85 |
|
19-Jul-2021 |
Namhyung Kim <namhyung@kernel.org> |
perf tools: Remove repipe argument from perf_session__new() The repipe argument is only used by perf inject and the all others passes 'false'. Let's remove it from the function signature and add __perf_session__new() to be called from perf inject directly. This is a preparation of the change the pipe input/output. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210719223153.1618812-2-namhyung@kernel.org [ Fixed up some trivial conflicts as this patchset fell thru the cracks ;-( ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
b0f00855 |
|
13-Jul-2021 |
Yang Jihong <yangjihong1@huawei.com> |
perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set The tracepoints trace_sched_stat_{wait, sleep, iowait} are not exposed to user if CONFIG_SCHEDSTATS is not set, "perf sched record" records the three events. As a result, the command fails. Before: #perf sched record sleep 1 event syntax error: 'sched:sched_stat_wait' \___ unknown tracepoint Error: File /sys/kernel/tracing/events/sched/sched_stat_wait not found. Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?. Run 'perf list' for a list of valid events Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -e, --event <event> event selector. use 'perf list' to list available events Solution: Check whether schedstat tracepoints are exposed. If no, these events are not recorded. After: # perf sched record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.163 MB perf.data (1091 samples) ] # perf sched report run measurement overhead: 4736 nsecs sleep measurement overhead: 9059979 nsecs the run test took 999854 nsecs the sleep test took 8945271 nsecs nr_run_events: 716 nr_sleep_events: 785 nr_wakeup_events: 0 ... ------------------------------------------------------------ Fixes: 2a09b5de235a6 ("sched/fair: do not expose some tracepoints to user if CONFIG_SCHEDSTATS is not set") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Yafang Shao <laoar.shao@gmail.com> Link: http://lore.kernel.org/lkml/20210713112358.194693-1-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
d08c84e0 |
|
14-Jul-2021 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf sched: Cast PTHREAD_STACK_MIN to int as it may turn into sysconf(__SC_THREAD_STACK_MIN_VALUE) In fedora rawhide the PTHREAD_STACK_MIN define may end up expanded to a sysconf() call, and that will return 'long int', breaking the build: 45 fedora:rawhide : FAIL gcc version 11.1.1 20210623 (Red Hat 11.1.1-6) (GCC) builtin-sched.c: In function 'create_tasks': /git/perf-5.14.0-rc1/tools/include/linux/kernel.h:43:24: error: comparison of distinct pointer types lacks a cast [-Werror] 43 | (void) (&_max1 == &_max2); \ | ^~ builtin-sched.c:673:34: note: in expansion of macro 'max' 673 | (size_t) max(16 * 1024, PTHREAD_STACK_MIN)); | ^~~ cc1: all warnings being treated as errors $ grep __sysconf /usr/include/*/*.h /usr/include/bits/pthread_stack_min-dynamic.h:extern long int __sysconf (int __name) __THROW; /usr/include/bits/pthread_stack_min-dynamic.h:# define PTHREAD_STACK_MIN __sysconf (__SC_THREAD_STACK_MIN_VALUE) /usr/include/bits/time.h:extern long int __sysconf (int); /usr/include/bits/time.h:# define CLK_TCK ((__clock_t) __sysconf (2)) /* 2 is _SC_CLK_TCK */ $ So cast it to int to cope with that. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
4d39c89f |
|
23-Mar-2021 |
Ingo Molnar <mingo@kernel.org> |
perf tools: Fix various typos in comments Fix ~124 single-word typos and a few spelling errors in the perf tooling code, accumulated over the years. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210321113734.GA248990@gmail.com Link: http://lore.kernel.org/lkml/20210323160915.GA61903@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
b02736f7 |
|
30-Nov-2020 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf evlist: Use the right prefix for 'struct evlist' 'find' methods perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/, go on completing this split. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
dc000c45 |
|
25-Sep-2020 |
Joel Fernandes (Google) <joel@joelfernandes.org> |
perf sched: Show start of latency as well The 'perf sched latency' tool is really useful at showing worst-case latencies that task encountered since wakeup. However it shows only the end of the latency. Often times the start of a latency is interesting as it can show what else was going on at the time to cause the latency. I certainly myself spending a lot of time backtracking to the start of the latency in "perf sched script" which wastes a lot of time. This patch therefore adds a new column "Max delay start". Considering this, also rename "Maximum delay at" to "Max delay end" as its easier to understand. Example of the new output: ---------------------------------------------------------------------------------------------------------------------------------- Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Max delay start | Max delay end | ---------------------------------------------------------------------------------------------------------------------------------- MediaScannerSer:11936 | 651.296 ms | 67978 | avg: 0.113 ms | max: 77.250 ms | max start: 477.691360 s | max end: 477.768610 s audio@2.0-servi:(3) | 0.000 ms | 3440 | avg: 0.034 ms | max: 72.267 ms | max start: 477.697051 s | max end: 477.769318 s AudioOut_1D:8112 | 0.000 ms | 2588 | avg: 0.083 ms | max: 64.020 ms | max start: 477.710740 s | max end: 477.774760 s Time-limited te:14973 | 7966.090 ms | 24807 | avg: 0.073 ms | max: 15.563 ms | max start: 477.162746 s | max end: 477.178309 s surfaceflinger:8049 | 9.680 ms | 603 | avg: 0.063 ms | max: 13.275 ms | max start: 476.931791 s | max end: 476.945067 s HeapTaskDaemon:(3) | 1588.830 ms | 7040 | avg: 0.065 ms | max: 6.880 ms | max start: 473.666043 s | max end: 473.672922 s mount-passthrou:(3) | 1370.809 ms | 68904 | avg: 0.011 ms | max: 6.524 ms | max start: 478.090630 s | max end: 478.097154 s ReferenceQueueD:(3) | 11.794 ms | 1725 | avg: 0.014 ms | max: 6.521 ms | max start: 476.119782 s | max end: 476.126303 s writer:14077 | 18.410 ms | 1427 | avg: 0.036 ms | max: 6.131 ms | max start: 474.169675 s | max end: 474.175805 s Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20200925235634.4089867-1-joel@joelfernandes.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
a74eaf16 |
|
17-Aug-2020 |
David Ahern <dsahern@kernel.org> |
perf sched timehist: Fix use of CPU list with summary option Do not update thread stats or show idle summary unless CPU is in the list of interest. Fixes: c30d630d1bcfad8d ("perf sched timehist: Add support for filtering on CPU") Signed-off-by: David Ahern <dsahern@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20200817170943.1486-1-dsahern@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
d566a9c2 |
|
07-Aug-2020 |
David Ahern <dsahern@kernel.org> |
perf sched: Prefer sched_waking event when it exists Commit fbd705a0c618 ("sched: Introduce the 'trace_sched_waking' tracepoint") added sched_waking tracepoint which should be preferred over sched_wakeup when analyzing scheduling delays. Update 'perf sched record' to collect sched_waking events if it exists and fallback to sched_wakeup if it does not. Similarly, update timehist command to skip sched_wakeup events if the session includes sched_waking (ie., sched_waking is preferred over sched_wakeup). Signed-off-by: David Ahern <dsahern@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20200807164844.44870-1-dsahern@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
3b7313f2 |
|
04-May-2020 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf sched: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*() As those is a 'struct evsel' methods, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
efc0cdc9 |
|
29-Apr-2020 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf evsel: Rename perf_evsel__{str,int}val() and other tracepoint field metehods to to evsel__*() As those are not 'struct evsel' methods, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
8ab2e96d |
|
29-Apr-2020 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf evsel: Rename *perf_evsel__*name() to *evsel__*name() As they are 'struct evsel' methods or related routines, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
c30d630d |
|
04-Dec-2019 |
David Ahern <dsahern@gmail.com> |
perf sched timehist: Add support for filtering on CPU Allow user to limit output to one or more CPUs. Really helpful on systems with a large number of cpus. Committer testing: # perf sched record -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.765 MB perf.data (1412 samples) ] [root@quaco ~]# perf sched timehist | head Samples do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 66307.802686 [0000] perf[13086] 0.000 0.000 0.000 66307.802700 [0000] migration/0[12] 0.000 0.001 0.014 66307.802766 [0001] perf[13086] 0.000 0.000 0.000 66307.802774 [0001] migration/1[15] 0.000 0.001 0.007 66307.802841 [0002] perf[13086] 0.000 0.000 0.000 66307.802849 [0002] migration/2[20] 0.000 0.001 0.008 66307.802913 [0003] perf[13086] 0.000 0.000 0.000 # # perf sched timehist --cpu 2 | head Samples do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 66307.802841 [0002] perf[13086] 0.000 0.000 0.000 66307.802849 [0002] migration/2[20] 0.000 0.001 0.008 66307.964485 [0002] <idle> 0.000 0.000 161.635 66307.964811 [0002] CPU 0/KVM[3589/3561] 0.000 0.056 0.325 66307.965477 [0002] <idle> 0.325 0.000 0.666 66307.965553 [0002] CPU 0/KVM[3589/3561] 0.666 0.024 0.076 66307.966456 [0002] <idle> 0.076 0.000 0.903 # Signed-off-by: David Ahern <dsahern@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20191204173925.66976-1-dsahern@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
5f0fef8a |
|
03-Nov-2019 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf callchain: Use 'struct map_symbol' in 'struct callchain_cursor_node' To ease passing around map+symbol, just like done for other parts of the tree recently. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
ca125277 |
|
24-Sep-2019 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf evsel: Introduce evsel_fprintf.h We already had evsel_fprintf.c, add its counterpart, so that we can reduce evsel.h a bit more. We needed a new perf_event_attr_fprintf.c file so as to have a separate object to link with the python binding in tools/perf/util/python-ext-sources and not drag symbol_conf, etc into the python binding. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-06bdmt1062d9unzgqmxwlv88@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
9620bc36 |
|
25-Sep-2019 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf evsel: Remove need for symbol_conf in evsel_fprintf.c So that we an later link it to the python binding without having to drag the symbol object files. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-8823tveyasocnuoelq4qopwf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
6ef81c55 |
|
21-Aug-2019 |
Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> |
perf session: Return error code for perf_session__new() function on failure This patch is to return error code of perf_new_session function on failure instead of NULL. Test Results: Before Fix: $ perf c2c report -input failed to open nput: No such file or directory $ echo $? 0 $ After Fix: $ perf c2c report -input failed to open nput: No such file or directory $ echo $? 254 $ Committer notes: Fix 'perf tests topology' case, where we use that TEST_ASSERT_VAL(..., session), i.e. we need to pass zero in case of failure, which was the case before when NULL was returned by perf_session__new() for failure, but now we need to negate the result of IS_ERR(session) to respect that TEST_ASSERT_VAL) expectation of zero meaning failure. Reported-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com> Signed-off-by: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com> Acked-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shawn Landden <shawn@git.icu> Cc: Song Liu <songliubraving@fb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tzvetomir Stoyanov <tstoyanov@vmware.com> Link: http://lore.kernel.org/lkml/20190822071223.17892.45782.stgit@localhost.localdomain Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
f12be047 |
|
18-Sep-2019 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf sched: Add missing event.h include directive We use what is defined there, were getting it by luck, indirectly, fix it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-e1cdt9557ctpvs3jb9c16qe6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
87ffb6c6 |
|
10-Sep-2019 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf env: Remove needless cpumap.h header Only a 'struct perf_cmp_map' forward allocation is necessary, fix the places that need the header but were getting it indirectly, by luck, from env.h. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-3sj3n534zghxhk7ygzeaqlx9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
fa0d9846 |
|
29-Aug-2019 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Remove needless evlist.h include directives Remove the last unneeded use of cache.h in a header, we can check where it is really needed, i.e. we can remove it and be sure that it isn't being obtained indirectly. This is an old file, used by now incorrectly in many places, so it was providing includes needed indirectly, fixup this fallout. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-3x3l8gihoaeh7714os861ia7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
91854f9a |
|
29-Aug-2019 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Move everything related to sys_perf_event_open() to perf-sys.h And remove unneeded include directives from perf-sys.h to prune the header dependency tree. Fixup the fallout in places where definitions were being used without the needed include directives that were being satisfied because they were in perf-sys.h. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-7b1zvugiwak4ibfa3j6ott7f@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
5290ed69 |
|
25-Aug-2019 |
Jiri Olsa <jolsa@kernel.org> |
libperf: Add PERF_RECORD_LOST 'struct lost_event' to perf/event.h Move the lost_event event definition to libperf's event.h header include. In order to keep libperf simple, we switch 'u64/u32/u16/u8' types used events to their generic '__u*' versions. Perf added 'u*' types mainly to ease up printing __u64 values as stated in the linux/types.h comment: /* * We define u64 as uint64_t for every architecture * so that we can print it with "%"PRIx64 without getting warnings. * * typedef __u64 u64; * typedef __s64 s64; */ Add and use new PRI_lu64 and PRI_lx64 macros for that. Use extra '_' to ease up the reading and differentiate them from standard PRI*64 macros. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190825181752.722-7-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
9c3516d1 |
|
21-Jul-2019 |
Jiri Olsa <jolsa@kernel.org> |
libperf: Add perf_cpu_map__new()/perf_cpu_map__read() functions Moving the following functions from tools/perf: cpu_map__new() cpu_map__read() to libperf with the following names: perf_cpu_map__new() perf_cpu_map__read() Committer notes: Fixed up this one: tools/perf/arch/arm/util/cs-etm.c Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-44-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
ce9036a6 |
|
21-Jul-2019 |
Jiri Olsa <jolsa@kernel.org> |
libperf: Include perf_evlist in evlist object Include perf_evlist in the evlist object, will continue to move other generic things into libperf's perf_evlist. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-37-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
b27c4ece |
|
21-Jul-2019 |
Jiri Olsa <jolsa@kernel.org> |
libperf: Include perf_evsel in evsel object Including perf_evsel in evsel object, will continue to move other generic things into libperf's perf_evsel struct. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-36-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
63503dba |
|
21-Jul-2019 |
Jiri Olsa <jolsa@kernel.org> |
perf evlist: Rename struct perf_evlist to struct evlist Rename struct perf_evlist to struct evlist, so we don't have a name clash when we add struct perf_evlist in libperf. Committer notes: Added fixes to build on arm64, from Jiri and from me (tools/perf/util/cs-etm.c) Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
32dcd021 |
|
21-Jul-2019 |
Jiri Olsa <jolsa@kernel.org> |
perf evsel: Rename struct perf_evsel to struct evsel Rename struct perf_evsel to struct evsel, so we don't have a name clash when we add struct perf_evsel in libperf. Committer notes: Added fixes for arm64, provided by Jiri. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
9749b90e |
|
21-Jul-2019 |
Jiri Olsa <jolsa@kernel.org> |
perf tools: Rename struct thread_map to struct perf_thread_map Rename struct thread_map to struct perf_thread_map, so it could be part of libperf. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
f854839b |
|
21-Jul-2019 |
Jiri Olsa <jolsa@kernel.org> |
perf cpu_map: Rename struct cpu_map to struct perf_cpu_map Rename struct cpu_map to struct perf_cpu_map, so it could be part of libperf. Committer notes: Added fixes for arm64, provided by Jiri. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
7f7c536f |
|
04-Jul-2019 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
tools lib: Adopt zalloc()/zfree() from tools/perf Eroding a bit more the tools/perf/util/util.h hodpodge header. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-natazosyn9rwjka25tvcnyi0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
3052ba56 |
|
25-Jun-2019 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
tools perf: Move from sane_ctype.h obtained from git to the Linux's original We got the sane_ctype.h headers from git and kept using it so far, but since that code originally came from the kernel sources to the git sources, perhaps its better to just use the one in the kernel, so that we can leverage tools/perf/check_headers.sh to be notified when our copy gets out of sync, i.e. when fixes or goodies are added to the code we've copied. This will help with things like tools/lib/string.c where we want to have more things in common with the kernel, such as strim(), skip_spaces(), etc so as to go on removing the things that we have in tools/perf/util/ and instead using the code in the kernel, indirectly and removing things like EXPORT_SYMBOL(), etc, getting notified when fixes and improvements are made to the original code. Hopefully this also should help with reducing the difference of code hosted in tools/ to the one in the kernel proper. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-7k9868l713wqtgo01xxygn12@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
6a9fa4e3 |
|
25-Jun-2019 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf string: Move 'dots' and 'graph_dotted_line' out of sane_ctype.h Those are not in that file in the git repo, lets move it from there so that we get that sane ctype code fully isolated to allow getting it in sync either with the git sources or better with the kernel sources (include/linux/ctype.h + lib/ctype.h), that way we can use check_headers.h to get notified when changes are made in the original code so that we can cherry-pick. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-ioh5sghn3943j0rxg6lb2dgs@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
2d4f2799 |
|
21-Feb-2019 |
Jiri Olsa <jolsa@kernel.org> |
perf data: Add global path holder Add a 'path' member to 'struct perf_data'. It will keep the configured path for the data (const char *). The path in struct perf_data_file is now dynamically allocated (duped) from it. This scheme is useful/used in following patches where struct perf_data::path holds the 'configure' directory path and struct perf_data_file::path holds the allocated path for specific files. Also it actually makes the code little simpler. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20190221094145.9151-3-jolsa@kernel.org [ Fixup data-convert-bt.c missing conversion ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
cb4c13a5 |
|
06-Dec-2018 |
Davidlohr Bueso <dave@stgolabs.net> |
perf sched: Use cached rbtrees At the cost of an extra pointer, we can avoid the O(logN) cost of finding the first element in the tree (smallest node), which is something heavily required for perf-sched. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20181206191819.30182-8-dave@stgolabs.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
49b8e2be |
|
02-Nov-2018 |
Rasmus Villemoes <linux@rasmusvillemoes.dk> |
perf tools: Replace automatic const char[] variables by statics An automatic const char[] variable gets initialized at runtime, just like any other automatic variable. For long strings, that uses a lot of stack and wastes time building the string; e.g. for the "No %s allocation events..." case one has: 444516: 48 b8 4e 6f 20 25 73 20 61 6c movabs $0x6c61207325206f4e,%rax # "No %s al" ... 444674: 48 89 45 80 mov %rax,-0x80(%rbp) 444678: 48 b8 6c 6f 63 61 74 69 6f 6e movabs $0x6e6f697461636f6c,%rax # "location" 444682: 48 89 45 88 mov %rax,-0x78(%rbp) 444686: 48 b8 20 65 76 65 6e 74 73 20 movabs $0x2073746e65766520,%rax # " events " 444690: 66 44 89 55 c4 mov %r10w,-0x3c(%rbp) 444695: 48 89 45 90 mov %rax,-0x70(%rbp) 444699: 48 b8 66 6f 75 6e 64 2e 20 20 movabs $0x20202e646e756f66,%rax Make them all static so that the compiler just references objects in .rodata. Committer testing: Ok, using dwarves's codiff tool: $ codiff --functions /tmp/perf.before ~/bin/perf builtin-sched.c: cmd_sched | -48 1 function changed, 48 bytes removed, diff: -48 builtin-report.c: cmd_report | -32 1 function changed, 32 bytes removed, diff: -32 builtin-kmem.c: cmd_kmem | -64 build_alloc_func_list | -50 2 functions changed, 114 bytes removed, diff: -114 builtin-c2c.c: perf_c2c__report | -390 1 function changed, 390 bytes removed, diff: -390 ui/browsers/header.c: tui__header_window | -104 1 function changed, 104 bytes removed, diff: -104 /home/acme/bin/perf: 9 functions changed, 688 bytes removed, diff: -688 Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20181102230624.20064-1-linux@rasmusvillemoes.dk Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
4c50563d |
|
28-May-2018 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf sched: Use sched->show_callchain where appropriate Instead of using symbol_conf.use_callchain, reducing its usage a bit more. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-edgwb1b2mpbrdeg0w64wp7ms@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
27de9b2b |
|
28-May-2018 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf evsel: Add has_callchain() helper to make code more compact/clear Its common to have the (evsel->attr.sample_type & PERF_SAMPLE_CALLCHAIN), so add an evsel__has_callchain(evsel) helper. This will actually get more uses as we check that instead of symbol_conf.use_callchain in places where that produces the same result but makes this decision to be more fine grained, per evsel. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-145340oytbthatpfeaq1do18@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
99a3c3a9 |
|
05-Mar-2018 |
Changbin Du <changbin.du@intel.com> |
perf sched map: Re-annotate shortname if thread comm changed This is to show the real name of thread that created via fork-exec. See below example for shortname *A0*. $ sudo ./perf sched map *A0 80393.050639 secs A0 => perf:22368 *. A0 80393.050748 secs . => swapper:0 . *. 80393.050887 secs *B0 . . 80393.052735 secs B0 => rcu_sched:8 *. . . 80393.052743 secs . *C0 . 80393.056264 secs C0 => kworker/2:1H:287 . *A0 . 80393.056270 secs . *D0 . 80393.056769 secs D0 => ksoftirqd/2:22 - . *A0 . 80393.056804 secs + . *A0 . 80393.056804 secs A0 => pi:22368 . *. . 80393.056854 secs *B0 . . 80393.060727 secs ... Signed-off-by: Changbin Du <changbin.du@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1520307457-23668-3-git-send-email-changbin.du@intel.com [ Optimally pack struct thread_runtime when adding the new bool member ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
8640da9f |
|
05-Mar-2018 |
Changbin Du <changbin.du@intel.com> |
perf sched: Move thread::shortname to thread_runtime The thread::shortname only used by sched command, so move it to sched private structure. Signed-off-by: Changbin Du <changbin.du@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1520307457-23668-2-git-send-email-changbin.du@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
b2441318 |
|
01-Nov-2017 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
eae8ad80 |
|
23-Jan-2017 |
Jiri Olsa <jolsa@kernel.org> |
perf tools: Add struct perf_data_file Add struct perf_data_file to represent a single file within a perf_data struct. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Changbin Du <changbin.du@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-c3f9p4xzykr845ktqcek6p4t@git.kernel.org [ Fixup recent changes in 'perf script --per-event-dump' ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
8ceb41d7 |
|
23-Jan-2017 |
Jiri Olsa <jolsa@kernel.org> |
perf tools: Rename struct perf_data_file to perf_data Rename struct perf_data_file to perf_data, because we will add the possibility to have multiple files under perf.data, so the 'perf_data' name fits better. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Changbin Du <changbin.du@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-39wn4d77phel3dgkzo3lyan0@git.kernel.org [ Fixup recent changes in 'perf script --per-event-dump' ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
0f59d7a3 |
|
01-Sep-2017 |
David Ahern <dsahern@gmail.com> |
perf sched timehist: Add pid and tid options Add options to only show event for specific pid(s) and tid(s). Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1504288152-19690-1-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
62d94b00 |
|
27-Jun-2017 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Replace error() with pr_err() To consolidate the error reporting facility. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-b41iot1094katoffdf19w9zk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
a43783ae |
|
18-Apr-2017 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Include errno.h where needed Removing it from util.h, part of an effort to disentangle the includes hell, that makes changes to util.h or something included by it to cause a complete rebuild of the tools. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ztrjy52q1rqcchuy3rubfgt2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
3d689ed6 |
|
17-Apr-2017 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Move sane ctype stuff from util.h to sane_ctype.h More stuff that came from git, out of the hodge-podge that is util.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-e3lana4gctz3ub4hn4y29hkw@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
fd20e811 |
|
17-Apr-2017 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Including missing inttypes.h header Needed to use the PRI[xu](32,64) formatting macros. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-wkbho8kaw24q67dd11q0j39f@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
877a7a11 |
|
17-Apr-2017 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Add include <linux/kernel.h> where ARRAY_SIZE() is used To pave the way for further cleanups where linux/kernel.h may stop being included in some header. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-qqxan6tfsl6qx3l0v3nwgjvk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
b0ad8ea6 |
|
27-Mar-2017 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Remove unused 'prefix' from builtin functions We got it from the git sources but never used it for anything, with the place where this would be somehow used remaining: static int run_builtin(struct cmd_struct *p, int argc, const char **argv) { prefix = NULL; if (p->option & RUN_SETUP) prefix = NULL; /* setup_perf_directory(); */ Ditch it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-uw5swz05vol0qpr32c5lpvus@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
292c4a8f |
|
13-Mar-2017 |
Brendan Gregg <bgregg@netflix.com> |
perf sched timehist: Add --next option The --next option shows the next task for each context switch, providing more context for the sequence of scheduler events. $ perf sched timehist --next | head Samples do not have callchains. time cpu task name waittime schdelay run time [tid/pid] (msec) (msec) (msec) ---------- --- ---------- --------- ------ ----- 374.793792 [0] <idle> 0.000 0.000 0.000 next: rngd[1524] 374.793801 [0] rngd[1524] 0.000 0.000 0.009 next: swapper/0[0] 374.794048 [7] <idle> 0.000 0.000 0.000 next: yes[30884] 374.794066 [7] yes[30884] 0.000 0.000 0.018 next: swapper/7[0] 374.794126 [2] <idle> 0.000 0.000 0.000 next: rngd[1524] 374.794140 [2] rngd[1524] 0.325 0.006 0.013 next: swapper/2[0] 374.794281 [3] <idle> 0.000 0.000 0.000 next: perf[31070] Signed-off-by: Brendan Gregg <bgregg@netflix.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1489456589-32555-1-git-send-email-bgregg@netflix.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
f3b3614a |
|
07-Mar-2017 |
Hari Bathini <hbathini@linux.vnet.ibm.com> |
perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info Introduce a new option to record PERF_RECORD_NAMESPACES events emitted by the kernel when fork, clone, setns or unshare are invoked. And update perf-record documentation with the new option to record namespace events. Committer notes: Combined it with a later patch to allow printing it via 'perf report -D' and be able to test the feature introduced in this patch. Had to move here also perf_ns__name(), that was introduced in another later patch. Also used PRIu64 and PRIx64 to fix the build in some enfironments wrt: util/event.c:1129:39: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'long long unsigned int' [-Werror=format=] ret += fprintf(fp, "%u/%s: %lu/0x%lx%s", idx ^ Testing it: # perf record --namespaces -a ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.083 MB perf.data (423 samples) ] # # perf report -D <SNIP> 3 2028902078892 0x115140 [0xa0]: PERF_RECORD_NAMESPACES 14783/14783 - nr_namespaces: 7 [0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc, 4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb] 0x1151e0 [0x30]: event: 9 . . ... raw event: size 48 bytes . 0000: 09 00 00 00 02 00 30 00 c4 71 82 68 0c 7f 00 00 ......0..q.h.... . 0010: a9 39 00 00 a9 39 00 00 94 28 fe 63 d8 01 00 00 .9...9...(.c.... . 0020: 03 00 00 00 00 00 00 00 ce c4 02 00 00 00 00 00 ................ <SNIP> NAMESPACES events: 1 <SNIP> # Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sargun Dhillon <sargun@sargun.me> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/148891930386.25309.18412039920746995488.stgit@hbathini.in.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
bb963e16 |
|
17-Feb-2017 |
Namhyung Kim <namhyung@kernel.org> |
perf utils: Check verbose flag properly It now can have negative value to suppress the message entirely. So it needs to check it being positive. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20170217081742.17417-3-namhyung@kernel.org [ Adjust fuzz on tools/perf/util/pmu.c, add > 0 checks in many other places ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
a7c3899c |
|
13-Feb-2017 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf symbols: No need to check if sym->name is NULL As it is an array, so will always evaluate to 'true', as reported by clang: builtin-sched.c:2070:19: error: address of array 'sym->name' will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion] if (sym && sym->name) { ~~ ~~~~~^~~~ 1 warning generated. So just ditch all those useless checks. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ydpm927col06paixb775jjx5@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
587782c5 |
|
13-Jan-2017 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Show total wait times for summary When --state option is given, the summary will show total run, sleep, iowait, preempt and delay time instead of statistics of runtime. $ perf sched timehist -s --state Wait-time summary comm parent sched-in run-time sleep iowait preempt delay (count) (msec) (msec) (msec) (msec) (msec) --------------------------------------------------------------------- systemd[1] 0 3 0.497 1.685 0.000 0.000 0.061 ksoftirqd/0[3] 2 21 0.434 989.948 0.000 0.000 0.325 rcu_preempt[7] 2 28 0.386 993.211 0.000 0.000 0.712 migration/0[10] 2 12 0.126 50.174 0.000 0.000 0.044 watchdog/0[11] 2 1 0.009 0.000 0.000 0.000 0.016 migration/1[13] 2 2 0.029 11.755 0.000 0.000 0.007 <SNIP> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170113104523.31212-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
414e050c |
|
13-Jan-2017 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Add --state option The --state option is to show task state when switched out. The state is printed as a single character like in the /proc but I added 'I' for idle state rather than 'R'. $ perf sched timehist --state | head Samples do not have callchains. time cpu task name wait time sch delay run time state [tid/pid] (msec) (msec) (msec) -------- --- ----------------------- -------- ------------------ ----- 1.753791 [3] <idle> 0.000 0.000 0.000 I 1.753834 [1] perf[27469] 0.000 0.000 0.000 S 1.753904 [3] perf[27470] 0.000 0.006 0.112 S 1.753914 [1] <idle> 0.000 0.000 0.079 I 1.753915 [3] migration/3[23] 0.000 0.002 0.011 S 1.754287 [2] <idle> 0.000 0.000 0.000 I 1.754335 [2] transmission[1773/1739] 0.000 0.004 0.047 S Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170113104523.31212-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
941bdea7 |
|
13-Jan-2017 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Account thread wait time separately Separate thread wait time into 3 parts - sleep, iowait and preempt based on the prev_state of the last event. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170113104523.31212-1-namhyung@kernel.org [ Fix the build on centos:5 where 'wait' shadows a global declaration ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
9396c9cb |
|
21-Dec-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Show total scheduling time Show length of analyzed sample time and rate of idle task running. This also takes care of time range given by --time option. $ perf sched timehist -sI | tail Samples do not have callchains. Idle stats: CPU 0 idle for 930.316 msec ( 92.93%) CPU 1 idle for 963.614 msec ( 96.25%) CPU 2 idle for 885.482 msec ( 88.45%) CPU 3 idle for 938.635 msec ( 93.76%) Total number of unique tasks: 118 Total number of context switches: 2337 Total run time (msec): 3718.048 Total scheduling time (msec): 1001.131 (x 4) Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161222060350.17655-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
bdd75729 |
|
21-Dec-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Fix invalid period calculation When --time option is given with a value outside recorded time, the last sample time (tprev) was set to that value and run time calculation might be incorrect. This is a problem of the first samples for each cpus since it would skip the runtime update when tprev is 0. But with --time option it had non-zero (which is invalid) value so the calculation is also incorrect. For example, let's see the followging: $ perf sched timehist time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 3195.968367 [0003] <idle> 0.000 0.000 0.000 3195.968386 [0002] Timer[4306/4277] 0.000 0.000 0.018 3195.968397 [0002] Web Content[4277] 0.000 0.000 0.000 3195.968595 [0001] JS Helper[4302/4277] 0.000 0.000 0.000 3195.969217 [0000] <idle> 0.000 0.000 0.621 3195.969251 [0001] kworker/1:1H[291] 0.000 0.000 0.033 The sample starts at 3195.968367 but when I gave a time interval from 3194 to 3196 (in sec) it will calculate the whole 2 second as runtime. In below, 2 cpus accounted it as runtime, other 2 cpus accounted it as idle time. Before: $ perf sched timehist --time 3194,3196 -s | tail Idle stats: CPU 0 idle for 1995.991 msec CPU 1 idle for 20.793 msec CPU 2 idle for 30.191 msec CPU 3 idle for 1999.852 msec Total number of unique tasks: 23 Total number of context switches: 128 Total run time (msec): 3724.940 After: $ perf sched timehist --time 3194,3196 -s | tail Idle stats: CPU 0 idle for 10.811 msec CPU 1 idle for 20.793 msec CPU 2 idle for 30.191 msec CPU 3 idle for 18.337 msec Total number of unique tasks: 23 Total number of context switches: 128 Total run time (msec): 18.139 Committer notes: Further testing: Before: Idle stats: CPU 0 idle for 229.785 msec CPU 1 idle for 937.944 msec CPU 2 idle for 188.931 msec CPU 3 idle for 986.185 msec After: # perf sched timehist --time 40602,40603 -s | tail Idle stats: CPU 0 idle for 229.785 msec CPU 1 idle for 175.407 msec CPU 2 idle for 188.931 msec CPU 3 idle for 223.657 msec Total number of unique tasks: 68 Total number of context switches: 814 Total run time (msec): 97.688 # for cpu in `seq 0 3` ; do echo -n "CPU $cpu idle for " ; perf sched timehist --time 40602,40603 | grep "\[000${cpu}\].*\<idle\>" | tr -s ' ' | cut -d' ' -f7 | awk '{entries++ ; s+=$1} END {print s " msec (entries: " entries ")"}' ; done CPU 0 idle for 229.721 msec (entries: 123) CPU 1 idle for 175.381 msec (entries: 65) CPU 2 idle for 188.903 msec (entries: 56) CPU 3 idle for 223.61 msec (entries: 102) Difference due to the idle stats being accounted at nanoseconds precision while the <idle> entries in 'perf sched timehist' are trucated at msec.usec. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixes: 853b74071110 ("perf sched timehist: Add option to specify time window of interest") Link: http://lkml.kernel.org/r/20161222060350.17655-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
4fa0d1aa |
|
21-Dec-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Remove hardcoded 'comm_width' check at print_summary Now that the default 'comm_width' value is 30, no need to check that at print_summary, Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org [ Split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
9b8087d7 |
|
21-Dec-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Enlarge default 'comm_width' Current default value is 20 but it's easily changed to a bigger value as task has a long name and different tid and pid. And it makes the output not aligned. So change it to have a large value as summary shows. Committer notes: Before: # perf sched record ^C # perf sched timehist <SNIP> 40602.770537 [0001] rcuos/2[29] 7.970 0.002 0.020 40602.771512 [0003] <idle> 0.003 0.000 0.986 40602.771586 [0001] <idle> 0.020 0.000 1.049 40602.771606 [0001] qemu-system-x86[3593/3510] 0.000 0.002 0.020 40602.771629 [0003] qemu-system-x86[3510] 0.000 0.003 0.116 40602.771776 [0000] <idle> 0.001 0.000 1.892 <SNIP> After: # perf sched timehist <SNIP> 40602.770537 [0001] rcuos/2[29] 7.970 0.002 0.020 40602.771512 [0003] <idle> 0.003 0.000 0.986 40602.771586 [0001] <idle> 0.020 0.000 1.049 40602.771606 [0001] qemu-system-x86[3593/3510] 0.000 0.002 0.020 40602.771629 [0003] qemu-system-x86[3510] 0.000 0.003 0.116 <SNIP> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org [ Split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
0e6758e8 |
|
21-Dec-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Honour 'comm_width' when aligning the headers Current default value is 20, but that may change in the future, so make places where we have 20 hardcoded use 'comm_width'. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org [ Split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
ba957ebb |
|
08-Dec-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Show callchains for idle stat When --idle-hist option is used with --summary, it now shows idle stats with callchains like below: Idle stats by callchain: CPU 0: 902.195 msec Idle time (msec) Count Callchains ---------------- ------- -------------------------------------------------- 370.589 69 futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- entry_SYSCALL_64_fastpath 178.799 17 worker_thread <- kthread <- ret_from_fork 128.352 17 schedule_timeout <- rcu_gp_kthread <- kthread <- ret_from_fork 125.111 19 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_select <- core_sys_select 71.599 50 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_poll 23.146 1 rcu_gp_kthread <- kthread <- ret_from_fork 4.510 1 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- do_syscall_64 0.085 1 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- do_restart_poll ... Committer notes: Extra testing: # uname -a Linux jouet 4.8.8-300.fc25.x86_64 #1 SMP Tue Nov 15 18:10:06 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux 1) Run 'perf sched record -g' 2) Run 'perf sched timehist --idle --summary' <SNIP> Idle stats by callchain: CPU 0: 13456.840 msec Idle time (msec) Count Callchains ---------------- ----- -------------------------------------------------- 5386.637 3283 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_poll 2750.238 2299 futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- do_syscall_64 1275.672 1287 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- entry_SYSCALL_64_fastpath 936.322 452 worker_thread <- kthread <- ret_from_fork 741.311 385 rcu_nocb_kthread <- kthread <- ret_from_fork 729.385 248 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_ppoll 365.386 229 irq_thread <- kthread <- ret_from_fork 338.934 265 futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- entry_SYSCALL_64_fastpath 219.488 201 schedule_timeout <- rcu_gp_kthread <- kthread <- ret_from_fork 186.839 410 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- do_syscall_64 142.541 59 kvm_vcpu_block <- kvm_arch_vcpu_ioctl_run <- kvm_vcpu_ioctl <- do_vfs_ioctl <- sys_ioctl 83.887 92 smpboot_thread_fn <- kthread <- ret_from_fork 62.722 96 do_exit <- do_group_exit <- 0x2a5594 <- entry_SYSCALL_64_fastpath 47.894 83 pipe_wait <- pipe_read <- __vfs_read <- vfs_read <- sys_read 46.554 61 rcu_gp_kthread <- kthread <- ret_from_fork 34.337 21 schedule_timeout <- intel_fbc_work_fn <- process_one_work <- worker_thread <- kthread 29.521 14 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_select <- core_sys_select 20.274 10 schedule_timeout <- io_schedule_timeout <- bit_wait_io <- __wait_on_bit <- out_of_line_wait_on_bit 15.085 55 schedule_timeout <- unix_stream_read_generic <- unix_stream_recvmsg <- sock_recvmsg <- SYSC_recvfrom <SNIP> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161208144755.16673-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
07235f84 |
|
08-Dec-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Add -I/--idle-hist option The --idle-hist option is to analyze system idle state so which process makes cpu to go idle. If this option is specified, non-idle events will be skipped and processes switching to/from idle will be shown. This option is mostly useful when used with --summary(-only) option. In the idle-time summary view, idle time is accounted to previous thread which is run before idle task. The example output looks like following: Idle-time summary comm parent sched-out idle-time min-idle avg-idle max-idle stddev migrations (count) (msec) (msec) (msec) (msec) % -------------------------------------------------------------------------------------------- rcu_preempt[7] 2 95 550.872 0.011 5.798 23.146 7.63 0 migration/1[16] 2 1 15.558 15.558 15.558 15.558 0.00 0 khugepaged[39] 2 1 3.062 3.062 3.062 3.062 0.00 0 kworker/0:1H[124] 2 2 4.728 0.611 2.364 4.116 74.12 0 systemd-journal[167] 1 1 4.510 4.510 4.510 4.510 0.00 0 kworker/u16:3[558] 2 13 74.737 0.080 5.749 12.960 21.96 0 irq/34-iwlwifi[628] 2 21 118.403 0.032 5.638 23.990 24.00 0 kworker/u17:0[673] 2 1 3.523 3.523 3.523 3.523 0.00 0 dbus-daemon[722] 1 1 6.743 6.743 6.743 6.743 0.00 0 ifplugd[741] 1 1 58.826 58.826 58.826 58.826 0.00 0 wpa_supplicant[1490] 1 1 13.302 13.302 13.302 13.302 0.00 0 wpa_actiond[1492] 1 2 4.064 0.168 2.032 3.896 91.72 0 dockerd[1500] 1 1 0.055 0.055 0.055 0.055 0.00 0 ... Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161208144755.16673-6-namhyung@kernel.org Link: http://lkml.kernel.org/r/20161213080632.19099-2-namhyung@kernel.org [ Merged fix sent by Namhyumg, as posted in the second Link: tag ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
a4b2b6f5 |
|
08-Dec-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Skip non-idle events when necessary Sometimes it only focuses on idle-related events like upcoming idle-hist feature. In this case we don't want to see other event to reduce noise. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161208144755.16673-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
699b5b92 |
|
08-Dec-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Save callchain when entering idle In order to investigate the idleness reason, it is necessary to keep the callchains when entering idle. This can be identified by the sched:sched_switch event having the next_pid field as 0. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161208144755.16673-4-namhyung@kernel.org Link: http://lkml.kernel.org/r/20161213080632.19099-1-namhyung@kernel.org [ Merged fix from Namhyung, see second Link: tag ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
3bc2fa9c |
|
08-Dec-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Introduce struct idle_time_data The struct idle_time_data is to keep idle stats with callchains entering to the idle task. The normal thread_runtime calculation is done transparently since it extends the struct thread_runtime. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161208144755.16673-3-namhyung@kernel.org [ Align struct field names ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
96039c7c |
|
08-Dec-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Split is_idle_sample() The is_idle_sample() function actually does more than determining whether sample come from idle task. Split the callchain part into save_task_callchain() to make it clearer. Also checking prev_pid from trace data looks preferred than just checking sample->pid since it's possible, although rare, to have invalid 0 pid/tid on scheduling an exiting task. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161208144755.16673-2-namhyung@kernel.org [ Remove some needless () in some return statements ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
b336352b |
|
05-Dec-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Cleanup idle_max_cpu handling It treats the idle_max_cpu little bit confusingly IMHO. Let's make it more straight forward. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161206034010.6499-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
5d92d96a |
|
05-Dec-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Handle zero sample->tid properly Sometimes samples have tid of 0 but non-0 pid. It ends up having a new thread of 0 tid/pid (instead of referring idle task) since tid is used to search matching task. But I guess it's wrong to use 0 as a tid when pid is set. This patch uses tid only if it has a non-zero value or same as pid (of 0). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161206034010.6499-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
6fa94258 |
|
05-Dec-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched: Cleanup option processing The -D/--dump-raw-trace option is in the parent option so no need to repeat it. Also move -f/--force option to parent as it's common to handle data file. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161206034010.6499-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
f45bf8d3 |
|
29-Nov-2016 |
David Ahern <dsahern@gmail.com> |
perf sched timehist: Improve error message when analyzing wrong file Arnaldo reported an unhelpful error message when running perf sched timehist on a file that did not contain sched tracepoints: [root@jouet ~]# perf sched timehist No trace sample to read. Did you call 'perf record -R'? [root@jouet ~]# perf evlist -v cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 Change the has_traces check to look for the sched_switch event. Analysis for perf sched timehist requires at least this event. Now when analyzing a file without sched tracepoints you get: root@f21-vbox:/tmp$ perf sched timehist No sched_switch events found. Have you run 'perf sched record'? Signed-off-by: David Ahern <dsahern@gmail.com> Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1480451988-43673-1-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
853b7407 |
|
29-Nov-2016 |
David Ahern <dsa@cumulusnetworks.com> |
perf sched timehist: Add option to specify time window of interest Add option to allow user to control analysis window. e.g., collect data for time window and analyze a segment of interest within that window. Committer notes: Testing it: # perf sched record -a usleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.593 MB perf.data (25 samples) ] # # perf sched timehist | head -18 Samples do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) ------------- ------ --------------- --------- --------- -------- 19818.635579 [0002] <idle> 0.000 0.000 0.000 19818.635613 [0000] perf[9116] 0.000 0.000 0.000 19818.635676 [0000] <idle> 0.000 0.000 0.063 19818.635678 [0000] rcuos/2[29] 0.000 0.002 0.001 19818.635696 [0002] perf[9117] 0.000 0.004 0.116 19818.635702 [0000] <idle> 0.001 0.000 0.024 19818.635709 [0002] migration/2[25] 0.000 0.003 0.012 19818.636263 [0000] usleep[9117] 0.005 0.000 0.560 19818.636316 [0000] <idle> 0.560 0.000 0.053 19818.636358 [0002] <idle> 0.129 0.000 0.649 19818.636358 [0000] usleep[9117] 0.053 0.002 0.042 # # perf sched timehist --time 19818.635696, Samples do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) ------------- ------ --------------- -------- --------- --------- 19818.635696 [0002] perf[9117] 0.000 0.120 0.000 19818.635702 [0000] <idle> 0.019 0.000 0.006 19818.635709 [0002] migration/2[25] 0.000 0.003 0.012 19818.636263 [0000] usleep[9117] 0.005 0.000 0.560 19818.636316 [0000] <idle> 0.560 0.000 0.053 19818.636358 [0002] <idle> 0.129 0.000 0.649 19818.636358 [0000] usleep[9117] 0.053 0.002 0.042 # # perf sched timehist --time 19818.635696,19818.635709 Samples do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) ------------- ------ --------------- --------- --------- --------- 19818.635696 [0002] perf[9117] 0.000 0.120 0.000 19818.635702 [0000] <idle> 0.019 0.000 0.006 19818.635709 [0002] migration/2[25] 0.000 0.003 0.012 19818.635709 [0000] usleep[9117] 0.005 0.000 0.006 # Signed-off-by: David Ahern <dsahern@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1480439746-42695-5-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
350f54fa |
|
25-Nov-2016 |
David Ahern <dsahern@gmail.com> |
perf sched timehist: Handle cpu migration events Add handlers for sched:sched_migrate_task event. Total number of migrations is added to summary display and -M/--migrations can be used to show migration events. Signed-off-by: David Ahern <dsahern@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1480091321-35591-1-git-send-email-dsa@cumulusnetworks.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
8388deb3 |
|
23-Nov-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Enlarge max stack depth by 2 When it records callchains, they will always have 2 scheduler functions (__schedule + schedule or __schedule + preempt_schedule) and get ignored. So it should collect 2 more functions to show the expected number of callchains to user. Committer Notes: Example of final result, using the same perf.data file as in the previous cset comment, but this time redirecting the output of 'perf sched timehist' to a file instead of copy'n'pasting from xterm: [root@jouet experimental]# perf sched timehist > /tmp/bla [root@jouet experimental]# cat /tmp/bla time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) -------- ---- -------------------- ------ ------ ----- 6.494998 [01] <idle> 0.000 0.000 0.000 6.495027 [02] perf[519] 0.000 0.000 0.000 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_poll 6.495096 [03] <idle> 0.000 0.000 0.000 6.495100 [03] rcuos/0[9] 0.000 0.005 0.003 rcu_nocb_kthread <- kthread <- ret_from_fork 6.495113 [01] perf[520] 0.000 0.008 0.114 preempt_schedule_common <- _cond_resched <- wait_for_completion <- stop_one_cpu <- sched_exec <- do_execveat_common.isra.35 6.495121 [00] <idle> 0.000 0.000 0.000 6.495129 [01] migration/1[17] 0.000 0.003 0.016 smpboot_thread_fn <- kthread <- ret_from_fork 6.496085 [02] <idle> 0.000 0.000 1.057 6.496096 [02] kworker/u16:1[31169] 0.000 0.004 0.011 worker_thread <- kthread <- ret_from_fork 6.496096 [03] <idle> 0.003 0.000 0.996 6.496169 [02] <idle> 0.011 0.000 0.072 6.496171 [00] ls[520] 0.008 0.000 1.049 do_exit <- do_group_exit <- [unknown] <- entry_SYSCALL_64_fastpath 6.496172 [03] gnome-terminal-[4391] 0.000 0.003 0.076 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_poll Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20161124011114.7102-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
cdeb01bf |
|
23-Nov-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched timehist: Mark schedule function in callchains The sched_switch event always captured from the scheduler function. So it'd be great omit them from the callchain. This patch marks the functions to be omitted by later patch. Committer notes: Testing it: Before: [root@jouet experimental]# perf sched record -g ls Dockerfile perf.data x-mips64 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.355 MB perf.data (29 samples) ] [root@jouet experimental]# perf sched timehist time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) ----------- ----- ----------------- ------ ------ ------ 6.494998 [001] <idle> 0.000 0.000 0.000 6.495027 [002] perf[519] 0.000 0.000 0.000 __schedule <- schedule <- schedule_hrtimeout_range_clock <- schedule_hrtimeou 6.495096 [003] <idle> 0.000 0.000 0.000 6.495100 [003] rcuos/0[9] 0.000 0.005 0.003 __schedule <- schedule <- rcu_nocb_kthread <- kthread <- ret_from_fork 6.495113 [001] perf[520] 0.000 0.008 0.114 __schedule <- preempt_schedule_common <- _cond_resched <- wait_for_completion 6.495121 [000] <idle> 0.000 0.000 0.000 6.495129 [001] migration/1[17] 0.000 0.003 0.016 __schedule <- schedule <- smpboot_thread_fn <- kthread <- ret_from_fork 6.496085 [002] <idle> 0.000 0.000 1.057 6.496096 [002] kworker/u16:1[31169] 0.000 0.004 0.011 __schedule <- schedule <- worker_thread <- kthread <- ret_from_fork 6.496096 [003] <idle> 0.003 0.000 0.996 6.496169 [002] <idle> 0.011 0.000 0.072 6.496171 [000] ls[520] 0.008 0.000 1.049 __schedule <- schedule <- do_exit <- do_group_exit <- [unknown] 6.496172 [003] gnome-terminal-[4391] 0.000 0.003 0.076 __schedule <- schedule <- schedule_hrtimeout_range_clock <- schedule_hrtimeo After: [root@jouet experimental]# perf sched timehist time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) ----------- ----- ----------------- ----- ----- ------ 6.494998 [001] <idle> 0.000 0.000 0.000 6.495027 [002] perf[519] 0.000 0.000 0.000 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_t 6.495096 [003] <idle> 0.000 0.000 0.000 6.495100 [003] rcuos/0[9] 0.000 0.005 0.003 rcu_nocb_kthread <- kthread <- ret_from_fork 6.495113 [001] perf[520] 0.000 0.008 0.114 preempt_schedule_common <- _cond_resched <- wait_for_completion <- stop_one_c 6.495121 [000] <idle> 0.000 0.000 0.000 6.495129 [001] migration/1[17] 0.000 0.003 0.016 smpboot_thread_fn <- kthread <- ret_from_fork 6.496085 [002] <idle> 0.000 0.000 1.057 6.496096 [002] kworker/u16:1[31169] 0.000 0.004 0.011 worker_thread <- kthread <- ret_from_fork 6.496096 [003] <idle> 0.003 0.000 0.996 6.496169 [002] <idle> 0.011 0.000 0.072 6.496171 [000] ls[520] 0.008 0.000 1.049 do_exit <- do_group_exit <- [unknown] 6.496172 [003] gnome-terminal-[4391] 0.000 0.003 0.076 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_ [root@jouet experimental]# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20161124011114.7102-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
2d9bbf6e |
|
23-Nov-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf callchain: Add option to skip ignore symbol when printing callchains For tracepoint events, callchains always contain certain functions. Sometimes it'd be better to skip those functions as they have no value. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20161124011114.7102-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
a407b067 |
|
15-Nov-2016 |
David Ahern <dsahern@gmail.com> |
perf sched timehist: Add -V/--cpu-visual option The -V option provides a visual aid for sched switches by cpu: $ perf sched timehist -V time cpu 0123456789abc task name b/n time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------- -------------------- --------- --------- --------- ... 2412598.429696 [0009] i <idle> 0.000 0.000 0.000 2412598.429767 [0002] s perf[7219] 0.000 0.000 0.000 2412598.429783 [0009] s perf[7220] 0.000 0.006 0.087 2412598.429794 [0010] i <idle> 0.000 0.000 0.000 2412598.429795 [0009] s migration/9[53] 0.000 0.003 0.011 2412598.430370 [0010] s sleep[7220] 0.011 0.000 0.576 2412598.432584 [0003] i <idle> 0.000 0.000 0.000 ... Committer notes: 'i' marks idle time, 's' are scheduler events. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20161116060634.28477-8-namhyung@kernel.org [ Add documentation based on above commit message ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
6c973c90 |
|
15-Nov-2016 |
David Ahern <dsahern@gmail.com> |
perf sched timehist: Add call graph options If callchains were recorded they are appended to the line with a default stack depth of 5: 1.874569 [0011] gcc[31949] 0.014 0.000 1.148 wait_for_completion_killable <- do_fork <- sys_vfork <- stub_vfork <- __vfork 1.874591 [0010] gcc[31951] 0.000 0.000 0.024 __cond_resched <- _cond_resched <- wait_for_completion <- stop_one_cpu <- sched_exec 1.874603 [0010] migration/10[59] 3.350 0.004 0.011 smpboot_thread_fn <- kthread <- ret_from_fork 1.874604 [0011] <idle> 1.148 0.000 0.035 cpu_startup_entry <- start_secondary 1.874723 [0005] <idle> 0.016 0.000 1.383 cpu_startup_entry <- start_secondary 1.874746 [0005] gcc[31949] 0.153 0.078 0.022 do_wait sys_wait4 <- system_call_fastpath <- __GI___waitpid --no-call-graph can be used to not show the callchains. --max-stack is used to control the number of frames shown (default of 5). -x/--excl options can be used to collapse redundant callchains to get more relevant data on screen. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20161116060634.28477-7-namhyung@kernel.org [ Add documentation based on above commit message ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
fc1469f1 |
|
15-Nov-2016 |
David Ahern <dsahern@gmail.com> |
perf sched timehist: Add -w/--wakeups option The -w option is to show wakeup events with timehist. $ perf sched timehist -w time cpu task name b/n time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ -------------------- --------- --------- --------- 2412598.429689 [0002] perf[7219] awakened: perf[7220] 2412598.429696 [0009] <idle> 0.000 0.000 0.000 2412598.429767 [0002] perf[7219] 0.000 0.000 0.000 2412598.429780 [0009] perf[7220] awakened: migration/9[53] ... Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20161116060634.28477-6-namhyung@kernel.org [ Add documentation based on above commit message ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
52df138c |
|
15-Nov-2016 |
David Ahern <dsahern@gmail.com> |
perf sched timehist: Add summary options The -s/--summary option is to show process runtime statistics. And the -S/--with-summary option is to show the stats with the normal output. $ perf sched timehist -s Runtime summary comm parent sched-in run-time min-run avg-run max-run stddev (count) (msec) (msec) (msec) (msec) % --------------------------------------------------------------------------------------------------------- ksoftirqd/0[3] 2 2 0.011 0.004 0.005 0.006 14.87 rcu_preempt[7] 2 11 0.071 0.002 0.006 0.017 20.23 watchdog/0[11] 2 1 0.002 0.002 0.002 0.002 0.00 watchdog/1[12] 2 1 0.004 0.004 0.004 0.004 0.00 ... Terminated tasks: sleep[7220] 7219 3 0.770 0.087 0.256 0.576 62.28 Idle stats: CPU 0 idle for 2352.006 msec CPU 1 idle for 2764.497 msec CPU 2 idle for 2998.229 msec CPU 3 idle for 2967.800 msec Total number of unique tasks: 52 Total number of context switches: 2532 Total run time (msec): 218.036 Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20161116060634.28477-5-namhyung@kernel.org [ Add documentation from last commit, so that docs comes with the cset that introduces the feature ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
49394a2a |
|
15-Nov-2016 |
David Ahern <dsahern@gmail.com> |
perf sched timehist: Introduce timehist command 'perf sched timehist' provides an analysis of scheduling events. Example usage: perf sched record -- sleep 1 perf sched timehist By default it shows the individual schedule events, including the wait time (time between sched-out and next sched-in events for the task), the task scheduling delay (time between wakeup and actually running) and run time for the task: time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) -------------- ------ -------------------- --------- --------- --------- 79371.874569 [0011] gcc[31949] 0.014 0.000 1.148 79371.874591 [0010] gcc[31951] 0.000 0.000 0.024 79371.874603 [0010] migration/10[59] 3.350 0.004 0.011 79371.874604 [0011] <idle> 1.148 0.000 0.035 79371.874723 [0005] <idle> 0.016 0.000 1.383 79371.874746 [0005] gcc[31949] 0.153 0.078 0.022 ... Times are in msec.usec. Committer note: Add above explanation as the 'perf sched timehist' entry for 'man perf-sched'. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20161116060634.28477-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
99620a5d |
|
23-Oct-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf tools: Introduce timestamp__scnprintf_usec() Joonwoo reported that there's a mismatch between timestamps in script and sched commands. This was because of difference in printing the timestamp. Factor out the code and share it so that they can be in sync. Also I found that sched map has similar problem, fix it too. Committer notes: Fixed the max_lat_at bug introduced by Namhyung's original patch, as pointed out by Joonwoo, and made it a function following the scnprintf() model, i.e. returning the number of bytes formatted, and receiving as the first parameter the object from where the data to the formatting is obtained, renaming it from: char *timestamp_in_usec(char *bf, size_t size, u64 timestamp) to int timestamp__scnprintf_usec(u64 timestamp, char *bf, size_t size) Reported-by: Joonwoo Park <joonwoop@codeaurora.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161024020246.14928-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
e107f129 |
|
23-Oct-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched map: Always show task comm with -v I'd like to see the name of tasks with perf sched map, but it only shows name of new tasks and then use short names after all. This is not good for long running tasks since it's hard for users to track the short names. This patch makes it show the names (except the idle task) when -v option is used. Probably we may make it as default behavior. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161024020246.14928-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
1208bb27 |
|
23-Oct-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched map: Apply cpu color when there's an activity Applying cpu color always doesn't help readability IMHO. Instead it might be better to applying the color when there's an activity on those CPUs. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161024020246.14928-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
77f02f44 |
|
23-Oct-2016 |
Namhyung Kim <namhyung@kernel.org> |
perf sched: Make common options cascading The -i and -v options can be used in subcommands so enable cascading the sched_options. This fixes the following inconvenience in 'perf sched': $ perf sched -i perf.data.sched map ... (it works well) ... $ perf sched map -i perf.data.sched Error: unknown switch `i' Usage: perf sched map [<options>] --color-cpus <cpus> highlight given CPUs in map --color-pids <pids> highlight given pids in map --compact map output in compact mode --cpus <cpus> display given CPUs in map With this patch, the second command line works with the perf.data.sched data file. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20161024030003.28534-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
4fc76e49 |
|
07-Aug-2016 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf sched: Use linux/time64.h Probably the next step is to introduce linux/time.h and use timespec_to_ns(), etc. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-4nqhskn27fn93cz3ukbc8drf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
c8b5f2c9 |
|
06-Jul-2016 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
tools: Introduce str_error_r() The tools so far have been using the strerror_r() GNU variant, that returns a string, be it the buffer passed or something else. But that, besides being tricky in cases where we expect that the function using strerror_r() returns the error formatted in a provided buffer (we have to check if it returned something else and copy that instead), breaks the build on systems not using glibc, like Alpine Linux, where musl libc is used. So, introduce yet another wrapper, str_error_r(), that has the GNU interface, but uses the portable XSI variant of strerror_r(), so that users rest asured that the provided buffer is used and it is what is returned. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-d4t42fnf48ytlk8rjxs822tf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
73643bb6 |
|
12-Apr-2016 |
Jiri Olsa <jolsa@kernel.org> |
perf sched map: Display only given cpus Introducing --cpus option that will display only given cpus. Could be used together with color-cpus option. $ perf sched map --cpus 0,1 *A0 309999.786924 secs A0 => rcu_sched:7 *. 309999.786930 secs *B0 . 309999.786931 secs B0 => rcuos/2:25 B0 *A0 309999.786947 secs Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1460467771-26532-9-git-send-email-jolsa@kernel.org [ Added entry to man page ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
cf294f24 |
|
12-Apr-2016 |
Jiri Olsa <jolsa@kernel.org> |
perf sched map: Color given cpus Adding --color-cpus option to display selected cpus with background color (red by default). It helps on navigating through the perf sched map output. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1460467771-26532-8-git-send-email-jolsa@kernel.org [ Added entry to man page ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
a151a37a |
|
12-Apr-2016 |
Jiri Olsa <jolsa@kernel.org> |
perf sched map: Color given pids Adding --color-pids option to display selected pids in color (blue by default). It helps on navigating through the 'perf sched map' output. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1460467771-26532-7-git-send-email-jolsa@kernel.org [ Added entry to man page ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
8cd91195 |
|
12-Apr-2016 |
Jiri Olsa <jolsa@kernel.org> |
perf sched: Use color_fprintf for output As preparation for next patch. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1460467771-26532-5-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
99623c62 |
|
12-Apr-2016 |
Jiri Olsa <jolsa@kernel.org> |
perf sched: Add compact display option Add compact map display that does not output the whole cpu matrix, only cpus that got event. $ perf sched map --compact *A0 1082427.094098 secs A0 => perf:19404 (CPU 2) A0 *. 1082427.094127 secs . => swapper:0 (CPU 1) A0 . *B0 1082427.094174 secs B0 => rcuos/2:25 (CPU 3) A0 . *. 1082427.094177 secs *C0 . . 1082427.094187 secs C0 => migration/2:21 C0 *A0 . 1082427.094193 secs *. A0 . 1082427.094195 secs *D0 A0 . 1082427.094402 secs D0 => rngd:968 *. A0 . 1082427.094406 secs . *E0 . 1082427.095221 secs E0 => kworker/1:1:5333 . E0 *F0 1082427.095227 secs F0 => xterm:3342 It helps to display sane output for small thread loads on big cpu servers. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1460467771-26532-4-git-send-email-jolsa@kernel.org [ Add entry in 'perf sched' man page ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
4b6ab94e |
|
15-Dec-2015 |
Josh Poimboeuf <jpoimboe@redhat.com> |
perf subcmd: Create subcmd library Move the subcommand-related files from perf to a new library named libsubcmd.a. Since we're moving files anyway, go ahead and rename 'exec_cmd.*' to 'exec-cmd.*' to be consistent with the naming of all the other files. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/c0a838d4c878ab17fee50998811612b2281355c1.1450193761.git.jpoimboe@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
0014de17 |
|
01-Nov-2015 |
Jiri Olsa <jolsa@kernel.org> |
perf sched latency: Fix thread pid reuse issue The latency subcommand holds a tree of working atoms sorted by thread's pid/tid. If there's new thread with same pid and tid, the old working atom is found and assert bug condition is hit in search function: thread_atoms_search: Assertion `!(thread != atoms->thread)' failed Changing the sort function to use thread object pointers together with pid and tid check. This way new thread will never find old one with same pid/tid. Link: http://lkml.kernel.org/n/tip-o4doazhhv0zax5zshkg8hnys@git.kernel.org Reported-by: Mohit Agrawal <moagrawa@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1446462625-15807-1-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
c7118369 |
|
24-Oct-2015 |
Namhyung Kim <namhyung@kernel.org> |
perf tools: Introduce usage_with_options_msg() Now usage_with_options() setup a pager before printing message so normal printf() or pr_err() will not be shown. The usage_with_options_msg() can be used to print some help message before usage strings. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1445701767-12731-4-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
2f80dd44 |
|
22-May-2015 |
Josef Bacik <jbacik@fb.com> |
perf sched: Add option to merge like comms to lat output Sometimes when debugging large multi-threaded applications it is helpful to collate all of the latency numbers into one bulk record to get an idea of what is going on. This patch does this by merging any entries that belong to the same comm into one entry and then spits out those totals. I've also slightly changed the output so you can see how many threads were merged in the processing. Here is the new default output format ----------------------------------------------------------------------------------------------------------- Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Maximum delay at | ----------------------------------------------------------------------------------------------------------- chrome:(23) | 740.878 ms | 2612 | avg: 0.022 ms | max: 0.845 ms | max at: 7935.254223 s pulseaudio:1523 | 94.440 ms | 597 | avg: 0.027 ms | max: 0.110 ms | max at: 7934.668372 s threaded-ml:6042 | 72.554 ms | 386 | avg: 0.035 ms | max: 1.186 ms | max at: 7935.330911 s Chrome_IOThread:3832 | 52.388 ms | 456 | avg: 0.021 ms | max: 1.365 ms | max at: 7935.330602 s Chrome_ChildIOT:(7) | 50.694 ms | 743 | avg: 0.021 ms | max: 1.448 ms | max at: 7935.256659 s Compositor:5510 | 30.012 ms | 192 | avg: 0.019 ms | max: 0.131 ms | max at: 7936.636815 s plugin_audio_th:6043 | 24.828 ms | 314 | avg: 0.018 ms | max: 0.143 ms | max at: 7936.205994 s CompositorTileW:(2) | 14.099 ms | 45 | avg: 0.022 ms | max: 0.153 ms | max at: 7937.521800 s the (#) after the task is the number of tasks merged, and then if there were no tasks merged it just shows the pid. Here is the same trace file with the -p option to print the per-pid latency numbers ----------------------------------------------------------------------------------------------------------- Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Maximum delay at | ----------------------------------------------------------------------------------------------------------- chrome:5500 | 386.872 ms | 387 | avg: 0.023 ms | max: 0.241 ms | max at: 7936.001694 s pulseaudio:1523 | 94.440 ms | 597 | avg: 0.027 ms | max: 0.110 ms | max at: 7934.668372 s threaded-ml:6042 | 72.554 ms | 386 | avg: 0.035 ms | max: 1.186 ms | max at: 7935.330911 s chrome:10226 | 69.710 ms | 251 | avg: 0.023 ms | max: 0.764 ms | max at: 7935.992305 s chrome:4267 | 64.551 ms | 418 | avg: 0.021 ms | max: 0.294 ms | max at: 7937.862427 s chrome:4827 | 62.268 ms | 54 | avg: 0.029 ms | max: 0.666 ms | max at: 7935.992813 s Chrome_IOThread:3832 | 52.388 ms | 456 | avg: 0.021 ms | max: 1.365 ms | max at: 7935.330602 s chrome:3776 | 46.150 ms | 349 | avg: 0.023 ms | max: 0.845 ms | max at: 7935.254223 s Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: kernel-team@fb.com Link: http://lkml.kernel.org/r/1432300720-30478-1-git-send-email-jbacik@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
b91fc39f |
|
06-Apr-2015 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf machine: Protect the machine->threads with a rwlock In addition to using refcounts for the struct thread lifetime management, we need to protect access to machine->threads from concurrent access. That happens in 'perf top', where a thread processes events, inserting and deleting entries from that rb_tree while another thread decays hist_entries, that end up dropping references and ultimately deleting threads from the rb_tree and releasing its resources when no further hist_entry (or other data structures, like in 'perf sched') references it. So the rule is the same for refcounts + protected trees in the kernel, get the tree lock, find object, bump the refcount, drop the tree lock, return, use object, drop the refcount if no more use of it is needed, keep it if storing it in some other data structure, drop when releasing that data structure. I.e. pair "t = machine__find(new)_thread()" with a "thread__put(t)", and "perf_event__preprocess_sample(&al)" with "addr_location__put(&al)". The addr_location__put() one is because as we return references to several data structures, we may end up adding more reference counting for the other data structures and then we'll drop it at addr_location__put() time. Acked-by: David Ahern <dsahern@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-bs9rt4n0jw3hi9f3zxyy3xln@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
ff5f3bbd |
|
31-Mar-2015 |
Yunlong Song <yunlong.song@huawei.com> |
perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10 Since sched->replay_repeat is set to 10 as default, the sched->run_avg, sched->runavg_cpu_usage, and sched->runavg_parent_cpu_usage all use 10 to calculate their value. However, the replay_repeat can be changed to other value by using -r option, so the calculation above should use replay_repeat to achieve more accurate results instead of the default value 10. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-10-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
f0dd330f |
|
31-Mar-2015 |
Yunlong Song <yunlong.song@huawei.com> |
perf sched replay: Support using -f to override perf.data file ownership Enable to use perf.data when it is not owned by current user or root. Example: $ ls -al perf.data -rw------- 1 Yunlong.Song Yunlong.Song 5321918 Mar 25 15:14 perf.data $ sudo id uid=0(root) gid=0(root) groups=0(root),64(pkcs11) Before this patch: $ sudo perf sched replay -f run measurement overhead: 98 nsecs sleep measurement overhead: 52909 nsecs the run test took 1000015 nsecs the sleep test took 1054253 nsecs File perf.data not owned by current user or root (use -f to override) As shown above, the -f option does not work at all. After this patch: $ sudo perf sched replay -f run measurement overhead: 221 nsecs sleep measurement overhead: 40514 nsecs the run test took 1000003 nsecs the sleep test took 1056098 nsecs nr_run_events: 10 nr_sleep_events: 1562 nr_wakeup_events: 5 task 0 ( :1: 1), nr_events: 1 task 1 ( :2: 2), nr_events: 1 task 2 ( :3: 3), nr_events: 1 ... ... task 1549 ( :163132: 163132), nr_events: 1 task 1550 ( :163540: 163540), nr_events: 1 task 1551 ( <unknown>: 0), nr_events: 10 ------------------------------------------------------------ #1 : 50.198, ravg: 50.20, cpu: 2335.18 / 2335.18 #2 : 219.099, ravg: 67.09, cpu: 2835.11 / 2385.17 #3 : 238.626, ravg: 84.24, cpu: 3278.26 / 2474.48 #4 : 200.364, ravg: 95.85, cpu: 2977.41 / 2524.77 #5 : 176.882, ravg: 103.96, cpu: 2801.35 / 2552.43 #6 : 191.093, ravg: 112.67, cpu: 2813.70 / 2578.56 #7 : 189.448, ravg: 120.35, cpu: 2809.21 / 2601.62 #8 : 200.637, ravg: 128.38, cpu: 2849.91 / 2626.45 #9 : 248.338, ravg: 140.37, cpu: 4380.61 / 2801.87 #10 : 511.139, ravg: 177.45, cpu: 3077.73 / 2829.45 As shown above, the -f option really works now. Besides for replay, -f option can also work for latency and map. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-9-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
939cda52 |
|
31-Mar-2015 |
Yunlong Song <yunlong.song@huawei.com> |
perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files The soft maximum number of open files for a calling process is 1024, which is defined as INR_OPEN_CUR in include/uapi/linux/fs.h, and the hard maximum number of open files for a calling process is 4096, which is defined as INR_OPEN_MAX in include/uapi/linux/fs.h. Both INR_OPEN_CUR and INR_OPEN_MAX are used to limit the value of RLIMIT_NOFILE in include/asm-generic/resource.h. And the soft maximum number finally decides the limitation of the maximum files which are allowed to be opened. That is to say a process can use at most 1024 file descriptors for its o pened files, or an EMFILE error will happen. This error can be fixed by increasing the soft maximum number, under the constraint that the soft maximum number can not exceed the hard maximum number, or both soft and hard maximum number should be increased simultaneously with privilege. For perf sched replay, it uses sys_perf_event_open to create the file descriptor for each of the tasks in order to handle information of perf events. That is to say each task needs a unique file descriptor. In x86_64, there may be over 1024 or 4096 tasks correspoinding to the record in perf.data, which causes that no enough file descriptors can be used. As a result, EMFILE error happens and stops the replay process. To solve this problem, we adaptively increase the soft and hard maximum number of open files with a '-f' option. Example: Test environment: x86_64 with 160 cores $ cat /proc/sys/kernel/pid_max 163840 $ cat /proc/sys/fs/file-max 6815744 $ ulimit -Sn 1024 $ ulimit -Hn 4096 Before this patch: $ perf sched replay ... task 1549 ( :163132: 163132), nr_events: 1 task 1550 ( :163540: 163540), nr_events: 1 task 1551 ( <unknown>: 0), nr_events: 10 Error: sys_perf_event_open() syscall returned with -1 (Too many open files) After this patch: $ perf sched replay ... task 1549 ( :163132: 163132), nr_events: 1 task 1550 ( :163540: 163540), nr_events: 1 task 1551 ( <unknown>: 0), nr_events: 10 Error: sys_perf_event_open() syscall returned with -1 (Too many open files) Have a try with -f option $ perf sched replay -f ... task 1549 ( :163132: 163132), nr_events: 1 task 1550 ( :163540: 163540), nr_events: 1 task 1551 ( <unknown>: 0), nr_events: 10 ------------------------------------------------------------ #1 : 54.401, ravg: 54.40, cpu: 3285.21 / 3285.21 #2 : 199.548, ravg: 68.92, cpu: 4999.65 / 3456.66 #3 : 170.483, ravg: 79.07, cpu: 1349.94 / 3245.99 #4 : 192.034, ravg: 90.37, cpu: 1322.88 / 3053.67 #5 : 182.929, ravg: 99.62, cpu: 1406.51 / 2888.96 #6 : 152.974, ravg: 104.96, cpu: 1167.54 / 2716.82 #7 : 155.579, ravg: 110.02, cpu: 2992.53 / 2744.39 #8 : 130.557, ravg: 112.08, cpu: 1126.43 / 2582.59 #9 : 138.520, ravg: 114.72, cpu: 1253.22 / 2449.65 #10 : 134.328, ravg: 116.68, cpu: 1587.95 / 2363.48 Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-8-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
1aff59be |
|
31-Mar-2015 |
Yunlong Song <yunlong.song@huawei.com> |
perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task Since there is sem_wait for each task in the wait_for_tasks(), e.g. sem_wait(&task->work_done_sem). The sem_wait can continue only when work_done_sem is greater than 0, or it will be blocked. For perf sched replay, one task may sem_post the work_done_sem of another task, which causes the work_done_sem of that task processed in a reasonable sequence, e.g. sem_post, sem_wait, sem_wait, sem_post... This sequence simulates the sched process of the running tasks at the time when perf sched record runs. As a result, all the tasks are required and their threads must be successfully created. If any one (task A) of the tasks fails to create its thread, then another task (task B), whose work_done_sem needs sem_post from that failed task A, may likely block itself due to seg_wait. And this is a dead halt, since task B's thread_func cannot continue at all. To solve this problem, perf sched replay should exit once any task fails to create its thread. Example: Test environment: x86_64 with 160 cores Before this patch: $ perf sched replay ... Error: sys_perf_event_open() syscall returned with -1 (Too many open files) ------------------------------------------------------------ <- dead halt After this patch: $ perf sched replay ... task 1551 ( <unknown>: 0), nr_events: 10 Error: sys_perf_event_open() syscall returned with -1 (Too many open files) $ As shown above, perf sched replay finishes the process after printing an error message and does not block itself. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-7-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
08097abc |
|
31-Mar-2015 |
Yunlong Song <yunlong.song@huawei.com> |
perf sched replay: Fix the segmentation fault problem caused by pr_err in threads The pr_err in self_open_counters() prints error message to stderr. Unlike stdout, stderr uses memory buffer on the stack of each calling process. The pr_err in self_open_counters() works in a thread called thread_func created in function create_tasks, which concurrently creates sched->nr_tasks threads. If the error happens and pr_err prints the error message in each of these threads, the stack size of the perf process (default is 8192 kbytes) will quickly run out and the segmentation fault will happen then. To solve this problem, pr_err with self_open_counters() should be moved from newly created threads to the old main thread of the perf process. Then the pr_err can work in a stable situation without the strange segmentation fault problem. Example: Test environment: x86_64 with 160 cores Before this patch: $ perf sched replay ... task 1549 ( :163132: 163132), nr_events: 1 task 1550 ( :163540: 163540), nr_events: 1 task 1551 ( <unknown>: 0), nr_events: 10 Segmentation fault After this patch: $ perf sched replay ... task 1549 ( :163132: 163132), nr_events: 1 task 1550 ( :163540: 163540), nr_events: 1 task 1551 ( <unknown>: 0), nr_events: 10 ... As shown above, the result continues without any segmentation fault. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-6-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
3a423a5c |
|
31-Mar-2015 |
Yunlong Song <yunlong.song@huawei.com> |
perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations Although the memory of pid_to_task can be allocated via calloc according to the value of /proc/sys/kernel/pid_max, it cannot handle the case when pid_max is changed after 'perf sched record' has created its perf.data. If the new pid_max configured in 'perf sched replay' is smaller than the old pid_max configured in 'perf sched record', then it will cause the assertion failure problem. To solve this problem, we realloc the memory of pid_to_task stepwise once the passed-in pid parameter in register_pid is larger than the current pid_max. Example: Test environment: x86_64 with 160 cores $ cat /proc/sys/kernel/pid_max 163840 $ perf sched record ls $ echo 5000 > /proc/sys/kernel/pid_max $ cat /proc/sys/kernel/pid_max 5000 Before this patch: $ perf sched replay run measurement overhead: 221 nsecs sleep measurement overhead: 55356 nsecs the run test took 1000011 nsecs the sleep test took 1060940 nsecs perf: builtin-sched.c:337: register_pid: Assertion `!(pid >= (unsigned long)pid_max)' failed. Aborted After this patch: $ perf sched replay run measurement overhead: 221 nsecs sleep measurement overhead: 55611 nsecs the run test took 1000026 nsecs the sleep test took 1060486 nsecs nr_run_events: 10 nr_sleep_events: 1562 nr_wakeup_events: 5 task 0 ( :1: 1), nr_events: 1 task 1 ( :2: 2), nr_events: 1 task 2 ( :3: 3), nr_events: 1 task 3 ( :5: 5), nr_events: 1 ... Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-5-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
cb06ac25 |
|
31-Mar-2015 |
Yunlong Song <yunlong.song@huawei.com> |
perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max The current memory allocation of struct task_desc *pid_to_task[MAX_PID] is in a permanent and preset way, and it has two problems: Problem 1: If the pid_max, which is the max number of pids in the system, is much smaller than MAX_PID (1024*1000), then it causes a waste of stack memory. This may happen in the case where the number of cpu cores is much smaller than 1000. Problem 2: If the pid_max is changed from the default value to a value larger than MAX_PID, then it will cause assertion failure problem. The maximum value of pid_max can be set to pid_max_max (see pidmap_init defined in kernel/pid.c), which equals to PID_MAX_LIMIT. In x86_64, PID_MAX_LIMIT is 4*1024*1024 (defined in include/linux/threads.h). This value is much larger than MAX_PID, and will take up 32768 Kbytes (4*1024*1024*8/1024) for memory allocation of pid_to_task, which is much larger than the default 8192 Kbytes of the stack size of calling process. Due to these two problems, we use calloc to allocate the memory of pid_to_task dynamically. Example: Test environment: x86_64 with 160 cores $ cat /proc/sys/kernel/pid_max 163840 $ echo 1025000 > /proc/sys/kernel/pid_max $ cat /proc/sys/kernel/pid_max 1025000 Run some applications until the pid of some process is greater than the value of MAX_PID (1024*1000). Before this patch: $ perf sched replay run measurement overhead: 221 nsecs sleep measurement overhead: 55480 nsecs the run test took 1000008 nsecs the sleep test took 1063151 nsecs perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 1024000)' failed. Aborted After this patch: $ perf sched replay run measurement overhead: 221 nsecs sleep measurement overhead: 55435 nsecs the run test took 1000004 nsecs the sleep test took 1059312 nsecs nr_run_events: 10 nr_sleep_events: 1562 nr_wakeup_events: 5 task 0 ( :1: 1), nr_events: 1 task 1 ( :2: 2), nr_events: 1 task 2 ( :3: 3), nr_events: 1 task 3 ( :5: 5), nr_events: 1 ... Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-4-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
a35e27d0 |
|
31-Mar-2015 |
Yunlong Song <yunlong.song@huawei.com> |
perf sched replay: Increase the MAX_PID value to fix assertion failure problem Current MAX_PID is only 65536, which will cause assertion failure problem when CPU cores are more than 64 in x86_64. This is because the pid_max value in x86_64 is at least PIDS_PER_CPU_DEFAULT * num_possible_cpus() (see function pidmap_init defined in kernel/pid.c), where PIDS_PER_CPU_DEFAULT is 1024 (defined in include/linux/threads.h). Thus for MAX_PID = 65536, the correspoinding CPU cores are 65536/1024=64. This is obviously not enough at all for x86_64, and will cause an assertion failure problem due to BUG_ON(pid >= MAX_PID) in the codes. We increase MAX_PID value from 65536 to 1024*1000, which can be used in x86_64 with 1000 cores. This number is finally decided according to the limitation of stack size of calling process. Use 'ulimit -a', the result shows the stack size of any process is 8192 Kbytes, which is defined in include/uapi/linux/resource.h (#define _STK_LIM (8*1024*1024)). Thus we choose a large enough value for MAX_PID, and make it satisfy to the limitation of the stack size, i.e., making the perf process take up a memory space just smaller than 8192 Kbytes. We have calculated and tested that 1024*1000 is OK for MAX_PID. This means perf sched replay can now be used with at most 1000 cores in x86_64 without any assertion failure problem. Example: Test environment: x86_64 with 160 cores $ cat /proc/sys/kernel/pid_max 163840 Before this patch: $ perf sched replay run measurement overhead: 240 nsecs sleep measurement overhead: 55379 nsecs the run test took 1000004 nsecs the sleep test took 1059424 nsecs perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 65536)' failed. Aborted After this patch: $ perf sched replay run measurement overhead: 221 nsecs sleep measurement overhead: 55397 nsecs the run test took 999920 nsecs the sleep test took 1053313 nsecs nr_run_events: 10 nr_sleep_events: 1562 nr_wakeup_events: 5 task 0 ( :1: 1), nr_events: 1 task 1 ( :2: 2), nr_events: 1 task 2 ( :3: 3), nr_events: 1 task 3 ( :5: 5), nr_events: 1 ... Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-3-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
0755bc4d |
|
31-Mar-2015 |
Yunlong Song <yunlong.song@huawei.com> |
perf sched replay: Use struct task_desc instead of struct task_task for correct meaning There is no struct task_task at all, thus it is a typo error in the old commits, now fix it to what it should be in order to avoid unnecessary misunderstanding. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-2-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
b7b61cbe |
|
03-Mar-2015 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf ordered_events: Shorten function signatures By keeping pointers to machines, evlist and tool in ordered_events. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-0c6huyaf59mqtm2ek9pmposl@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
ae536acf |
|
02-Mar-2015 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf sched: No need to keep the session around We were keeping the session around just because we kept pointers to struct thread instances, but now we reference count them, so no need for deferring the perf_session__delete call to after we traverse the work_list entries. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-9agtck6jdr3rebdp39z1lo0e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
f3b623b8 |
|
02-Mar-2015 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Reference count struct thread We need to do that to stop accumulating entries in the dead_threads linked list, i.e. we were keeping references to threads in struct hists that continue to exist even after a thread exited and was removed from the machine threads rbtree. We still keep the dead_threads list, but just for debugging, allowing us to iterate at any given point over the threads that still are referenced by things like struct hist_entry. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-3ejvfyed0r7ue61dkurzjux4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
75be989a |
|
14-Feb-2015 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf evlist: Adopt events_stats from perf_session For tools that don't deal with perf.data files, thus do not need to use perf_session. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-kglq67gvauq9tak02a4se00r@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
b3f25b6e |
|
09-Oct-2014 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf sched: Stop updating hists stats, not used Not used here, remove to reduce perf_evsel/hists structs interaction. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jean Pihet <jean.pihet@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-cb7wkk4a3jpoovzim914ih3c@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
fb74fbda |
|
13-Aug-2014 |
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> |
perf sched: Use strerror_r instead of strerror Use strerror_r instead of strerror in error message for thread-safety. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naohiro Aota <naota@elisp.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20140814022247.3545.4564.stgit@kbuild-fedora.novalocal Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
0a7e6d1b |
|
12-Aug-2014 |
Namhyung Kim <namhyung@kernel.org> |
perf tools: Check recorded kernel version when finding vmlinux Currently vmlinux_path__init() only tries to find vmlinux file from current directory, /boot and some canonical directories with version number of the running kernel. This can be a problem when reporting old data recorded on a kernel version not running currently. We can use --symfs option for this but it's annoying for user to do it always. As we already have the info in the perf.data file, it can be changed to use it for the search automatically. Before: $ perf report ... # Samples: 4K of event 'cpu-clock' # Event count (approx.): 1067250000 # # Overhead Command Shared Object Symbol # ........ .......... ................. .............................. 71.87% swapper [kernel.kallsyms] [k] recover_probed_instruction After: # Overhead Command Shared Object Symbol # ........ .......... ................. .................... 71.87% swapper [kernel.kallsyms] [k] native_safe_halt This requires to change signature of symbol__init() to receive struct perf_session_env *. Reported-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1407825645-24586-14-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
04934106 |
|
12-Aug-2014 |
Namhyung Kim <namhyung@kernel.org> |
perf sched: Move call to symbol__init() after creating session This is a preparation of fixing dso__load_kernel_sym(). It needs a session info before calling symbol__init(). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1407825645-24586-10-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
0a8cb85c |
|
06-Jul-2014 |
Jiri Olsa <jolsa@kernel.org> |
perf tools: Rename ordered_samples bool to ordered_events The time ordering is generic for all kinds of events, so using generic name 'ordered_events' for ordered_samples bool in perf_tool struct. No functional change was intended. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jean Pihet <jean.pihet@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-07mrqzcuhsks9wfmxrzsvemz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
57480d2c |
|
30-Jun-2014 |
Yann Droneaud <ydroneaud@opteya.com> |
perf tools: Enable close-on-exec flag on perf file descriptor In commit a21b0b354d4a ('perf: Introduce a flag to enable close-on-exec in perf_event_open()'), flag PERF_FLAG_FD_CLOEXEC was added to perf_event_open(2) syscall to allows userspace to atomically enable close-on-exec behavor when creating the file descriptor. This patch makes perf tools use the new flag if supported by the kernel, so that the event file descriptors got automatically closed if perf tool exec a sub-command. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/1404160127-7475-1-git-send-email-ydroneaud@opteya.com Signed-off-by: Jiri Olsa <jolsa@kernel.org>
|
#
1fcb8768 |
|
14-Jul-2014 |
Adrian Hunter <adrian.hunter@intel.com> |
perf machine: Fix the value used for unknown pids The value used for unknown pids cannot be zero because that is used by the "idle" task. Use -1 instead. Also handle the unknown pid case when creating map groups. Note that, threads with an unknown pid should not occur because fork (or synthesized) events precede the thread's existence. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1405332185-4050-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
1844dbcb |
|
27-May-2014 |
Namhyung Kim <namhyung@kernel.org> |
perf tools: Introduce hists__inc_nr_samples() There're some duplicate code for counting number of samples. Add hists__inc_nr_samples() and reuse it. Suggested-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1401335910-16832-2-git-send-email-namhyung@kernel.org Signed-off-by: Jiri Olsa <jolsa@kernel.org>
|
#
9d372ca5 |
|
15-May-2014 |
Dongsheng Yang <yangds.fnst@cn.fujitsu.com> |
perf sched: Cleanup, remove unused variables in map_switch_event() In map_switch_event(), we don't care the previous process currently, this patch remove the infomation we get but not used. Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Link: http://lkml.kernel.org/r/1400218625-14613-1-git-send-email-yangds.fnst@cn.fujitsu.com Signed-off-by: Jiri Olsa <jolsa@kernel.org>
|
#
67d6259d |
|
12-May-2014 |
Dongsheng Yang <yangds.fnst@cn.fujitsu.com> |
perf sched: Remove nr_state_machine_bugs in perf latency As we do not use .success in sched_wakeup event any more, then we can not guarantee that the task when wakeup event happen is out of run queue. So the message of nr_state_machine_bugs is not correct. Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Link: http://lkml.kernel.org/r/1399945101-21736-1-git-send-email-yangds.fnst@cn.fujitsu.com Signed-off-by: Jiri Olsa <jolsa@kernel.org>
|
#
0680ee7d |
|
12-May-2014 |
Peter Zijlstra <peterz@infradead.org> |
perf tools: Remove usage of trace_sched_wakeup(.success) trace_sched_wakeup(.success) is a dead argument and has been for ages, the only reason its still there is because of brain dead software, which apparently includes perf tools There's a few more instances in pearly snake shit, but that's not supported as far as I care anyhow, so let that bitrot. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140512181946.GG13467@laptop.programming.kicks-ass.net Signed-off-by: Jiri Olsa <jolsa@kernel.org>
|
#
6bcab4e1 |
|
05-May-2014 |
Dongsheng <yangds.fnst@cn.fujitsu.com> |
perf tools: Clarify the output of perf sched map. In output of perf sched map, any shortname of thread will be explained at the first time when it appear. Example: *A0 228836.978985 secs A0 => perf:23032 *. A0 228836.979016 secs B0 => swapper:0 . *C0 228836.979099 secs C0 => migration/3:22 *A0 . C0 228836.979115 secs A0 . *. 228836.979115 secs But B0, which is explained as swapper:0 did not appear in the left part of output. Instead, we use '.' as the shortname of swapper:0. So the comment of "B0 => swapper:0" is not easy to understand. This patch clarify the output of perf sched map with not allocating one letter-number shortname for swapper:0 and print ". => swapper:0" as the explanation for swapper:0. Example: *A0 228836.978985 secs A0 => perf:23032 * . A0 228836.979016 secs . => swapper:0 . *B0 228836.979099 secs B0 => migration/3:22 *A0 . B0 228836.979115 secs A0 . * . 228836.979115 secs A0 *C0 . 228836.979225 secs C0 => ksoftirqd/2:18 A0 *D0 . 228836.979236 secs D0 => rcu_sched:7 Signed-off-by: Dongsheng <yangds.fnst@cn.fujitsu.com> Acked-by: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1399354741-19522-1-git-send-email-yangds.fnst@cn.fujitsu.com [ small style fixes to make checkpatch happy ] Signed-off-by: Jiri Olsa <jolsa@kernel.org>
|
#
e936e8e4 |
|
05-May-2014 |
Dongsheng <yangds.fnst@cn.fujitsu.com> |
perf tools: Adapt the TASK_STATE_TO_CHAR_STR to new value in kernel space. Currently, TASK_STATE_TO_CHAR_STR in kernel space is already expanded to RSDTtZXxKWP, but it is still RSDTtZX in perf sched tool. This patch update TASK_STATE_TO_CHAR_STR to the new value in kernel space. Signed-off-by: Dongsheng <yangds.fnst@cn.fujitsu.com> Link: http://lkml.kernel.org/r/6d2f55dc1e02c1e29a5d70bfeb9d6e8863caf2aa.1399273302.git.yangds.fnst@cn.fujitsu.com Signed-off-by: Jiri Olsa <jolsa@kernel.org>
|
#
7fff9597 |
|
05-May-2014 |
Dongsheng <yangds.fnst@cn.fujitsu.com> |
perf tools: Add missing event for perf sched record. We should record and process sched:sched_wakeup_new event in perf sched tool, but currently, there is the process function for it, without recording it in record subcommand. This patch add -e sched:sched_wakeup_new to perf sched record. Signed-off-by: Dongsheng <yangds.fnst@cn.fujitsu.com> Link: http://lkml.kernel.org/r/710c6edd2162b2cea1711443f54de47c0210d9fd.1399273302.git.yangds.fnst@cn.fujitsu.com Signed-off-by: Jiri Olsa <jolsa@kernel.org>
|
#
a83edb2d |
|
14-Mar-2014 |
Ramkumar Ramachandra <artagnon@gmail.com> |
perf sched: Introduce --list-cmds for use by scripts Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1394853474-31019-5-git-send-email-artagnon@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
|
#
80790e0b |
|
17-Mar-2014 |
Ramkumar Ramachandra <artagnon@gmail.com> |
perf sched: Fixup header alignment in 'latency' output Before: --------------------------------------------------------------------------------------------------------------- Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Maximum delay at | --------------------------------------------------------------------------------------------------------------- ... | | | | | git:24540 | 336.622 ms | 10 | avg: 0.032 ms | max: 0.062 ms | max at: 115610.111046 s git:24541 | 0.457 ms | 1 | avg: 0.000 ms | max: 0.000 ms | max at: 0.000000 s ----------------------------------------------------------------------------------------- TOTAL: | 396.542 ms | 353 | --------------------------------------------------- After: ----------------------------------------------------------------------------------------------------------------- Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Maximum delay at | ----------------------------------------------------------------------------------------------------------------- ... | | | | | git:24540 | 336.622 ms | 10 | avg: 0.032 ms | max: 0.062 ms | max at: 115610.111046 s git:24541 | 0.457 ms | 1 | avg: 0.000 ms | max: 0.000 ms | max at: 0.000000 s ----------------------------------------------------------------------------------------------------------------- TOTAL: | 396.542 ms | 353 | --------------------------------------------------- Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1395065901-25740-1-git-send-email-artagnon@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
74cf249d |
|
27-Dec-2013 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Use zfree to help detect use after free bugs Several areas already used this technique, so do some audit to consistently use it elsewhere. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-9sbere0kkplwe45ak6rk4a1f@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
744a9719 |
|
06-Nov-2013 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf evsel: Ditch evsel->handler.data field Not needed since this cset: fcf65bf149af: perf evsel: Cache associated event_format So lets trim this struct a bit. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-j8setslokt0goiwxq9dogzqm@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
b9c5143a |
|
11-Sep-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf tools: Use an accessor to read thread comm As the thread comm is going to be implemented by way of a more complicated data structure than just a pointer to a string from the thread struct, convert the readers of comm to use an accessor instead of accessing it directly. The accessor will be later overriden to support an enhanced comm implementation. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-wr683zwy94hmj4ibogmnv9ce@git.kernel.org [ Rename thread__comm_curr() to thread__comm_str() ] Signed-off-by: Namhyung Kim <namhyung@kernel.org> [ Fixed up some minor const pointer issues ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
156a2b02 |
|
22-Oct-2013 |
Adrian Hunter <adrian.hunter@intel.com> |
perf sched: Optimize build time builtin-sched.c took a log time to build with -O6 optimization. This turned out to be caused by: .curr_pid = { [0 ... MAX_CPUS - 1] = -1 }, Fix by initializing curr_pid programmatically. This addresses the problem cured in f36f83f947ed using a smaller hammer. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382427258-17495-13-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
8a39df8f |
|
22-Oct-2013 |
Adrian Hunter <adrian.hunter@intel.com> |
perf sched: Make struct perf_sched sched a local variable Change "struct perf_sched sched" from being global to being local. The build slowdown cured by f36f83f947ed is dealt with in the following patch, by programatically setting perf_sched.curr_pid. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382427258-17495-12-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
f5fc1412 |
|
15-Oct-2013 |
Jiri Olsa <jolsa@redhat.com> |
perf tools: Add data object to handle perf data file This patch is adding 'struct perf_data_file' object as a placeholder for all attributes regarding perf.data file handling. Changing perf_session__new to take it as an argument. The rest of the functionality will be added later to keep this change simple enough, because all the places using perf_session are changed now. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1381847254-28809-2-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
314add6b |
|
27-Aug-2013 |
Adrian Hunter <adrian.hunter@intel.com> |
perf tools: change machine__findnew_thread() to set thread pid Add a new parameter for 'pid' to machine__findnew_thread(). Change callers to pass 'pid' when it is known. Note that callers sometimes want to find the main thread which has the memory maps. The main thread has tid == pid so the usage in that case is: machine__findnew_thread(machine, pid, pid) whereas the usage to find the specific thread is: machine__findnew_thread(machine, pid, tid) Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1377591794-30553-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
cb627505 |
|
07-Aug-2013 |
David Ahern <dsahern@gmail.com> |
perf sched: Remove sched_process_fork tracepoint The PERF_RECORD_FORK event is already collected as part of the use of cmd_record and those events are analyzed as part of the libperf machinery. Using the fork tracepoint as well just duplicates the event load. Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375930261-77273-6-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
4a957e4d |
|
07-Aug-2013 |
David Ahern <dsahern@gmail.com> |
perf sched: Remove sched_process_exit tracepoint Event is not needed nor analyzed. Since perf-sched leverages perf-record to capture the sched data, we already capture task events like EXIT. Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375930261-77273-5-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
ffb273dd |
|
07-Aug-2013 |
David Ahern <dsahern@gmail.com> |
perf sched: Remove thread lookup in sample handler Not used in the function, so no sense in doing the lookup here. Thread look up will be done in the timehist command, and no sense in doing it twice. Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375930261-77273-4-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
ad9def7c |
|
07-Aug-2013 |
David Ahern <dsahern@gmail.com> |
perf sched: Simplify arguments to read_events Destroy argument is not necessary. If session is not returned to caller, then clean it up. Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375930261-77273-3-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
38051234 |
|
04-Jul-2013 |
Adrian Hunter <adrian.hunter@intel.com> |
perf tools: struct thread has a tid not a pid As evident from 'machine__process_fork_event()' and 'machine__process_exit_event()' the 'pid' member of struct thread is actually the tid. Rename 'pid' to 'tid' in struct thread accordingly. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1372944040-32690-13-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
f36f83f9 |
|
03-Jun-2013 |
Namhyung Kim <namhyung.kim@lge.com> |
perf sched: Move struct perf_sched definition out of cmd_sched() For some reason it consumed quite amount of compile time when declared as local variable, and it disappeared when moved out of the function. Moving other variables/tables didn't help. On my system this single-file-change build time reduced from 11s to 3s. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1370324779-16921-1-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
4a4d371a |
|
05-Jun-2013 |
Jiri Olsa <jolsa@redhat.com> |
perf record: Remove -f/--force option It no longer have any affect on the processing and is marked as obsolete anyway. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-tvwyspiqr4getzfib2lw06ty@git.kernel.org Link: http://lkml.kernel.org/r/1372307120-737-1-git-send-email-namhyung@kernel.org [ combined patch removing the -f usage in various sub-commands, such as 'perf sched', etc, by Namhyung Kim ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
1c6763cb |
|
27-Mar-2013 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
Revert "perf sched: Handle PERF_RECORD_EXIT events" This reverts commit 0439539f72ea222fbfe511b47318b9c1815a7108. This caused this segfault: [root@sandy linux]# perf sched rec ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.306 MB perf.data (~57062 samples) ] perf [root@sandy linux]# perf sched lat perf: builtin-sched.c:781: thread_atoms_search: Assertion `!(thread != atoms->thread)' failed. Aborted (core dumped) [root@sandy linux]# Further investigation is needed to check that even with machine__remove_thread() not really deleting the thread referenced in the PERF_RECORD_EXIT (it goes to machine->dead_threads, because references may still exist to them in things like hist, etc) some event later comes for this dead thread and then machine__findnew_thread() will create a new thead instance that will not be the same as the one referenced by work_atoms->thread in thread_atoms_search(). For now just revert this patch to get the 'perf sched lat' back working. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> echo Link: http://lkml.kernel.org/n/tip-`ranpwd -l 24`@git.kernel.org Link: http://lkml.kernel.org/n/tip-hg4s6e5txiwqe00h8rdg1sin@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
28a6b6aa |
|
18-Dec-2012 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: There is no need for a per session hists instance It was being used just for its stats member, so ditch session->hists and use just what is needed, session->stats. This completes the move support multiple events in the hists layer, the last user of session->hists was 'perf diff' but Jiri Olsa has fixed that some time ago. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-pimk92kek8kcp4dmb1jakoro@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
70cb4e96 |
|
29-Oct-2012 |
Feng Tang <feng.tang@intel.com> |
perf tools: Add a global variable "const char *input_name" Currently many perf commands annotate/evlist/report/script/lock etc all support "-i" option to chose a specific perf data, and all of them create a local "input_name" to save the file name for that perf data. Since most of these commands need it, we can add a global variable for it, also it can some other benefits: 1. When calling script browser inside hists/annotation browser, it needs to know the perf data file name to run that script. 2. For further feature like runtime switching to another perf data file, this variable can also help. Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1351569369-26732-2-git-send-email-feng.tang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
0439539f |
|
06-Oct-2012 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf sched: Handle PERF_RECORD_EXIT events Noticed sched wasn't handling those events while introducing perf_event__process_{fork,exit}. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-035rzjtnv9ri8sssi7ojjjq0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
f62d3f0f |
|
06-Oct-2012 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf event: No need to create a thread when handling PERF_RECORD_EXIT When we were processing a PERF_RECORD_EXIT event we first used machine__findnew_thread for both the thread exiting and for its parent, only to use just the thread struct associated with the one exiting, and to just delete it. If it existed, i.e. not created at this very moment in machine__findnew_thread, it will be moved to the machine->dead_threads linked list, because we may have hist_entries pointing to it, but if it was created just do be deleted, it will just sit there with no references at all. Use the new machine__find_thread() method so that if it is not there, we don't create it. As a bonus the parent thread will also not be created at this point. Create process_fork() and process_exit() helpers to use this and make the builtins use it instead of the generic process_task(), ditched by this patch. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-z7n2y98ebjyrvmytaope4vdl@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
73ee3b27 |
|
01-Oct-2012 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf sched: Look up thread using tid instead of pid Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-zdu8up6vahogckg2uft7wh3n@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
60b7d14a |
|
11-Sep-2012 |
Namhyung Kim <namhyung.kim@lge.com> |
perf sched: Fixup for the die() removal The commit a116e05dcf61 ("perf sched: Remove die() calls") replaced die() call to pr_debug + return -1, but it should be pr_err otherwise it'll not show up unless -v option is given. Fix it. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1347415866-303-2-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
9ec3f4e4 |
|
11-Sep-2012 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf sched: Don't read all tracepoint variables in advance Do it just at the actual consumer of these fields, that way we avoid needless lookups: [root@sandy ~]# perf sched record sleep 30s [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 8.585 MB perf.data (~375063 samples) ] Before: [root@sandy ~]# perf stat -r 10 perf sched lat > /dev/null Performance counter stats for 'perf sched lat' (10 runs): 103.592215 task-clock # 0.993 CPUs utilized ( +- 0.33% ) 12 context-switches # 0.114 K/sec ( +- 3.29% ) 0 cpu-migrations # 0.000 K/sec 7,605 page-faults # 0.073 M/sec ( +- 0.00% ) 345,796,112 cycles # 3.338 GHz ( +- 0.07% ) [82.90%] 106,876,796 stalled-cycles-frontend # 30.91% frontend cycles idle ( +- 0.38% ) [83.23%] 62,060,877 stalled-cycles-backend # 17.95% backend cycles idle ( +- 0.80% ) [67.14%] 628,246,586 instructions # 1.82 insns per cycle # 0.17 stalled cycles per insn ( +- 0.04% ) [83.64%] 134,962,057 branches # 1302.820 M/sec ( +- 0.10% ) [83.64%] 1,233,037 branch-misses # 0.91% of all branches ( +- 0.29% ) [83.41%] 0.104333272 seconds time elapsed ( +- 0.33% ) [root@sandy ~]# perf stat -r 10 perf sched lat > /dev/null Performance counter stats for 'perf sched lat' (10 runs): 98.848272 task-clock # 0.993 CPUs utilized ( +- 0.48% ) 11 context-switches # 0.112 K/sec ( +- 2.83% ) 0 cpu-migrations # 0.003 K/sec ( +- 50.92% ) 7,604 page-faults # 0.077 M/sec ( +- 0.00% ) 332,216,085 cycles # 3.361 GHz ( +- 0.14% ) [82.87%] 100,623,710 stalled-cycles-frontend # 30.29% frontend cycles idle ( +- 0.53% ) [82.95%] 58,788,692 stalled-cycles-backend # 17.70% backend cycles idle ( +- 0.59% ) [67.15%] 609,402,433 instructions # 1.83 insns per cycle # 0.17 stalled cycles per insn ( +- 0.04% ) [83.76%] 131,277,138 branches # 1328.067 M/sec ( +- 0.06% ) [83.77%] 1,117,871 branch-misses # 0.85% of all branches ( +- 0.32% ) [83.51%] 0.099580430 seconds time elapsed ( +- 0.48% ) [root@sandy ~]# Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-kracdpw8wqlr0xjh75uk8g11@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
2b7fcbc5 |
|
11-Sep-2012 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf sched: Use perf_evsel__{int,str}val This patch also stops reading the common fields, as they were not being used except for one ->common_pid case that was replaced by sample->tid, i.e. the info is already in the perf_sample struct. Also it only fills the _event structures when there is a handler. [root@sandy ~]# perf sched record sleep 30s [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 8.585 MB perf.data (~375063 samples) ] Before: [root@sandy ~]# perf stat -r 10 perf sched lat > /dev/null Performance counter stats for 'perf sched lat' (10 runs): 129.117838 task-clock # 0.994 CPUs utilized ( +- 0.28% ) 14 context-switches # 0.111 K/sec ( +- 2.10% ) 0 cpu-migrations # 0.002 K/sec ( +- 66.67% ) 7,654 page-faults # 0.059 M/sec ( +- 0.67% ) 438,121,661 cycles # 3.393 GHz ( +- 0.06% ) [83.06%] 150,808,605 stalled-cycles-frontend # 34.42% frontend cycles idle ( +- 0.14% ) [83.10%] 80,748,941 stalled-cycles-backend # 18.43% backend cycles idle ( +- 0.64% ) [66.73%] 758,605,879 instructions # 1.73 insns per cycle # 0.20 stalled cycles per insn ( +- 0.08% ) [83.54%] 162,164,321 branches # 1255.940 M/sec ( +- 0.10% ) [83.70%] 1,609,903 branch-misses # 0.99% of all branches ( +- 0.08% ) [83.62%] 0.129949153 seconds time elapsed ( +- 0.28% ) After: [root@sandy ~]# perf stat -r 10 perf sched lat > /dev/null Performance counter stats for 'perf sched lat' (10 runs): 103.592215 task-clock # 0.993 CPUs utilized ( +- 0.33% ) 12 context-switches # 0.114 K/sec ( +- 3.29% ) 0 cpu-migrations # 0.000 K/sec 7,605 page-faults # 0.073 M/sec ( +- 0.00% ) 345,796,112 cycles # 3.338 GHz ( +- 0.07% ) [82.90%] 106,876,796 stalled-cycles-frontend # 30.91% frontend cycles idle ( +- 0.38% ) [83.23%] 62,060,877 stalled-cycles-backend # 17.95% backend cycles idle ( +- 0.80% ) [67.14%] 628,246,586 instructions # 1.82 insns per cycle # 0.17 stalled cycles per insn ( +- 0.04% ) [83.64%] 134,962,057 branches # 1302.820 M/sec ( +- 0.10% ) [83.64%] 1,233,037 branch-misses # 0.91% of all branches ( +- 0.29% ) [83.41%] 0.104333272 seconds time elapsed ( +- 0.33% ) [root@sandy ~]# Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-weu9t63zkrfrazkn0gxj48xy@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
0e9b07e5 |
|
11-Sep-2012 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf sched: Use perf_tool as ancestor So that we can remove all the globals. Before: text data bss dec hex filename 1586833 110368 1438600 3135801 2fd939 /tmp/oldperf After: text data bss dec hex filename 1629329 93568 848328 2571225 273bd9 /root/bin/perf Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-oph40vikij0crjz4eyapneov@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
4218e673 |
|
11-Sep-2012 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf sched: Remove unused thread parameter From the tracepoint handling routines. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-mcqd9mv34z6he0wqiz4a3mh9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
1d037ca1 |
|
10-Sep-2012 |
Irina Tirdea <irina.tirdea@gmail.com> |
perf tools: Use __maybe_used for unused variables perf defines both __used and __unused variables to use for marking unused variables. The variable __used is defined to __attribute__((__unused__)), which contradicts the kernel definition to __attribute__((__used__)) for new gcc versions. On Android, __used is also defined in system headers and this leads to warnings like: warning: '__used__' attribute ignored __unused is not defined in the kernel and is not a standard definition. If __unused is included everywhere instead of __used, this leads to conflicts with glibc headers, since glibc has a variables with this name in its headers. The best approach is to use __maybe_unused, the definition used in the kernel for __attribute__((unused)). In this way there is only one definition in perf sources (instead of 2 definitions that point to the same thing: __used and __unused) and it works on both Linux and Android. This patch simply replaces all instances of __used and __unused with __maybe_unused. Signed-off-by: Irina Tirdea <irina.tirdea@intel.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1347315303-29906-7-git-send-email-irina.tirdea@intel.com [ committer note: fixed up conflict with a116e05 in builtin-sched.c ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
a116e05d |
|
08-Sep-2012 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf sched: Remove die() calls Just use pr_err() + return -1 and perf_session__process_events to abort when some event would call die(), then let the perf's main() exit doing whatever it needs. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-88cwdogxqomsy9tfr8r0as58@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
7f7f8d0b |
|
07-Aug-2012 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf sched: Use perf_sample To reduce the number of parameters passed to the various event handling functions. Cc: Andrey Wagin <avagin@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-fc537qykjjqzvyol5fecx6ug@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
fcf65bf1 |
|
07-Aug-2012 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf evsel: Cache associated event_format We already lookup the associated event_format when reading the perf.data header, so that we can cache the tracepoint name in evsel->name, so do it a little further and save the event_format itself, so that we can avoid relookups in tools that need to access it. Change the tools to take the most obvious advantage, when they were using pevent_find_event directly. More work is needed for further removing the need of a pointer to pevent, such as when asking for event field values ("common_pid" and the other common fields and per event_format fields). This is something that was planned but only got actually done when Andrey Wagin needed to do this lookup at perf_tool->sample() time, when we don't have access to pevent (session->pevent) to use with pevent_find_event(). Cc: Andrey Wagin <avagin@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/n/tip-txkvew2ckko0b594ae8fbnyk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
da378962 |
|
27-Jun-2012 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Stop using a global trace events description list The pevent thing is per perf.data file, so I made it stop being static and become a perf_session member, so tools processing perf.data files use perf_session and _there_ we read the trace events description into session->pevent and then change everywhere to stop using that single global pevent variable and use the per session one. Note that it _doesn't_ fall backs to trace__event_id, as we're not interested at all in what is present in the /sys/kernel/debug/tracing/events in the workstation doing the analysis, just in what is in the perf.data file. This patch also introduces perf_session__set_tracepoints_handlers that is the perf perf.data/session way to associate handlers to tracepoint events by resolving their IDs using the events descriptions stored in a perf.data file. Make 'perf sched' use it. Reported-by: Dmitry Antipov <dmitry.antipov@linaro.org> Tested-by: Dmitry Antipov <dmitry.antipov@linaro.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linaro-dev@lists.linaro.org Cc: patches@linaro.org Link: http://lkml.kernel.org/r/20120625232016.GA28525@infradead.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
22c8b843 |
|
12-Jun-2012 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Don't access evsel->name directly One needs to use perf_evsel__name() so that if needed the name gets synthesized and stored in evsel->name, from where perf_evsel__name() will serve from them on. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-ml7zbenjmri9bghmrea0jm0d@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
aaf045f7 |
|
05-Apr-2012 |
Steven Rostedt <srostedt@redhat.com> |
perf: Have perf use the new libtraceevent.a library The event parsing code in perf was originally copied from trace-cmd but never was kept up-to-date with the changes that was done there. The trace-cmd libtraceevent.a code is much more mature than what is currently in perf. This updates the code to use wrappers to handle the calls to the new event parsing code. The new code requires a handle to be pass around, which removes the global event variables and allows more than one event structure to be read from different files (and different machines). But perf still has the old global events and the code throughout perf does not yet have a nice way to pass around a handle. A global 'pevent' has been made for perf and the old calls have been created as wrappers to the new event parsing code that uses the global pevent. With this change, perf can later incorporate the pevent handle into the perf structures and allow more than one file to be read and compared, that contains different events. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arun Sharma <asharma@fb.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
#
7b78f136 |
|
04-Apr-2012 |
Markus Trippelsdorf <markus@trippelsdorf.de> |
perf tools: Fix getrusage() related build failure on glibc trunk On a system running glibc trunk perf doesn't build: CC builtin-sched.o builtin-sched.c: In function ‘get_cpu_usage_nsec_parent’: builtin-sched.c:399:16: error: storage size of ‘ru’ isn’t known builtin-sched.c:403:2: error: implicit declaration of function ‘getrusage’ [-Werror=implicit-function-declaration] [...] Fix it by including sys/resource.h. Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120404084527.GA294@x4 Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
efad1415 |
|
07-Dec-2011 |
Robert Richter <robert.richter@amd.com> |
perf report: Accept fifos as input file The default input file for perf report is not handled the same way as perf record does it for its output file. This leads to unexpected behavior of perf report, etc. E.g.: # perf record -a -e cpu-cycles sleep 2 | perf report | cat failed to open perf.data: No such file or directory (try 'perf record' first) While perf record writes to a fifo, perf report expects perf.data to be read. This patch changes this to accept fifos as input file. Applies to the following commands: perf annotate perf buildid-list perf evlist perf kmem perf lock perf report perf sched perf script perf timechart Also fixes char const* -> const char* type declaration for filename strings. v2: * Prevent potential null pointer access to input_name in builtin-report.c. Needed due to removal of patch "perf report: Setup browser if stdout is a pipe" Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1323248577-11268-5-git-send-email-robert.richter@amd.com Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
ee29be62 |
|
28-Nov-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Save some loops using perf_evlist__id2evsel Since we already ask for PERF_SAMPLE_ID and use it to quickly find the associated evsel, add handler func + data to struct perf_evsel to avoid using chains of if(strcmp(event_name)) and also to avoid all the linear list searches via trace_event_find. To demonstrate the technique convert 'perf sched' to it: # perf sched record sleep 5m And then: Performance counter stats for '/tmp/oldperf sched lat': 646.929438 task-clock # 0.999 CPUs utilized 9 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 20,901 page-faults # 0.032 M/sec 1,290,144,450 cycles # 1.994 GHz <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 1,606,158,439 instructions # 1.24 insns per cycle 339,088,395 branches # 524.151 M/sec 4,550,735 branch-misses # 1.34% of all branches 0.647524759 seconds time elapsed Versus: Performance counter stats for 'perf sched lat': 473.564691 task-clock # 0.999 CPUs utilized 9 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 20,903 page-faults # 0.044 M/sec 944,367,984 cycles # 1.994 GHz <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 1,442,385,571 instructions # 1.53 insns per cycle 308,383,106 branches # 651.195 M/sec 4,481,784 branch-misses # 1.45% of all branches 0.474215751 seconds time elapsed [root@emilia ~]# Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-1kbzpl74lwi6lavpqke2u2p3@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
45694aa7 |
|
28-Nov-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Rename perf_event_ops to perf_tool To better reflect that it became the base class for all tools, that must be in each tool struct and where common stuff will be put. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-qgpc4msetqlwr8y2k7537cxe@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
743eb868 |
|
28-Nov-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Resolve machine earlier and pass it to perf_event_ops Reducing the exposure of perf_session further, so that we can use the classes in cases where no perf.data file is created. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-stua66dcscsezzrcdugvbmvd@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
d20deb64 |
|
25-Nov-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Pass tool context in the the perf_event_ops functions So that we don't need to have that many globals. Next steps will remove the 'session' pointer, that in most cases is not needed. Then we can rename perf_event_ops to 'perf_tool' that better describes this class hierarchy. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-wp4djox7x6w1i2bab1pt4xxp@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
e3f42609 |
|
16-Nov-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Use evsel->attr.sample_type instead of session->sample_type Eventually session->sample_type will go away as we want to support multiple sample types per session, so use it from the evsel which is a step in that direction. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-0vwdpjcwbjezw459lw5n3ew1@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
580cabed |
|
09-Aug-2011 |
Jiri Olsa <jolsa@redhat.com> |
perf sched: Usage leftover from trace -> script rename The 'perf sched' command usage still showing 'trace' command instead of the 'script' command. Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110809124651.GD2056@jolsa.brq.redhat.com Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
4c09bafa |
|
08-Aug-2011 |
Jiri Olsa <jolsa@redhat.com> |
perf sched: Do not delete session object prematurely The session object is released prematurely when processing events for latency command. The session's thread objects are used within the output_lat_thread function. Runnning following commands: # perf sched record # perf sched latency the latter displays incorrect data and might cause access violation. Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1312837414-3819-1-git-send-email-jolsa@redhat.com Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
9e69c210 |
|
15-Mar-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: Pass evsel in event_ops->sample() Resolving the sample->id to an evsel since the most advanced tools, report and annotate, and the others will too when they evolve to properly support multi-event perf.data files. Good also because it does an extra validation, checking that the ID is valid when present. When that is not the case, the overhead is just a branch + function call (perf_evlist__id2evsel). Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
fb7d0b3c |
|
24-Jan-2011 |
Kyle McMartin <kyle@mcmartin.ca> |
perf tool: Fix gcc 4.6.0 issues GCC 4.6.0 in Fedora rawhide turned up some compile errors in tools/perf due to the -Werror=unused-but-set-variable flag. I've gone through and annotated some of the assignments that had side effects (ie: return value from a function) with the __used annotation, and in some cases, just removed unused code. In a few cases, we were assigning something useful, but not using it in later parts of the function. kyle@dreadnought:~/src% gcc --version gcc (GCC) 4.6.0 20110122 (Red Hat 4.6.0-0.3) Cc: Ingo Molnar <mingo@redhat.com> LKML-Reference: <20110124161304.GK27353@bombadil.infradead.org> Signed-off-by: Kyle McMartin <kyle@redhat.com> [ committer note: Fixed up the annotation fixes, as that code moved recently ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
8115d60c |
|
29-Jan-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Kill event_t typedef, use 'union perf_event' instead And move the event_t methods to the perf_event__ too. No code changes, just namespace consistency. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
8d50e5b4 |
|
29-Jan-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Rename 'struct sample_data' to 'struct perf_sample' Making the namespace more uniform. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
9486aa38 |
|
22-Jan-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Fix 64 bit integer format strings Using %L[uxd] has issues in some architectures, like on ppc64. Fix it by making our 64 bit integers typedefs of stdint.h types and using PRI[ux]64 like, for instance, git does. Reported by Denis Kirjanov that provided a patch for one case, I went and changed all cases. Reported-by: Denis Kirjanov <dkirjanov@kernel.org> Tested-by: Denis Kirjanov <dkirjanov@kernel.org> LKML-Reference: <20110120093246.GA8031@hera.kernel.org> Cc: Denis Kirjanov <dkirjanov@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pingtian Han <phan@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
9710118b |
|
12-Jan-2011 |
Stephane Eranian <eranian@google.com> |
perf sched: Fix list of events, dropping unsupported ':r' modifier Looks to me like the :r modifier is not supported anymore, so remove it from the list of events. Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <AANLkTim=jawJyBj0iFd0r4-LCKzvjFW+NddzJMD5GUB9@mail.gmail.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
12f7e036 |
|
10-Jan-2011 |
Jiri Pirko <jpirko@redhat.com> |
perf sched: Use PTHREAD_STACK_MIN to avoid pthread_attr_setstacksize() fail on ppc64: /usr/include/bits/local_lim.h:#define PTHREAD_STACK_MIN 131072 therefore following set of commands: gives: perf.2.6.37test: builtin-sched.c:493: create_tasks: Assertion `!(err)' failed. So make sure we do not set stack size lower than PTHREAD_STACK_MIN. Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20110110160417.GB2685@psychotron.brq.redhat.com> Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
e462dc55 |
|
10-Jan-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf sched: Fix allocation result check Bug introduced in ce47dc56. Reported-by: Mike Galbraith <efault@gmx.de> Cc: Chris Samuel <chris@csamuel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
21ef97f0 |
|
09-Dec-2010 |
Ian Munsie <imunsie@au1.ibm.com> |
perf session: Fallback to unordered processing if no sample_id_all If we are running the new perf on an old kernel without support for sample_id_all, we should fall back to the old unordered processing of events. If we didn't than we would *always* process events without timestamps out of order, whether or not we hit a reordering race. In other words, instead of there being a chance of not attributing samples correctly, we would guarantee that samples would not be attributed. While processing all events without timestamps before events with timestamps may seem like an intuitive solution, it falls down as PERF_RECORD_EXIT events would also be processed before any samples. Even with a workaround for that case, samples before/after an exec would not be attributed correctly. This patch allows commands to indicate whether they need to fall back to unordered processing, so that commands that do not care about timestamps on every event will not be affected. If we do fallback, this will print out a warning if report -D was invoked. This patch adds the test in perf_session__new so that we only need to test once per session. Commands that do not use an event_ops (such as record and top) can simply pass NULL in it's place. Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <1291951882-sup-6069@au1.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
ce47dc56 |
|
12-Nov-2010 |
Chris Samuel <chris@csamuel.org> |
perf tools: Catch a few uncheck calloc/malloc's There were a few stray calloc()'s and malloc()'s which were not having their return values checked for success. As the calling code either already coped with failure or didn't actually care we just return -ENOMEM at that point. Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Chris Samuel <chris@csamuel.org> LKML-Reference: <4CDDF95A.1050400@csamuel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
640c03ce |
|
02-Dec-2010 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: Parse sample earlier At perf_session__process_event, so that we reduce the number of lines in eache tool sample processing routine that now receives a sample_data pointer already parsed. This will also be useful in the next patch, where we'll allow sample the identity fields in MMAP, FORK, EXIT, etc, when it will be possible to see (cpu, timestamp) just after before every event. Also validate callchains in perf_session__process_event, i.e. as early as possible, and keep a counter of the number of events discarded due to invalid callchains, warning the user about it if it happens. There is an assumption that was kept that all events have the same sample_type, that will be dealt with in the future, when this preexisting limitation will be removed. Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <1291318772-30880-4-git-send-email-acme@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
133dc4c3 |
|
16-Nov-2010 |
Ingo Molnar <mingo@elte.hu> |
perf: Rename 'perf trace' to 'perf script' Free the perf trace name space and rename the trace to 'script' which is a better match for the scripting engine. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
af64865b |
|
31-May-2010 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf: Use event__process_task from perf sched perf sched uses event__process_comm(), which means it can resolve comms from: - tasks that have exec'ed (kernel comm events) - tasks that were running when perf record started the actual recording (synthetized comm events) But perf sched can't resolve the pids of tasks that were created after the recording started. To solve this, we need to inherit the comms on fork events using event__process_task(). This fixes various unresolved pids in perf sched, easily visible with: perf sched record perf bench sched messaging Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Stephane Eranian <eranian@google.com>
|
#
edb7c60e |
|
17-May-2010 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf options: Type check all the remaining OPT_ variants OPT_SET_INT was renamed to OPT_SET_UINT since the only use in these tools is to set something that has an enum type, that is builtin compatible with unsigned int. Several string constifications were done to make OPT_STRING require a const char * type. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
1967936d |
|
17-May-2010 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf options: Check v type in OPT_U?INTEGER To avoid problems like the one fixed by Stephane Eranian in 3de29ca, now we'll got this instead: bench/sched-messaging.c:259: error: negative width in bit-field ‘<anonymous>’ bench/sched-messaging.c:261: error: negative width in bit-field ‘<anonymous>’ Which is rather cryptic, but is how BUILD_BUG_ON_ZERO works, so kernel hackers should be already used to this. With it in place found some problems, fixed by changing the affected variables to sensible types or changed some OPT_INTEGER to OPT_UINTEGER. Next csets will go thru converting each of the remaining OPT_ so that review can be made easier by grouping changes per type per patch. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
cee75ac7 |
|
14-May-2010 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf hist: Clarify events_stats fields usage The events_stats.total field is too generic, rename it to .total_period, and also add a comment explaining that it is the sum of all the .period fields in samples, that is needed because we use auto-freq to avoid sampling artifacts. Ditto for events_stats.lost, that is the sum of all lost_event.lost fields, i.e. the number of events the kernel dropped. Looking at the users, builtin-sched.c can make use of these fields and stop doing it again. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
454c407e |
|
01-May-2010 |
Tom Zanussi <tzanussi@gmail.com> |
perf: add perf-inject builtin Currently, perf 'live mode' writes build-ids at the end of the session, which isn't actually useful for processing live mode events. What would be better would be to have the build-ids sent before any of the samples that reference them, which can be done by processing the event stream and retrieving the build-ids on the first hit. Doing that in perf-record itself, however, is off-limits. This patch introduces perf-inject, which does the same job while leaving perf-record untouched. Normal mode perf still records the build-ids at the end of the session as it should, but for live mode, perf-inject can be injected in between the record and report steps e.g.: perf record -o - ./hackbench 10 | perf inject -v -b | perf report -v -i - perf-inject reads a perf-record event stream and repipes it to stdout. At any point the processing code can inject other events into the event stream - in this case build-ids (-b option) are read and injected as needed into the event stream. Build-ids are just the first user of perf-inject - potentially anything that needs userspace processing to augment the trace stream with additional information could make use of this facility. Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frédéric Weisbecker <fweisbec@gmail.com> LKML-Reference: <1272696080-16435-3-git-send-email-tzanussi@gmail.com> Signed-off-by: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
a64eae70 |
|
23-Apr-2010 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf: Use generic sample reordering in perf sched Use the new generic sample events reordering from perf sched, this drops the need of multiplexing the buffers on record time, improving the scalability of perf sched. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Ingo Molnar <mingo@elte.hu> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Tom Zanussi <tzanussi@gmail.com>
|
#
c0555642 |
|
13-Apr-2010 |
Ian Munsie <imunsie@au.ibm.com> |
perf: Fix endianness argument compatibility with OPT_BOOLEAN() and introduce OPT_INCR() Parsing an option from the command line with OPT_BOOLEAN on a bool data type would not work on a big-endian machine due to the manner in which the boolean was being cast into an int and incremented. For example, running 'perf probe --list' on a PowerPC machine would fail to properly set the list_events bool and would therefore print out the usage information and terminate. This patch makes OPT_BOOLEAN work as expected with a bool datatype. For cases where the original OPT_BOOLEAN was intentionally being used to increment an int each time it was passed in on the command line, this patch introduces OPT_INCR with the old behaviour of OPT_BOOLEAN (the verbose variable is currently the only such example of this). I have reviewed every use of OPT_BOOLEAN to verify that a true C99 bool was passed. Where integers were used, I verified that they were only being used for boolean logic and changed them to bools to ensure that they would not be mistakenly used as ints. The major exception was the verbose variable which now uses OPT_INCR instead of OPT_BOOLEAN. Signed-off-by: Ian Munsie <imunsie@au.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: <stable@kernel.org> # NOTE: wont apply to .3[34].x cleanly, please backport Cc: Git development list <git@vger.kernel.org> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric B Munson <ebmunson@us.ibm.com> Cc: Valdis.Kletnieks@vt.edu Cc: WANG Cong <amwang@redhat.com> Cc: Thiago Farina <tfransosi@gmail.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Mike Galbraith <efault@gmx.de> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: John Kacur <jkacur@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1271147857-11604-1-git-send-email-imunsie@au.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
eed05fe7 |
|
04-Apr-2010 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Reorganize some structs to save space Using 'pahole --packable' I found some structs that could be reorganized to eliminate alignment holes, in some cases getting them to be cacheline multiples. [acme@doppio linux-2.6-tip]$ codiff perf.old ~/bin/perf builtin-annotate.c: struct perf_session | -8 struct perf_header | -8 2 structs changed builtin-diff.c: struct sample_data | -8 1 struct changed diff__process_sample_event | -8 1 function changed, 8 bytes removed, diff: -8 builtin-sched.c: struct sched_atom | -8 1 struct changed builtin-timechart.c: struct per_pid | -8 1 struct changed cmd_timechart | -16 1 function changed, 16 bytes removed, diff: -16 builtin-probe.c: struct perf_probe_point | -8 struct perf_probe_event | -8 2 structs changed opt_add_probe_event | -3 1 function changed, 3 bytes removed, diff: -3 util/probe-finder.c: struct probe_finder | -8 1 struct changed find_kprobe_trace_events | -16 1 function changed, 16 bytes removed, diff: -16 /home/acme/bin/perf: 4 functions changed, 43 bytes removed, diff: -43 [acme@doppio linux-2.6-tip]$ Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
0d755034 |
|
13-Jan-2010 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Don't cast RIP to pointers Since they can come from another architecture with bigger pointers, i.e. processing a 64-bit perf.data on a 32-bit arch. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1263478990-8200-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
55aa640f |
|
27-Dec-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: Remove redundant prefix & suffix from perf_event_ops Since now all that we have are perf event handlers, leave just the name of the event. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1261957026-15580-9-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
d549c769 |
|
27-Dec-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: Remove sample_type_check from event_ops This is really something tools need to do before asking for the events to be processed, leaving perf_session__process_events to do just that, process events. Also add a msg parameter to perf_session__has_traces() so that the right message can be printed, fixing a regression added by me in the previous cset (right timechart message) and also fixing 'perf kmem', that was not asking if 'perf kmem record' was ran. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1261957026-15580-6-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
27295592 |
|
27-Dec-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: Share the common trace sample_check routine as perf_session__has_traces Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1261957026-15580-5-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
75be6cf4 |
|
15-Dec-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf symbols: Make symbol_conf global This simplifies a lot of functions, less stuff to be done by tool writers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1260914682-29652-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
c019879b |
|
14-Dec-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: Adopt the sample_type variable All tools had copies, and perf diff would have to specify a sample_type_check method just for copying it. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1260807780-19377-2-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
4e4f06e4 |
|
14-Dec-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: Move the hist_entries rb tree to perf_session As we'll need to sort multiple times for multiple perf sessions, so that we can then do a diff. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1260803439-16783-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
4aa65636 |
|
13-Dec-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: Move kmaps to perf_session There is still some more work to do to disentangle map creation from DSO loading, but this happens only for the kernel, and for the early adopters of perf diff, where this disentanglement matters most, we'll be testing different kernels, so no problem here. Further clarification: right now we create the kernel maps for the various modules and discontiguous kernel text maps when loading the DSO, we should do it as a two step process, first creating the maps, for multiple mappings with the same DSO store, then doing the dso load just once, for the first hit on one of the maps sharing this DSO backing store. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1260741029-4430-6-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
b3165f41 |
|
13-Dec-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: Move the global threads list to perf_session So that we can process two perf.data files. We still need to add a O_MMAP mode for perf_session so that we can do all the mmap stuff in it. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1260741029-4430-5-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
ec913369 |
|
13-Dec-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: Reduce the number of parms to perf_session__process_events By having the cwd/cwdlen in the perf_session struct and full_paths in perf_event_ops. Now its just a matter of passing the ops. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1260741029-4430-4-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
13df45ca |
|
13-Dec-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: Register the idle thread in perf_session__process_events No need for all tools to register it and then immediately call perf_session__process_events. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1260741029-4430-3-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
301a0b02 |
|
13-Dec-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: Ditch register_perf_file_handler Pass the event_ops to perf_session__process_events instead. Also move the event_ops definition to session.h, starting to move things around to their right place, trimming the many unneeded headers we have. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1260741029-4430-2-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
d8f66248 |
|
13-Dec-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: Pass the perf_session to the event handling operations They will need it to get the right threads list, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1260741029-4430-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
94c744b6 |
|
11-Dec-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Introduce perf_session class That does all the initialization boilerplate, opening the file, reading the header, checking if it is valid, etc. And that will as well have the threads list, kmap (now) global variable, etc, so that we can handle two (or more) perf.data files describing sessions to compare. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1260573842-19720-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
3786310a |
|
09-Dec-2009 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf sched: Add max delay time snapshot When we have a maximum latency reported for a task, we need a convenient way to find the matching location to the raw traces or to perf sched map that shows where the task has been eventually scheduled in. This gives a pointer to retrieve the events that occured during this max latency. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1260391208-6808-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
c0c9e721 |
|
09-Dec-2009 |
Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> |
perf sched: Fix for getting task's execution time In current code, task's execute time is got by reading '/proc/<pid>/sched' file, it's wrong if the task is created by pthread_create(), because every thread task has same pid. This way also has two demerits: 1: 'perf sched replay' can't work if the kernel is not compiled with the 'CONFIG_SCHED_DEBUG' option 2: perf tool should depend on proc file system So, this patch uses PERF_COUNT_SW_TASK_CLOCK to get task's execution time instead of reading /proc file. Changelog v2 -> v3: use PERF_COUNT_SW_TASK_CLOCK instead of rusage() as Ingo's suggestion Reported-by: Torok Edwin <edwintorok@gmail.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Xiao Guangrong <ericxiao.gr@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <4B1F7322.80103@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
f48f669d |
|
06-Dec-2009 |
Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> |
perf_event: Eliminate raw->size raw->size is not used, this patch just cleans it up. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4B1C8CC4.4050007@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
d8bd9e0a |
|
06-Dec-2009 |
Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> |
perf_event: Fix raw event processing We use 'data.raw_data' parameter to call process_raw_event(), but data.raw_data buffer not include data size. it can make perf tool crash. This bug was introduced by commit 180f95e29a ("perf: Make common SAMPLE_EVENT parser"). Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4B1C7F45.5080105@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
c0777c5a |
|
06-Dec-2009 |
Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> |
perf/sched: Fix 'perf sched trace' If we use 'perf sched trace', it will call symbol__init() again, and can lead to a perf tool crash: [root@localhost perf]# ./perf sched trace *** glibc detected *** ./perf: free(): invalid next size (normal): 0x094c1898 *** ======= Backtrace: ========= /lib/libc.so.6[0xb7602404] /lib/libc.so.6(cfree+0x96)[0xb76043b6] ./perf[0x80730fe] ./perf[0x8074c97] ./perf[0x805eb59] ./perf[0x80536fd] ./perf[0x804b618] ./perf[0x804bdc3] /lib/libc.so.6(__libc_start_main+0xe5)[0xb75a9735] ./perf[0x804af81] ======= Memory map: ======== 08048000-08158000 r-xp 00000000 fe:00 556831 /home/eric/.... 08158000-08168000 rw-p 0010f000 fe:00 556831 /home/eric/... 08168000-085fe000 rw-p 00000000 00:00 0 094ab000-094cc000 rw-p 00000000 00:00 0 [heap] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> LKML-Reference: <4B1C7EE1.8030906@cn.fujitsu.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
180f95e2 |
|
06-Dec-2009 |
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> |
perf: Make common SAMPLE_EVENT parser Currently, sample event data is parsed for each commands, and it is assuming that the data is not including other data. (E.g. timechart, trace, etc. can't parse the event if it has PERF_SAMPLE_CALLCHAIN) So, even if we record the superset data for multiple commands at a time, commands can't parse. etc. To fix it, this makes common sample event parser, and use it to parse sample event correctly. (PERF_SAMPLE_READ is unsupported for now though, it seems to be not using.) Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <87hbs48imv.fsf@devron.myhome.or.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
62daacb5 |
|
27-Nov-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Reorganize event processing routines, lotsa dups killed While implementing event__preprocess_sample, that will do all of the symbol lookup in one convenient function, I noticed that util/process_event.[ch] were not being used at all, then started looking if there were other functions that could be shared and... All those functions really don't need to receive offset + head, the only thing they did was common to all of them, so do it at one place instead. Stats about number of each type of event processed now is done in a central place. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: John Kacur <jkacur@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1259346563-12568-11-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
36479484 |
|
23-Nov-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Introduce zalloc() for the common calloc(1, N) case This way we type less characters and it looks more like the kzalloc kernel counterpart. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1259071517-3242-3-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
b32d133a |
|
23-Nov-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf symbols: Simplify symbol machinery setup And also express its configuration toggles via a struct. Now all one has to do is to call symbol__init(NULL) if the defaults are OK, or pass a struct symbol_conf pointer with the desired configuration. If a tool uses kernel_maps__find_symbol() to look at the kernel and modules mappings for a symbol but didn't call symbol__init() first, that will generate a one time warning too, alerting the subcommand developer that symbol__init() must be called. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1259071517-3242-2-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
cc612d81 |
|
23-Nov-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf symbols: Look for vmlinux in more places Now that we can check the buildid to see if it really matches, this can be done safely: vmlinux /boot/vmlinux /boot/vmlinux-<uts.release> /lib/modules/<uts.release>/build/vmlinux /usr/lib/debug/lib/modules/%s/vmlinux More can be added - if you know about distros that put the vmlinux somewhere else please let us know. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1259001550-8194-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
00a192b3 |
|
30-Oct-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Simplify the symbol priv area mechanism Before we were storing this in the DSO, but in fact this is a property of the 'symbol' class, not something that will vary among DSOs, so move it to a global variable and initialize it using the existing symbol__init routine. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <1256927305-4628-2-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
6beba7ad |
|
21-Oct-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Unify debug messages mechanisms We were using eprintf in some places, that looks at a global 'verbose' level, and at other places passing a 'v' parameter to specify the verbosity level, unify it by introducing pr_{err,warning,debug,etc}, just like in the kernel. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <1256153646-10097-1-git-send-email-acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
5a116dd2 |
|
17-Oct-2009 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf tools: Use kernel bitmap library Use the kernel bitmap library for internal perf tools uses. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1255792354-11304-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
f39cdf25 |
|
17-Oct-2009 |
Julia Lawall <julia@diku.dk> |
perf tools: Move dereference after NULL test In each case, if the NULL test on thread is needed, then the dereference should be after the NULL test. A simplified version of the semantic match that detects this problem is as follows (http://coccinelle.lip6.fr/): // <smpl> @match exists@ expression x, E; identifier fld; @@ * x->fld ... when != \(x = E\|&x\) * x == NULL // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> LKML-Reference: <Pine.LNX.4.64.0910170842500.9213@ask.diku.dk> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
d5b889f2 |
|
13-Oct-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Move threads & last_match to threads.c This was just being copy'n'pasted all over. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <20091013141629.GD21809@ghostprotocols.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
55ffb7a6 |
|
10-Oct-2009 |
Mike Galbraith <efault@gmx.de> |
perf sched: Add -C option to measure on a specific CPU To refresh, trying to sched record only one CPU results in bogus latencies as below. I fixed^Wmade it stop doing the bad thing today, by following task migration events properly. Before: marge:/root/tmp # taskset -c 1 perf sched record -C 0 -- sleep 10 marge:/root/tmp # perf sched lat ----------------------------------------------------------------------------------------- Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | ----------------------------------------------------------------------------------------- Xorg:4943 | 1.290 ms | 1 | avg: 1670.132 ms | max: 1670.132 ms | hald-addon-stor:3569 | 0.091 ms | 3 | avg: 658.609 ms | max: 1975.797 ms | hald-addon-stor:3573 | 0.209 ms | 4 | avg: 499.138 ms | max: 1990.565 ms | audispd:4270 | 0.012 ms | 1 | avg: 0.015 ms | max: 0.015 ms | .... marge:/root/tmp # perf sched trace|grep 'Xorg:4943' swapper-0 [000] 401.184013288: sched_stat_runtime: task: Xorg:4943 runtime: 1233188 [ns], vruntime: 19105169779 [ns] rt2870TimerQHan-4947 [000] 402.854140127: sched_stat_wait: task: Xorg:4943 wait: 580073 [ns] rt2870TimerQHan-4947 [000] 402.854141770: sched_migrate_task: task Xorg:4943 [140] from: 1 to: 0 rt2870TimerQHan-4947 [000] 402.854143854: sched_stat_wait: task: Xorg:4943 wait: 0 [ns] rt2870TimerQHan-4947 [000] 402.854145397: sched_switch: task rt2870TimerQHan:4947 [140] (D) ==> Xorg:4943 [140] Xorg-4943 [000] 402.854193133: sched_stat_runtime: task: Xorg:4943 runtime: 56546 [ns], vruntime: 11766332500 [ns] Xorg-4943 [000] 402.854196842: sched_switch: task Xorg:4943 [140] (S) ==> swapper:0 [140] After: marge:/root/tmp # taskset -c 1 perf sched record -C 0 -- sleep 10 marge:/root/tmp # perf sched lat ----------------------------------------------------------------------------------------- Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | ----------------------------------------------------------------------------------------- amarokapp:11150 | 271.297 ms | 878 | avg: 0.130 ms | max: 1.057 ms | konsole:5965 | 1.370 ms | 12 | avg: 0.092 ms | max: 0.855 ms | Xorg:4943 | 179.980 ms | 1109 | avg: 0.087 ms | max: 1.206 ms | hald-addon-stor:3574 | 0.212 ms | 9 | avg: 0.040 ms | max: 0.169 ms | hald-addon-stor:3570 | 0.223 ms | 9 | avg: 0.037 ms | max: 0.223 ms | klauncher:5864 | 0.550 ms | 8 | avg: 0.032 ms | max: 0.048 ms | The 'Maximum delay ms' results are now sane. Signed-off-by: Mike Galbraith <efault@gmx.de> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
cbef79a8 |
|
05-Oct-2009 |
Randy Dunlap <randy.dunlap@oracle.com> |
perf tools: Fix const char type propagation The following perf build warnings/errors in function argument types: builtin-sched.c:1894: warning: passing argument 1 of 'sort_dimension__add' discards qualifiers from pointer target type util/trace-event-parse.c:685: warning: passing argument 2 of 'read_expected' discards qualifiers from pointer target type util/trace-event-parse.c:741: warning: passing argument 4 of 'test_type_token' discards qualifiers from pointer target type util/trace-event-parse.c:706: warning: passing argument 2 of 'read_expected_item' discards qualifiers from pointer target type ... trigger because older GCC is not able to prove that sort_dimension__add() does not change the string. Some goes for test_type_token(). Fix this by improving type consistency. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20091005131729.78444bfb.randy.dunlap@oracle.com> [ Also remove ugly type cast now unnecessary. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
97ea1a7f |
|
08-Oct-2009 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf tools: Fix thread comm resolution in perf sched This reverts commit 9a92b479b2f088ee2d3194243f4c8e59b1b8c9c2 ("perf tools: Improve thread comm resolution in perf sched") and fixes the real bug. The bug was elsewhere: We are failing to resolve thread names in perf sched because the table of threads we are building, on top of comm events, has a per process granularity. But perf sched, unlike the other perf tools, needs a per thread granularity as we are profiling every tasks individually. So fix it by building our threads table using the tid instead of the pid as the thread identifier. v2: Revert the previous fix - it is not really needed Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1255028657-11158-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
9a92b479 |
|
08-Oct-2009 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf tools: Improve thread comm resolution in perf sched When we get sched traces that involve a task that was already created before opening the event, we won't have the comm event for it. So if we can't find the comm event for a given thread, we look at the traces that may contain these informations. Before: ata/1:371 | 0.000 ms | 1 | avg: 3988.693 ms | max: 3988.693 ms | kondemand/1:421 | 0.096 ms | 3 | avg: 345.346 ms | max: 1035.989 ms | kondemand/0:420 | 0.025 ms | 3 | avg: 421.332 ms | max: 964.014 ms | :5124:5124 | 0.103 ms | 5 | avg: 74.082 ms | max: 277.194 ms | :6244:6244 | 0.691 ms | 9 | avg: 125.655 ms | max: 271.306 ms | firefox:5080 | 0.924 ms | 5 | avg: 53.833 ms | max: 257.828 ms | npviewer.bin:6225 | 21.871 ms | 53 | avg: 22.462 ms | max: 220.835 ms | :6245:6245 | 9.631 ms | 21 | avg: 41.864 ms | max: 213.349 ms | After: ata/1:371 | 0.000 ms | 1 | avg: 3988.693 ms | max: 3988.693 ms | kondemand/1:421 | 0.096 ms | 3 | avg: 345.346 ms | max: 1035.989 ms | kondemand/0:420 | 0.025 ms | 3 | avg: 421.332 ms | max: 964.014 ms | firefox:5124 | 0.103 ms | 5 | avg: 74.082 ms | max: 277.194 ms | npviewer.bin:6244 | 0.691 ms | 9 | avg: 125.655 ms | max: 271.306 ms | firefox:5080 | 0.924 ms | 5 | avg: 53.833 ms | max: 257.828 ms | npviewer.bin:6225 | 21.871 ms | 53 | avg: 22.462 ms | max: 220.835 ms | npviewer.bin:6245 | 9.631 ms | 21 | avg: 41.864 ms | max: 213.349 ms | Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1255012632-7882-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
016e92fb |
|
06-Oct-2009 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf tools: Unify perf.data mapping and events handling This librarizes the perf.data file mapping and handling in various perf tools, roughly reducing the amount of code and fixing the places that mmap from beginning of the file whereas we want to mmap from the beginning of the data, leading to page fault because the mmap window is too small since the trace info are written in the file too. TODO: - convert perf timechart too Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arjan van de Ven <arjan@infradead.org> LKML-Reference: <20091007104729.GD5043@nowhere> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
03456a15 |
|
06-Oct-2009 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf tools: Merge trace.info content into perf.data This drops the trace.info file and move its contents into the common perf.data file. This is done by creating a new trace_info section into this file. A user of perf headers needs to call perf_header__set_trace_info() to save the trace meta informations into the perf.data file. A file created by perf after his patch is unsupported by previous version because the size of the headers have increased. That said, it's two new fields that have been added in the end of the headers, and those could be ignored by previous versions if they just handled the dynamic header size and then ignore the unknow part. The offsets guarantee the compatibility. We'll do a -stable fix for that. But current previous versions handle the header size using its static size, not dynamic, then it's not backward compatible with trace records. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20091006213643.GA5343@nowhere> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
a80deb62 |
|
28-Sep-2009 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf sched: Remove dead code Several variables are not used at all, cut'n'paste leftovers. Also check if the sample_type is RAW earlier, to avoid needless searches. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
cdd6c482 |
|
20-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf: Do the big rename: Performance Counters -> Performance Events Bye-bye Performance Counters, welcome Performance Events! In the past few months the perfcounters subsystem has grown out its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility. Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate. All in one, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names. (in an ABI compatible fashion) The word 'event' is also a bit shorter than 'counter' - which makes it slightly more convenient to write/handle as well. Thanks goes to Stephane Eranian who first observed this misnomer and suggested a rename. User-space tooling and ABI compatibility is not affected - this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.) This patch has been generated via the following script: FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/PERF_EVENT_/PERF_RECORD_/g' \ -e 's/PERF_COUNTER/PERF_EVENT/g' \ -e 's/perf_counter/perf_event/g' \ -e 's/nb_counters/nb_events/g' \ -e 's/swcounter/swevent/g' \ -e 's/tpcounter_event/tp_event/g' \ $FILES for N in $(find . -name perf_counter.[ch]); do M=$(echo $N | sed 's/perf_counter/perf_event/g') mv $N $M done FILES=$(find . -name perf_event.*) sed -i \ -e 's/COUNTER_MASK/REG_MASK/g' \ -e 's/COUNTER/EVENT/g' \ -e 's/\<event\>/event_id/g' \ -e 's/counter/event/g' \ -e 's/Counter/Event/g' \ $FILES ... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming. We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window. Namespace clashes were fixed up in a preparatory patch - and some stylistic fallout will be fixed up in a subsequent patch. ( NOTE: 'counters' are still the proper terminology when we deal with hardware registers - and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. ) Suggested-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
4b77a729 |
|
18-Sep-2009 |
Mike Galbraith <efault@gmx.de> |
perf sched: Add --input=file option to builtin-sched.c perf sched record passes unparsed args on to perf record, so specifying an output file via perf sched record -o FILE (cmd) just works. Ergo, provide an option to specify input file as well. Also add the missing 'map' command to help. Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1253254944.20589.11.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
40749d0f |
|
17-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Determine the number of CPUs automatically For 'perf sched map' output, determine max_cpu automatically, instead of the static default of 15. [ v2: use sysconf() pointed out by Arjan van de Ven <arjan@infradead.org> ] Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
0ec04e16 |
|
16-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Add 'perf sched map' scheduling event map printout This prints a textual context-switching outline of workload captured via perf sched record. For example, on a 16 CPU box it outputs: N1 O1 . . . S1 . . . B0 . *I0 C1 . M1 . 23002.773423 secs N1 O1 . *Q0 . S1 . . . B0 . I0 C1 . M1 . 23002.773423 secs N1 O1 . Q0 . S1 . . . B0 . *R1 C1 . M1 . 23002.773485 secs N1 O1 . Q0 . S1 . *S0 . B0 . R1 C1 . M1 . 23002.773478 secs *L0 O1 . Q0 . S1 . S0 . B0 . R1 C1 . M1 . 23002.773523 secs L0 O1 . *. . S1 . S0 . B0 . R1 C1 . M1 . 23002.773531 secs L0 O1 . . . S1 . S0 . B0 . R1 C1 *T1 M1 . 23002.773547 secs T1 => irqbalance:2089 L0 O1 . . . S1 . S0 . *P0 . R1 C1 T1 M1 . 23002.773549 secs *N1 O1 . . . S1 . S0 . P0 . R1 C1 T1 M1 . 23002.773566 secs N1 O1 . . . *J0 . S0 . P0 . R1 C1 T1 M1 . 23002.773571 secs N1 O1 . . . J0 . S0 *B0 P0 . R1 C1 T1 M1 . 23002.773592 secs N1 O1 . . . J0 . *U0 B0 P0 . R1 C1 T1 M1 . 23002.773582 secs N1 O1 . . . *S1 . U0 B0 P0 . R1 C1 T1 M1 . 23002.773604 secs N1 O1 . . . S1 . U0 B0 *. . R1 C1 T1 M1 . 23002.773615 secs N1 O1 . . . S1 . U0 B0 . . *K0 C1 T1 M1 . 23002.773631 secs N1 O1 . *M0 . S1 . U0 B0 . . K0 C1 T1 M1 . 23002.773624 secs N1 O1 . M0 . S1 . U0 *. . . K0 C1 T1 M1 . 23002.773644 secs N1 O1 . M0 . S1 . U0 . . . *R1 C1 T1 M1 . 23002.773662 secs N1 O1 . M0 . S1 . *. . . . R1 C1 T1 M1 . 23002.773648 secs N1 O1 . *. . S1 . . . . . R1 C1 T1 M1 . 23002.773680 secs N1 O1 . . . *L0 . . . . . R1 C1 T1 M1 . 23002.773717 secs *N0 O1 . . . L0 . . . . . R1 C1 T1 M1 . 23002.773709 secs *N1 O1 . . . L0 . . . . . R1 C1 T1 M1 . 23002.773747 secs Columns stand for individual CPUs, from CPU0 to CPU15, and the two-letter shortcuts stand for tasks that are running on a CPU. '*' denotes the CPU that had the event. A dot signals an idle CPU. New tasks are assigned new two-letter shortcuts - when they occur first they are printed. In the above example 'T1' stood for irqbalance: T1 => irqbalance:2089 Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
80ed0987 |
|
16-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Make idle thread and comm/pid names more consistent Peter noticed that we have 3 ways of referring to the idle thread: [idle]:0 swapper:0 swapper-0 Standardize on 'swapper:0'. Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
c8a37751 |
|
16-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Sanity check context switch events Use 'perf sched latency' to track the current task based on context-switch events, and flag the cases where there's some impossible transition: such as a PID being switched out that was not switched in. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
dc02bf71 |
|
16-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Account for lost events, increase default buffering Output such lost event and state machine weirdness stats: TOTAL: | 14974.910 ms | 46384 | --------------------------------------------------- INFO: 8.865% lost events (19132 out of 215819, in 8 chunks) INFO: 0.198% state machine bugs (49 out of 24708) (due to lost events?) And increase buffering to -m 1024 (4 MB) by default. Since we use output multiplexing that kind of space is needed. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
39aeb52f |
|
14-Sep-2009 |
mingo <mingo@europe.(none)> |
perf sched: Add support for sched:sched_stat_runtime events This allows more precise 'perf sched latency' output: --------------------------------------------------------------------------------------- Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | --------------------------------------------------------------------------------------- ksoftirqd/0-4 | 0.010 ms | 2 | avg: 2.476 ms | max: 2.977 ms | perf-12328 | 15.844 ms | 66 | avg: 1.118 ms | max: 9.979 ms | bdi-default-235 | 0.009 ms | 1 | avg: 0.998 ms | max: 0.998 ms | events/1-8 | 0.020 ms | 2 | avg: 0.998 ms | max: 0.998 ms | events/0-7 | 0.018 ms | 2 | avg: 0.992 ms | max: 0.996 ms | sleep-12329 | 0.742 ms | 3 | avg: 0.906 ms | max: 2.289 ms | sshd-12122 | 0.163 ms | 2 | avg: 0.283 ms | max: 0.562 ms | loop-getpid-lon-12322 | 1023.636 ms | 69 | avg: 0.208 ms | max: 5.996 ms | loop-getpid-lon-12321 | 1038.638 ms | 5 | avg: 0.073 ms | max: 0.171 ms | migration/1-5 | 0.000 ms | 1 | avg: 0.006 ms | max: 0.006 ms | --------------------------------------------------------------------------------------- TOTAL: | 2079.078 ms | 153 | ------------------------------------------------- Also, streamline the code a bit more, add asserts for various state machine failures (they should be debugged if they occur) and fix a few odd ends. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
08f69e6c |
|
14-Sep-2009 |
mingo <mingo@europe.(none)> |
perf sched: Print PIDs too Often it's useful to know the PID of the task as well - print it out too. ( While at it, reformat the output to be a bit more paste-into-commit-logs friendly. ) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
d1153389 |
|
14-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Fix 'perf sched latency' output on 32-bit systems Before: ----------------------------------------------------------------------------------- Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | ----------------------------------------------------------------------------------- perf |4853313.251 ms | 10 | avg: 0.046 ms | max: 0.337 ms | flush-8:0 |2426659.202 ms | 5 | avg: 0.015 ms | max: 0.016 ms | sleep |485331.966 ms | 1 | avg: 0.012 ms | max: 0.012 ms | ksoftirqd/1 |485331.320 ms | 1 | avg: 0.005 ms | max: 0.005 ms | ----------------------------------------------------------------------------------- TOTAL: |8250635.739 ms | 17 | --------------------------------------------- After: ----------------------------------------------------------------------------------- Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | ----------------------------------------------------------------------------------- perf | 0.206 ms | 10 | avg: 0.046 ms | max: 0.337 ms | flush-8:0 | 2.680 ms | 5 | avg: 0.015 ms | max: 0.016 ms | sleep | 0.662 ms | 1 | avg: 0.012 ms | max: 0.012 ms | ksoftirqd/1 | 0.015 ms | 1 | avg: 0.005 ms | max: 0.005 ms | ----------------------------------------------------------------------------------- TOTAL: | 3.563 ms | 17 | --------------------------------------------- Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
ea57c4f5 |
|
13-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf tools: Implement counter output multiplexing Finish the -M/--multiplex option implementation: - separate it out from group_fd - correctly set it via the ioctl and dont mmap counters that are multiplexed - modify the perf record event loop to deal with buffer-less counters. - remove the -g option from perf sched record - account for unordered events in perf sched latency - (add -f to perf sched record to ease measurements) - skip idle threads (pid==0) in latency output The result is better latency output by 'perf sched latency': ----------------------------------------------------------------------------------- Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | ----------------------------------------------------------------------------------- ksoftirqd/8 | 0.071 ms | 2 | avg: 0.458 ms | max: 0.913 ms | at-spi-registry | 0.609 ms | 19 | avg: 0.013 ms | max: 0.023 ms | perf | 3.316 ms | 16 | avg: 0.013 ms | max: 0.054 ms | Xorg | 0.392 ms | 19 | avg: 0.011 ms | max: 0.018 ms | sleep | 0.537 ms | 2 | avg: 0.009 ms | max: 0.009 ms | ----------------------------------------------------------------------------------- TOTAL: | 4.925 ms | 58 | --------------------------------------------- Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
aa1ab9d2 |
|
13-Sep-2009 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf tools: Fix processing of randomly serialized sched traces Currently it's possible to meet such too high latency results with 'perf sched latency'. ----------------------------------------------------------------------------------- Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | ----------------------------------------------------------------------------------- xfce4-panel | 0.222 ms | 2 | avg: 4718.345 ms | max: 9436.493 ms | scsi_eh_3 | 3.962 ms | 36 | avg: 55.957 ms | max: 1977.829 ms | The origin is on traces that are sometimes badly serialized across cpus. For example the raw traces that raised such results for xfce4-panel: (1) [init]-0 [000] 1494.663899990: sched_switch: task swapper:0 [140] (R) ==> xfce4-panel:4569 [120] (2) xfce4-panel-4569 [000] 1494.663928373: sched_switch: task xfce4-panel:4569 [120] (S) ==> swapper:0 [140] (3) Xorg-4276 [001] 1494.663860125: sched_wakeup: task xfce4-panel:4569 [120] success=1 [000] (4) Xorg-4276 [001] 1504.098252756: sched_wakeup: task xfce4-panel:4569 [120] success=1 [000] (5) perf-5219 [000] 1504.100353302: sched_switch: task perf:5219 [120] (S) ==> xfce4-panel:4569 [120] The traces are processed in the order they arrive. Then in (2), xfce4-panel sleeps, it is first waken up in (3) and eventually scheduled in (5). The latency reported is then 1504 - 1495 = 9 secs, as reported by perf sched. But this is wrong, we are confident in the fact the traces are nicely serialized while we should actually more trust the timestamps. If we reorder by timestamps we get: (1) Xorg-4276 [001] 1494.663860125: sched_wakeup: task xfce4-panel:4569 [120] success=1 [000] (2) [init]-0 [000] 1494.663899990: sched_switch: task swapper:0 [140] (R) ==> xfce4-panel:4569 [120] (3) xfce4-panel-4569 [000] 1494.663928373: sched_switch: task xfce4-panel:4569 [120] (S) ==> swapper:0 [140] (4) Xorg-4276 [001] 1504.098252756: sched_wakeup: task xfce4-panel:4569 [120] success=1 [000] (5) perf-5219 [000] 1504.100353302: sched_switch: task perf:5219 [120] (S) ==> xfce4-panel:4569 [120] Now the trace make more sense, xfce4-panel is sleeping. Then it is woken up in (1), scheduled in (2) It goes to sleep in (3), woken up in (4) and scheduled in (5). Now, latency captured between (1) and (2) is of 39 us. And between (4) and (5) it is 2.1 ms. Such pattern of bad serializing is the origin of the high latencies reported by perf sched. Basically, we need to check whether wake up time is higher than schedule out time. If it's not the case, we need to tag the current work atom as invalid. Beside that, we may need to work later on a better ordering of the traces given by the kernel. After this patch: xfce4-session | 0.221 ms | 1 | avg: 0.538 ms | max: 0.538 ms | Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
d1302522 |
|
14-Sep-2009 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf tools: Add an option to multiplex counters in a single channel Add an option to multiplex counters output in the channel of the group leader, ie: the first counter opened: -M --multiplex The effect is better serialized samples. This is especially useful for tracepoint samples that need to be well serialized for their post-processing. Also make use of this option in 'perf sched'. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
c13f0d3c |
|
13-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Add 'perf sched trace', improve documentation Alias 'perf sched trace' to 'perf trace', for workflow completeness. Add a bit of documentation for perf sched. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
1fc35b29 |
|
13-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Implement the 'perf sched record' subcommand Implement the 'perf sched record' subcommand that adds a default list of events, turns on raw sampling and system-wide tracing and passes off the rest of the command to perf record. This is more convenient than having to specify the events all the time. Before: $ perf record -a -R -e sched:sched_switch:r -e sched:sched_stat_wait:r -e sched:sched_stat_sleep:r -e sched:sched_stat_iowait:r -e sched:sched_process_exit:r -e sched:sched_process_fork:r -e sched:sched_wakeup:r -e sched:sched_migrate_task:r -c 1 sleep 1 After: $ perf sched record -f sleep 1 Also fix an assumption in the event string parser that assumed that strings passed in can be modified. (In this case they wont be as they come from a readonly constant section.) Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
b5fae128 |
|
10-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Clean up PID sorting logic Use a sort list for thread atoms insertion as well - instead of hardcoded for PID. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
b1ffe8f3 |
|
10-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Finish latency => atom rename and misc cleanups - Rename 'latency' field/variable names to the better 'atom' ones - Reduce the number of #include lines and consolidate them - Gather file scope variables at the top of the file - Remove unused bits No change in functionality. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
f2858d8a |
|
10-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Add 'perf sched latency' and 'perf sched replay' Separate the option parsing cleanly and add two variants: - 'perf sched latency' (can be abbreviated via 'perf sched lat') - 'perf sched replay' (can be abbreviated via 'perf sched rep') Also add a repeat count option to replay and add a separation set of options for replay. Do the sorting setup only in the latency sub-command. Display separate help screens for 'perf sched' and 'perf sched replay -h' - i.e. further separation of the sub-commands. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
daa1d7a5 |
|
12-Sep-2009 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf sched: Implement multidimensional sorting Implement multidimensional sorting on perf sched so that you can sort either by number of switches, latency average, latency maximum, runtime. perf sched -l -s avg,max (this is the default) ----------------------------------------------------------------------------------- Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | ----------------------------------------------------------------------------------- gnome-power-man | 0.113 ms | 1 | avg: 4998.531 ms | max: 4998.531 ms | xfdesktop | 1.190 ms | 7 | avg: 136.475 ms | max: 940.933 ms | xfce-mcs-manage | 2.194 ms | 22 | avg: 38.534 ms | max: 735.174 ms | notification-da | 2.749 ms | 31 | avg: 27.436 ms | max: 731.791 ms | xfce4-session | 3.343 ms | 28 | avg: 26.796 ms | max: 734.891 ms | xfwm4 | 3.159 ms | 22 | avg: 12.406 ms | max: 241.333 ms | xchat | 42.789 ms | 214 | avg: 11.886 ms | max: 100.349 ms | xfce4-terminal | 5.386 ms | 22 | avg: 11.414 ms | max: 241.611 ms | firefox | 151.992 ms | 123 | avg: 9.543 ms | max: 153.717 ms | xfce4-panel | 24.324 ms | 47 | avg: 8.189 ms | max: 242.352 ms | :5090 | 6.932 ms | 111 | avg: 8.131 ms | max: 102.665 ms | events/0 | 0.758 ms | 12 | avg: 1.964 ms | max: 21.879 ms | Xorg | 280.558 ms | 340 | avg: 1.864 ms | max: 99.526 ms | geany | 63.391 ms | 295 | avg: 1.099 ms | max: 9.334 ms | reiserfs/0 | 0.039 ms | 2 | avg: 0.854 ms | max: 1.487 ms | kondemand/0 | 8.251 ms | 245 | avg: 0.691 ms | max: 34.372 ms | Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
73622626 |
|
12-Sep-2009 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf sched: Fix nsec to msec conversion We are dividing a time in ns by 1e9. This is a nsec to sec conversion. What we want is msecs. Fix it by dividing by 1e6. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
66685678 |
|
12-Sep-2009 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf sched: Export the total, max latency and total runtime to thread atoms list Add a field in the thread atom list that keeps track of the total and max latencies and also the total runtime. This makes a faster output and also prepares for sorting. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
c6ced611 |
|
12-Sep-2009 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf sched: Add involuntarily sleeping task in work atoms Currently in perf sched, we are measuring the scheduler wakeup latencies. Now we also want measure the time a task wait to be scheduled after it gets preempted. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
17562205 |
|
12-Sep-2009 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf sched: Rename struct lat_snapshot to struct work atoms To measures the latencies, we capture the sched atoms data into a specific structure named struct lat_snapshot. As this structure can be used for other purposes of scheduler profiling and mirrors what happens in a thread work atom, lets rename it to struct work_atom and propagate this renaming in other functions and structures names to keep it coherent. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
3e304147 |
|
12-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Output runtime and context switch totals After: ----------------------------------------------------------------------------------- Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | ----------------------------------------------------------------------------------- make | 0.678 ms | 13 | avg: 0.018 ms | max: 0.050 ms | gcc | 0.014 ms | 2 | avg: 0.320 ms | max: 0.627 ms | gcc | 0.000 ms | 2 | avg: 0.185 ms | max: 0.369 ms | ... ----------------------------------------------------------------------------------- TOTAL: | 21.316 ms | 63 | --------------------------------------------- Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
ea92ed5a |
|
12-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Add runtime stats Extend the latency tracking structure with scheduling atom runtime info - and sum it up during per task display. (Also clean up a few details.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
d9340c1d |
|
12-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Display time in milliseconds, reorganize output After: ----------------------------------------------------------------------------------- Task | runtime ms | switches | average delay ms | maximum delay ms | ----------------------------------------------------------------------------------- migration/0 | 0.000 ms | 1 | avg: 0.047 ms | max: 0.047 ms | ksoftirqd/0 | 0.000 ms | 1 | avg: 0.039 ms | max: 0.039 ms | migration/1 | 0.000 ms | 3 | avg: 0.013 ms | max: 0.016 ms | migration/3 | 0.000 ms | 2 | avg: 0.003 ms | max: 0.004 ms | migration/4 | 0.000 ms | 1 | avg: 0.022 ms | max: 0.022 ms | distccd | 0.000 ms | 1 | avg: 0.004 ms | max: 0.004 ms | distccd | 0.000 ms | 1 | avg: 0.014 ms | max: 0.014 ms | distccd | 0.000 ms | 2 | avg: 0.000 ms | max: 0.000 ms | distccd | 0.000 ms | 2 | avg: 0.012 ms | max: 0.019 ms | distccd | 0.000 ms | 1 | avg: 0.002 ms | max: 0.002 ms | as | 0.000 ms | 2 | avg: 0.019 ms | max: 0.019 ms | as | 0.000 ms | 3 | avg: 0.015 ms | max: 0.017 ms | as | 0.000 ms | 1 | avg: 0.009 ms | max: 0.009 ms | perf | 0.000 ms | 1 | avg: 0.001 ms | max: 0.001 ms | gcc | 0.000 ms | 1 | avg: 0.021 ms | max: 0.021 ms | run-mozilla.sh | 0.000 ms | 2 | avg: 0.010 ms | max: 0.017 ms | mozilla-plugin- | 0.000 ms | 1 | avg: 0.006 ms | max: 0.006 ms | gcc | 0.000 ms | 2 | avg: 0.013 ms | max: 0.013 ms | ----------------------------------------------------------------------------------- (The runtime ms column is not filled in yet.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
46f392c9 |
|
12-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Clean up latency and replay sub-commands - Separate the latency and the replay commands more cleanly - Use consistent naming - Display help page on 'perf sched' outlining comments, instead of aborting Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
cdce9d73 |
|
12-Sep-2009 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf sched: Add sched latency profiling Add the -l --latency option that reports statistics about the scheduler latencies. For now, the latencies are measured in the following sequence scope: - task A is sleeping (D or S state) - task B wakes up A ^ | | latency timeframe | | v - task A is scheduled in Start by recording every scheduler events: perf record -e sched:* and then fetch the results: perf sched -l Tasks count total avg max migration/0 2 39849 19924 28826 ksoftirqd/0 7 756383 108054 373014 migration/1 5 45391 9078 10452 ksoftirqd/1 2 399055 199527 359130 events/0 8 4780110 597513 4500250 events/1 9 6353057 705895 2986012 kblockd/0 42 37805097 900121 5077684 The snapshot are in nanoseconds. - Count: number of snapshots taken for the given task - Total: total latencies in nanosec - Avg : average of latency between wake up and sched in - Max : max snapshot latency Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
419ab0d6 |
|
11-Sep-2009 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf sched: Make it easier to plug in new sub profilers Create a sched event structure of handlers in which various sched events reader can plug their own callbacks. This makes easier the addition of new perf sched sub commands. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
46538818 |
|
11-Sep-2009 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf sched: Fix bad event alignment perf sched raises the following error when it meets a sched switch event: perf: builtin-sched.c:286: register_pid: Assertion `!(pid >= 65536)' failed. Abandon Currently in x86-64, the sched switch events have a hole in the middle of the structure: u16 common_type; u8 common_flags; u8 common_preempt_count; u32 common_pid; u32 common_tgid; char prev_comm[16]; u32 prev_pid; u32 prev_prio; <--- there u64 prev_state; char next_comm[16]; u32 next_pid; u32 next_prio; Gcc inserts a 4 bytes hole there for prev_state to be u64 aligned. And the events are exported to userspace with this hole. But in userspace, from perf sched, we fetch it using a structure that has a new field in the beginning: u32 size. This is because our trace is exported with its size as a field. But now that we have this new field, the hole in the middle disappears because it makes prev_state becoming well aligned. And since we are using a pointer to the raw trace using this struct, instead of reading prev_state, we are reading the hole. We could fix it by keeping the size seperate from the struct but actually there a lot of other potential problems: some fields may be saved as long in a 64 bits system and later read as long in a 32 bits system. Also this direct cast doesn't care about the endianness differences between the host traced machine and the machine in which we do the post processing. So instead of using such dangerous direct casts, fetch the values using the trace parsing API that already takes care of all these problems. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
ad236fd2 |
|
10-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Tighten up the code Various small cleanups - removal of debug printks and dead functions, etc. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
fbf94829 |
|
10-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Implement the scheduling workload replay engine Integrate the schedbench.c bits with the raw trace events that we get from the perf machinery, and activate the workload replayer/simulator. Example of a captured 'make -j' workload: $ perf sched run measurement overhead: 90 nsecs sleep measurement overhead: 2724743 nsecs the run test took 1000081 nsecs the sleep test took 2981111 nsecs version = 0.5 ... nr_run_events: 70 nr_sleep_events: 66 nr_wakeup_events: 9 target-less wakeups: 71 multi-target wakeups: 47 run events optimized: 139 task 0 ( perf: 6607), nr_events: 2 task 1 ( perf: 6608), nr_events: 6 task 2 ( : 0), nr_events: 1 task 3 ( make: 6609), nr_events: 5 task 4 ( sh: 6610), nr_events: 4 task 5 ( make: 6611), nr_events: 6 task 6 ( sh: 6612), nr_events: 4 task 7 ( make: 6613), nr_events: 5 task 8 ( migration/11: 25), nr_events: 1 task 9 ( migration/13: 29), nr_events: 1 task 10 ( migration/15: 33), nr_events: 1 task 11 ( migration/9: 21), nr_events: 1 task 12 ( sh: 6614), nr_events: 4 task 13 ( make: 6615), nr_events: 5 task 14 ( sh: 6616), nr_events: 4 task 15 ( make: 6617), nr_events: 7 task 16 ( migration/3: 9), nr_events: 1 task 17 ( migration/5: 13), nr_events: 1 task 18 ( migration/7: 17), nr_events: 1 task 19 ( migration/1: 5), nr_events: 1 task 20 ( sh: 6618), nr_events: 4 task 21 ( make: 6619), nr_events: 5 task 22 ( sh: 6620), nr_events: 4 task 23 ( make: 6621), nr_events: 10 task 24 ( sh: 6623), nr_events: 3 task 25 ( gcc: 6624), nr_events: 4 task 26 ( gcc: 6625), nr_events: 4 task 27 ( gcc: 6626), nr_events: 5 task 28 ( collect2: 6627), nr_events: 5 task 29 ( sh: 6622), nr_events: 1 task 30 ( make: 6628), nr_events: 7 task 31 ( sh: 6630), nr_events: 4 task 32 ( gcc: 6631), nr_events: 4 task 33 ( sh: 6629), nr_events: 1 task 34 ( gcc: 6632), nr_events: 4 task 35 ( gcc: 6633), nr_events: 4 task 36 ( collect2: 6634), nr_events: 4 task 37 ( make: 6635), nr_events: 8 task 38 ( sh: 6637), nr_events: 4 task 39 ( sh: 6636), nr_events: 1 task 40 ( gcc: 6638), nr_events: 4 task 41 ( gcc: 6639), nr_events: 4 task 42 ( gcc: 6640), nr_events: 4 task 43 ( collect2: 6641), nr_events: 4 task 44 ( make: 6642), nr_events: 6 task 45 ( sh: 6643), nr_events: 5 task 46 ( sh: 6644), nr_events: 3 task 47 ( sh: 6645), nr_events: 4 task 48 ( make: 6646), nr_events: 6 task 49 ( sh: 6647), nr_events: 3 task 50 ( make: 6648), nr_events: 5 task 51 ( sh: 6649), nr_events: 5 task 52 ( sh: 6650), nr_events: 6 task 53 ( make: 6651), nr_events: 4 task 54 ( make: 6652), nr_events: 5 task 55 ( make: 6653), nr_events: 4 task 56 ( make: 6654), nr_events: 4 task 57 ( make: 6655), nr_events: 5 task 58 ( sh: 6656), nr_events: 4 task 59 ( gcc: 6657), nr_events: 9 task 60 ( ksoftirqd/3: 10), nr_events: 1 task 61 ( gcc: 6658), nr_events: 4 task 62 ( make: 6659), nr_events: 5 task 63 ( sh: 6660), nr_events: 3 task 64 ( gcc: 6661), nr_events: 5 task 65 ( collect2: 6662), nr_events: 4 ------------------------------------------------------------ #1 : 256.745, ravg: 256.74, cpu: 0.00 / 0.00 #2 : 439.372, ravg: 275.01, cpu: 0.00 / 0.00 #3 : 411.971, ravg: 288.70, cpu: 0.00 / 0.00 #4 : 385.500, ravg: 298.38, cpu: 0.00 / 0.00 #5 : 366.526, ravg: 305.20, cpu: 0.00 / 0.00 #6 : 381.281, ravg: 312.81, cpu: 0.00 / 0.00 #7 : 410.756, ravg: 322.60, cpu: 0.00 / 0.00 #8 : 368.009, ravg: 327.14, cpu: 0.00 / 0.00 #9 : 408.098, ravg: 335.24, cpu: 0.00 / 0.00 #10 : 368.582, ravg: 338.57, cpu: 0.00 / 0.00 I.e. we successfully analyzed the trace, replayed it via real threads and measured the replayed workload's scheduling properties. This is how it looked like in 'top' output: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 7164 mingo 20 0 1434m 8080 888 R 57.0 0.1 0:02.04 :perf 7165 mingo 20 0 1434m 8080 888 R 41.8 0.1 0:01.52 :perf 7228 mingo 20 0 1434m 8080 888 R 39.8 0.1 0:01.44 :gcc 7225 mingo 20 0 1434m 8080 888 R 33.8 0.1 0:01.26 :gcc 7202 mingo 20 0 1434m 8080 888 R 31.2 0.1 0:01.16 :sh 7222 mingo 20 0 1434m 8080 888 R 25.2 0.1 0:00.96 :sh 7211 mingo 20 0 1434m 8080 888 R 21.9 0.1 0:00.82 :sh 7213 mingo 20 0 1434m 8080 888 D 19.2 0.1 0:00.74 :sh 7194 mingo 20 0 1434m 8080 888 D 18.6 0.1 0:00.72 :make There's still various kinks in it - more patches to come. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
ec156764 |
|
10-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf sched: Import schedbench.c Import the schedbench.c tool that i wrote some time ago to simulate scheduler behavior but never finished. It's a good basis for perf sched nevertheless. Most of its guts are not hooked up to the perf event loop yet - that will be done in the patches to come. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
0a02ad93 |
|
10-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf: Add 'perf sched' tool This turn-key tool allows scheduler measurements to be conducted and the results be displayed numerically. First baby step towards that goal: clone the new command off of perf trace. Fix a few other details along the way: - add (minimal) perf trace documentation - reorder a few places - list perf trace in the mainporcelain list as well as it's a very useful utility. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|