#
d5835fb6 |
|
16-Feb-2024 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Use user_mode() macro when possible There is a nice macro to check user mode. Use it instead of open coding anding with MSR_PR to increase readability and avoid having to comment what that anding is for. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/fbf74887dcf1f1ba9e1680fc3247cbb581b00662.1708078228.git.christophe.leroy@csgroup.eu
|
#
b22ea627 |
|
20-Feb-2024 |
Madhavan Srinivasan <maddy@linux.ibm.com> |
powerpc/perf: Power11 Performance Monitoring support Base enablement patch to register performance monitoring hardware support for Power11. Most of fields are copied from power10_pmu struct for power11_pmu struct. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240221044623.1598642-2-mpe@ellerman.id.au
|
#
571d91dc |
|
25-Oct-2023 |
Kan Liang <kan.liang@linux.intel.com> |
perf: Add branch stack counters Currently, the additional information of a branch entry is stored in a u64 space. With more and more information added, the space is running out. For example, the information of occurrences of events will be added for each branch. Two places were suggested to append the counters. https://lore.kernel.org/lkml/20230802215814.GH231007@hirez.programming.kicks-ass.net/ One place is right after the flags of each branch entry. It changes the existing struct perf_branch_entry. The later ARCH specific implementation has to be really careful to consistently pick the right struct. The other place is right after the entire struct perf_branch_stack. The disadvantage is that the pointer of the extra space has to be recorded. The common interface perf_sample_save_brstack() has to be updated. The latter is much straightforward, and should be easily understood and maintained. It is implemented in the patch. Add a new branch sample type, PERF_SAMPLE_BRANCH_COUNTERS, to indicate the event which is recorded in the branch info. The "u64 counters" may store the occurrences of several events. The information regarding the number of events/counters and the width of each counter should be exposed via sysfs as a reference for the perf tool. Define the branch_counter_nr and branch_counter_width ABI here. The support will be implemented later in the Intel-specific patch. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20231025201626.3000228-1-kan.liang@linux.intel.com
|
#
ea142e59 |
|
18-Oct-2023 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/perf: Fix disabling BHRB and instruction sampling When the PMU is disabled, MMCRA is not updated to disable BHRB and instruction sampling. This can lead to those features remaining enabled, which can slow down a real or emulated CPU. Fixes: 1cade527f6e9 ("powerpc/perf: BHRB control to disable BHRB logic when not used") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231018153423.298373-1-npiggin@gmail.com
|
#
eb55b455 |
|
17-Jan-2023 |
Namhyung Kim <namhyung@kernel.org> |
perf/core: Add perf_sample_save_brstack() helper When we saves the branch stack to the perf sample data, we needs to update the sample flags and the dynamic size. To make sure this is done consistently, add the perf_sample_save_brstack() helper and convert all call sites. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230118060559.615653-5-namhyung@kernel.org
|
#
bd275681 |
|
08-Oct-2022 |
Peter Zijlstra <peterz@infradead.org> |
perf: Rewrite core context handling There have been various issues and limitations with the way perf uses (task) contexts to track events. Most notable is the single hardware PMU task context, which has resulted in a number of yucky things (both proposed and merged). Notably: - HW breakpoint PMU - ARM big.little PMU / Intel ADL PMU - Intel Branch Monitoring PMU - AMD IBS PMU - S390 cpum_cf PMU - PowerPC trace_imc PMU *Current design:* Currently we have a per task and per cpu perf_event_contexts: task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context ^ | ^ | ^ `---------------------------------' | `--> pmu ---' v ^ perf_event ------' Each task has an array of pointers to a perf_event_context. Each perf_event_context has a direct relation to a PMU and a group of events for that PMU. The task related perf_event_context's have a pointer back to that task. Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which includes a perf_event_context, which again has a direct relation to that PMU, and a group of events for that PMU. The perf_cpu_context also tracks which task context is currently associated with that CPU and includes a few other things like the hrtimer for rotation etc. Each perf_event is then associated with its PMU and one perf_event_context. *Proposed design:* New design proposed by this patch reduce to a single task context and a single CPU context but adds some intermediate data-structures: task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context ^ | ^ ^ `---------------------------' | | | | perf_cpu_pmu_context <--. | `----. ^ | | | | | | v v | | ,--> perf_event_pmu_context | | | | | | | v v | perf_event ---> pmu ----------------' With the new design, perf_event_context will hold all events for all pmus in the (respective pinned/flexible) rbtrees. This can be achieved by adding pmu to rbtree key: {cpu, pmu, cgroup, group_index} Each perf_event_context carries a list of perf_event_pmu_context which is used to hold per-pmu-per-context state. For example, it keeps track of currently active events for that pmu, a pmu specific task_ctx_data, a flag to tell whether rotation is required or not etc. Additionally, perf_cpu_pmu_context is used to hold per-pmu-per-cpu state like hrtimer details to drive the event rotation, a pointer to perf_event_pmu_context of currently running task and some other ancillary information. Each perf_event is associated to it's pmu, perf_event_context and perf_event_pmu_context. Further optimizations to current implementation are possible. For example, ctx_resched() can be optimized to reschedule only single pmu events. Much thanks to Ravi for picking this up and pushing it towards completion. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221008062424.313-1-ravi.bangoria@amd.com
|
#
b9c00127 |
|
21-Sep-2022 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Fix branch_filter support for multiple filters For PERF_SAMPLE_BRANCH_STACK sample type, different branch_sample_type ie branch filters are supported. The branch filters are requested via event attribute "branch_sample_type". Multiple branch filters can be passed in event attribute. eg: $ perf record -b -o- -B --branch-filter any,ind_call true None of the Power PMUs support having multiple branch filters at the same time. Branch filters for branch stack sampling is set via MMCRA IFM bits [32:33]. But currently when requesting for multiple filter types, the "perf record" command does not report any error. eg: $ perf record -b -o- -B --branch-filter any,save_type true $ perf record -b -o- -B --branch-filter any,ind_call true The "bhrb_filter_map" function in PMU driver code does the validity check for supported branch filters. But this check is done for single filter. Hence "perf record" will proceed here without reporting any error. Fix power_pmu_event_init() to return EOPNOTSUPP when multiple branch filters are requested in the event attr. After the fix: $ perf record --branch-filter any,ind_call -- ls Error: cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Disha Goel<disgoel@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> [mpe: Tweak comment and change log wording] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921145255.20972-1-atrajeev@linux.vnet.ibm.com
|
#
e16fd7f2 |
|
01-Sep-2022 |
Kan Liang <kan.liang@linux.intel.com> |
perf: Use sample_flags for data_src Use the new sample_flags to indicate whether the data_src field is filled by the PMU driver. Remove the data_src field from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901130959.1285717-6-kan.liang@linux.intel.com
|
#
2abe681d |
|
01-Sep-2022 |
Kan Liang <kan.liang@linux.intel.com> |
perf: Use sample_flags for weight Use the new sample_flags to indicate whether the weight field is filled by the PMU driver. Remove the weight field from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901130959.1285717-5-kan.liang@linux.intel.com
|
#
a9a931e2 |
|
01-Sep-2022 |
Kan Liang <kan.liang@linux.intel.com> |
perf: Use sample_flags for branch stack Use the new sample_flags to indicate whether the branch stack is filled by the PMU driver. Remove the br_stack from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901130959.1285717-4-kan.liang@linux.intel.com
|
#
6320e693 |
|
20-May-2022 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Add support for caps under sysfs in powerpc Add caps support under "/sys/bus/event_source/devices/<pmu>/" for powerpc. This directory can be used to expose some of the specific features that powerpc PMU supports to the user. Example: pmu_name. The name of PMU registered will depend on platform, say power9 or power10 or it could be Generic Compat PMU. Currently the only way to know which is the registered PMU is from the dmesg logs. But clearing the dmesg will make it difficult to know exact PMU backend used. And even extracting from dmesg will be complicated, as we need to parse the dmesg logs and add filters for pmu name. Whereas by exposing it via caps will make it easy as we just need to directly read it from the sysfs. Add a caps directory to /sys/bus/event_source/devices/cpu/ for power8, power9, power10 and generic compat PMU in respective PMU driver code. Update the pmu_name file under caps folder in core-book3s using "attr_update". The information exposed currently: - pmu_name : Underlying PMU name from the driver Example result with power9 pmu: # ls /sys/bus/event_source/devices/cpu/caps pmu_name # cat /sys/bus/event_source/devices/cpu/caps/pmu_name POWER9 Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220520084630.15181-1-atrajeev@linux.vnet.ibm.com
|
#
890005a7 |
|
22-May-2022 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Optimize clearing the pending PMI and remove WARN_ON for PMI check in power_pmu_disable commit 2c9ac51b850d ("powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC") added a new function "pmi_irq_pending" in hw_irq.h. This function is to check if there is a PMI marked as pending in Paca (PACA_IRQ_PMI).This is used in power_pmu_disable in a WARN_ON. The intention here is to provide a warning if there is PMI pending, but no counter is found overflown. During some of the perf runs, below warning is hit: WARNING: CPU: 36 PID: 0 at arch/powerpc/perf/core-book3s.c:1332 power_pmu_disable+0x25c/0x2c0 Modules linked in: ----- NIP [c000000000141c3c] power_pmu_disable+0x25c/0x2c0 LR [c000000000141c8c] power_pmu_disable+0x2ac/0x2c0 Call Trace: [c000000baffcfb90] [c000000000141c8c] power_pmu_disable+0x2ac/0x2c0 (unreliable) [c000000baffcfc10] [c0000000003e2f8c] perf_pmu_disable+0x4c/0x60 [c000000baffcfc30] [c0000000003e3344] group_sched_out.part.124+0x44/0x100 [c000000baffcfc80] [c0000000003e353c] __perf_event_disable+0x13c/0x240 [c000000baffcfcd0] [c0000000003dd334] event_function+0xc4/0x140 [c000000baffcfd20] [c0000000003d855c] remote_function+0x7c/0xa0 [c000000baffcfd50] [c00000000026c394] flush_smp_call_function_queue+0xd4/0x300 [c000000baffcfde0] [c000000000065b24] smp_ipi_demux_relaxed+0xa4/0x100 [c000000baffcfe20] [c0000000000cb2b0] xive_muxed_ipi_action+0x20/0x40 [c000000baffcfe40] [c000000000207c3c] __handle_irq_event_percpu+0x8c/0x250 [c000000baffcfee0] [c000000000207e2c] handle_irq_event_percpu+0x2c/0xa0 [c000000baffcff10] [c000000000210a04] handle_percpu_irq+0x84/0xc0 [c000000baffcff40] [c000000000205f14] generic_handle_irq+0x54/0x80 [c000000baffcff60] [c000000000015740] __do_irq+0x90/0x1d0 [c000000baffcff90] [c000000000016990] __do_IRQ+0xc0/0x140 [c0000009732f3940] [c000000bafceaca8] 0xc000000bafceaca8 [c0000009732f39d0] [c000000000016b78] do_IRQ+0x168/0x1c0 [c0000009732f3a00] [c0000000000090c8] hardware_interrupt_common_virt+0x218/0x220 This means that there is no PMC overflown among the active events in the PMU, but there is a PMU pending in Paca. The function "any_pmc_overflown" checks the PMCs on active events in cpuhw->n_events. Code snippet: <<>> if (any_pmc_overflown(cpuhw)) clear_pmi_irq_pending(); else WARN_ON(pmi_irq_pending()); <<>> Here the PMC overflown is not from active event. Example: When we do perf record, default cycles and instructions will be running on PMC6 and PMC5 respectively. It could happen that overflowed event is currently not active and pending PMI is for the inactive event. Debug logs from trace_printk: <<>> any_pmc_overflown: idx is 5: pmc value is 0xd9a power_pmu_disable: PMC1: 0x0, PMC2: 0x0, PMC3: 0x0, PMC4: 0x0, PMC5: 0xd9a, PMC6: 0x80002011 <<>> Here active PMC (from idx) is PMC5 , but overflown PMC is PMC6(0x80002011). When we handle PMI interrupt for such cases, if the PMC overflown is from inactive event, it will be ignored. Reference commit: commit bc09c219b2e6 ("powerpc/perf: Fix finding overflowed PMC in interrupt") Patch addresses two changes: 1) Fix 1 : Removal of warning ( WARN_ON(pmi_irq_pending()); ) We were printing warning if no PMC is found overflown among active PMU events, but PMI pending in PACA. But this could happen in cases where PMC overflown is not in active PMC. An inactive event could have caused the overflow. Hence the warning is not needed. To know pending PMI is from an inactive event, we need to loop through all PMC's which will cause more SPR reads via mfspr and increase in context switch. Also in existing function: perf_event_interrupt, already we ignore PMI's overflown when it is from an inactive PMC. 2) Fix 2: optimization in clearing pending PMI. Currently we check for any active PMC overflown before clearing PMI pending in Paca. This is causing additional SPR read also. From point 1, we know that if PMI pending in Paca from inactive cases, that is going to be ignored during replay. Hence if there is pending PMI in Paca, just clear it irrespective of PMC overflown or not. In summary, remove the any_pmc_overflown check entirely in power_pmu_disable. ie If there is a pending PMI in Paca, clear it, since we are in pmu_disable. There could be cases where PMI is pending because of inactive PMC ( which later when replayed also will get ignored ), so WARN_ON could give false warning. Hence removing it. Fixes: 2c9ac51b850d ("powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220522142256.24699-1-atrajeev@linux.vnet.ibm.com
|
#
1fd02f66 |
|
30-Apr-2022 |
Julia Lawall <Julia.Lawall@inria.fr> |
powerpc: fix typos in comments Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220430185654.5855-1-Julia.Lawall@inria.fr
|
#
fb6433b4 |
|
21-Jan-2022 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Fix power_pmu_disable to call clear_pmi_irq_pending only if PMI is pending Running selftest with CONFIG_PPC_IRQ_SOFT_MASK_DEBUG enabled in kernel triggered below warning: [ 172.851380] ------------[ cut here ]------------ [ 172.851391] WARNING: CPU: 8 PID: 2901 at arch/powerpc/include/asm/hw_irq.h:246 power_pmu_disable+0x270/0x280 [ 172.851402] Modules linked in: dm_mod bonding nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables rfkill nfnetlink sunrpc xfs libcrc32c pseries_rng xts vmx_crypto uio_pdrv_genirq uio sch_fq_codel ip_tables ext4 mbcache jbd2 sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp fuse [ 172.851442] CPU: 8 PID: 2901 Comm: lost_exception_ Not tainted 5.16.0-rc5-03218-g798527287598 #2 [ 172.851451] NIP: c00000000013d600 LR: c00000000013d5a4 CTR: c00000000013b180 [ 172.851458] REGS: c000000017687860 TRAP: 0700 Not tainted (5.16.0-rc5-03218-g798527287598) [ 172.851465] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 48004884 XER: 20040000 [ 172.851482] CFAR: c00000000013d5b4 IRQMASK: 1 [ 172.851482] GPR00: c00000000013d5a4 c000000017687b00 c000000002a10600 0000000000000004 [ 172.851482] GPR04: 0000000082004000 c0000008ba08f0a8 0000000000000000 00000008b7ed0000 [ 172.851482] GPR08: 00000000446194f6 0000000000008000 c00000000013b118 c000000000d58e68 [ 172.851482] GPR12: c00000000013d390 c00000001ec54a80 0000000000000000 0000000000000000 [ 172.851482] GPR16: 0000000000000000 0000000000000000 c000000015d5c708 c0000000025396d0 [ 172.851482] GPR20: 0000000000000000 0000000000000000 c00000000a3bbf40 0000000000000003 [ 172.851482] GPR24: 0000000000000000 c0000008ba097400 c0000000161e0d00 c00000000a3bb600 [ 172.851482] GPR28: c000000015d5c700 0000000000000001 0000000082384090 c0000008ba0020d8 [ 172.851549] NIP [c00000000013d600] power_pmu_disable+0x270/0x280 [ 172.851557] LR [c00000000013d5a4] power_pmu_disable+0x214/0x280 [ 172.851565] Call Trace: [ 172.851568] [c000000017687b00] [c00000000013d5a4] power_pmu_disable+0x214/0x280 (unreliable) [ 172.851579] [c000000017687b40] [c0000000003403ac] perf_pmu_disable+0x4c/0x60 [ 172.851588] [c000000017687b60] [c0000000003445e4] __perf_event_task_sched_out+0x1d4/0x660 [ 172.851596] [c000000017687c50] [c000000000d1175c] __schedule+0xbcc/0x12a0 [ 172.851602] [c000000017687d60] [c000000000d11ea8] schedule+0x78/0x140 [ 172.851608] [c000000017687d90] [c0000000001a8080] sys_sched_yield+0x20/0x40 [ 172.851615] [c000000017687db0] [c0000000000334dc] system_call_exception+0x18c/0x380 [ 172.851622] [c000000017687e10] [c00000000000c74c] system_call_common+0xec/0x268 The warning indicates that MSR_EE being set(interrupt enabled) when there was an overflown PMC detected. This could happen in power_pmu_disable since it runs under interrupt soft disable condition ( local_irq_save ) and not with interrupts hard disabled. commit 2c9ac51b850d ("powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC") intended to clear PMI pending bit in Paca when disabling the PMU. It could happen that PMC gets overflown while code is in power_pmu_disable callback function. Hence add a check to see if PMI pending bit is set in Paca before clearing it via clear_pmi_pending. Fixes: 2c9ac51b850d ("powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC") Reported-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220122033429.25395-1-atrajeev@linux.vnet.ibm.com
|
#
429a64f6 |
|
13-Jan-2022 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Only define power_pmu_wants_prompt_pmi() for CONFIG_PPC64 power_pmu_wants_prompt_pmi() is used to decide if PMIs should be taken promptly. This is valid only for ppc64 and is used only if CONFIG_PPC_BOOK3S_64=y. Hence include the function under config check for PPC64. Fixes warning for 32-bit compilation: arch/powerpc/perf/core-book3s.c:2455:6: warning: no previous prototype for 'power_pmu_wants_prompt_pmi' 2455 | bool power_pmu_wants_prompt_pmi(void) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: 5a7745b96f43 ("powerpc/64s/perf: add power_pmu_wants_prompt_pmi to say whether perf wants PMIs to be soft-NMI") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Move inside existing CONFIG_PPC64 ifdef block] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220114031355.87480-1-atrajeev@linux.vnet.ibm.com
|
#
c49f5d88 |
|
16-Dec-2021 |
Nick Child <nick.child@ibm.com> |
powerpc/perf: Add __init attribute to eligible functions Some functions defined in 'arch/powerpc/perf' are deserving of an `__init` macro attribute. These functions are only called by other initialization functions and therefore should inherit the attribute. Also, change function declarations in header files to include `__init`. Signed-off-by: Nick Child <nick.child@ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211216220035.605465-5-nick.child@ibm.com
|
#
5a7745b9 |
|
22-Sep-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/perf: add power_pmu_wants_prompt_pmi to say whether perf wants PMIs to be soft-NMI Interrupt code enables MSR[EE] in some irq handlers while keeping local irqs disabled via soft-mask, allowing PMI interrupts to be taken as soft-NMI to improve profiling of irq handlers. When perf is not enabled, there is no point to doing this, it's additional overhead. So provide a function that can say if PMIs should be taken promptly if possible. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210922145452.352571-4-npiggin@gmail.com
|
#
2c9ac51b |
|
20-Jul-2021 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC Running perf fuzzer showed below in dmesg logs: "Can't find PMC that caused IRQ" This means a PMU exception happened, but none of the PMC's (Performance Monitor Counter) were found to be overflown. There are some corner cases that clears the PMCs after PMI gets masked. In such cases, the perf interrupt handler will not find the active PMC values that had caused the overflow and thus leads to this message while replaying. Case 1: PMU Interrupt happens during replay of other interrupts and counter values gets cleared by PMU callbacks before replay: During replay of interrupts like timer, __do_irq() and doorbell exception, we conditionally enable interrupts via may_hard_irq_enable(). This could potentially create a window to generate a PMI. Since irq soft mask is set to ALL_DISABLED, the PMI will get masked here. We could get IPIs run before perf interrupt is replayed and the PMU events could be deleted or stopped. This will change the PMU SPR values and resets the counters. Snippet of ftrace log showing PMU callbacks invoked in __do_irq(): <idle>-0 [051] dns. 132025441306354: __do_irq <-call_do_irq <idle>-0 [051] dns. 132025441306430: irq_enter <-__do_irq <idle>-0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq <idle>-0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq <<>> <idle>-0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed <idle>-0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed <idle>-0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function <idle>-0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable <idle>-0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out <idle>-0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del <idle>-0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read <idle>-0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del <idle>-0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del <idle>-0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage <idle>-0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage <idle>-0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable <<>> <idle>-0 [051] dnH. 132025441311108: irq_exit <-__do_irq <idle>-0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interrupts Case 2: PMI's masked during local_* operations, example local_add(). If the local_add() operation happens within a local_irq_save(), replay of PMI will be during local_irq_restore(). Similar to case 1, this could also create a window before replay where PMU events gets deleted or stopped. Fix it by updating the PMU callback function power_pmu_disable() to check for pending perf interrupt. If there is an overflown PMC and pending perf interrupt indicated in paca, clear the PMI bit in paca to drop that sample. Clearing of PMI bit is done in power_pmu_disable() since disable is invoked before any event gets deleted/stopped. With this fix, if there are more than one event running in the PMU, there is a chance that we clear the PMI bit for the event which is not getting deleted/stopped. The other events may still remain active. Hence to make sure we don't drop valid sample in such cases, another check is added in power_pmu_enable. This checks if there is an overflown PMC found among the active events and if so enable back the PMI bit. Two new helper functions are introduced to clear/set the PMI, ie clear_pmi_irq_pending() and set_pmi_irq_pending(). Helper function pmi_irq_pending() is introduced to give a warning if there is pending PMI bit in paca, but no PMC is overflown. Also there are corner cases which result in performance monitor interrupts being triggered during power_pmu_disable(). This happens since PMXE bit is not cleared along with disabling of other MMCR0 bits in the pmu_disable. Such PMI's could leave the PMU running and could trigger PMI again which will set MMCR0 PMAO bit. This could lead to spurious interrupts in some corner cases. Example, a timer after power_pmu_del() which will re-enable interrupts and triggers a PMI again since PMAO bit is still set. But fails to find valid overflow since PMC was cleared in power_pmu_del(). Fix that by disabling PMXE along with disabling of other MMCR0 bits in power_pmu_disable(). We can't just replay PMI any time. Hence this approach is preferred rather than replaying PMI before resetting overflown PMC. Patch also documents core-book3s on a race condition which can trigger these PMC messages during idle path in PowerNV. Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and replay them") Reported-by: Nageswara R Sastry <nasastry@in.ibm.com> Suggested-by: Nicholas Piggin <npiggin@gmail.com> Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make pmi_irq_pending() return bool, reflow/reword some comments] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1626846509-1350-2-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
0a4b4327 |
|
23-Nov-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Implement PMU override command line option It can be useful in simulators (with very constrained environments) to allow some PMCs to run from boot so they can be sampled directly by a test harness, rather than having to run perf. A previous change freezes counters at boot by default, so provide a boot time option to un-freeze (plus a bit more flexibility). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-13-npiggin@gmail.com
|
#
3c69a5f2 |
|
18-Aug-2021 |
Kajol Jain <kjain@linux.ibm.com> |
powerpc/perf: Fix the check for SIAR value Incase of random sampling, there can be scenarios where Sample Instruction Address Register(SIAR) may not latch to the sampled instruction and could result in the value of 0. In these scenarios it is preferred to return regs->nip. These corner cases are seen in the previous generation (p9) also. Patch adds the check for SIAR value along with regs_use_siar and siar_valid checks so that the function will return regs->nip incase SIAR is zero. Patch drops the code under PPMU_P10_DD1 flag check which handles SIAR 0 case only for Power10 DD1. Fixes: 2ca13a4cc56c9 ("powerpc/perf: Use regs->nip when SIAR is zero") Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210818171556.36912-3-kjain@linux.ibm.com
|
#
cc90c674 |
|
18-Aug-2021 |
Kajol Jain <kjain@linux.ibm.com> |
powerpc/perf: Drop the case of returning 0 as instruction pointer Drop the case of returning 0 as instruction pointer since kernel never executes at 0 and userspace almost never does either. Fixes: e6878835ac47 ("powerpc/perf: Sample only if SIAR-Valid bit is set in P7+") Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210818171556.36912-2-kjain@linux.ibm.com
|
#
b1643084 |
|
18-Aug-2021 |
Kajol Jain <kjain@linux.ibm.com> |
powerpc/perf: Use stack siar instead of mfspr Minor optimization in the 'perf_instruction_pointer' function code by making use of stack siar instead of mfspr. Fixes: 75382aa72f06 ("powerpc/perf: Move code to select SIAR or pt_regs into perf_read_regs") Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210818171556.36912-1-kjain@linux.ibm.com
|
#
cf9c615c |
|
20-Jul-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/perf: Always use SIAR for kernel interrupts If an interrupt is taken in kernel mode, always use SIAR for it rather than looking at regs_sipr. This prevents samples piling up around interrupt enable (hard enable or interrupt replay via soft enable) in PMUs / modes where the PR sample indication is not in synch with SIAR. This results in better sampling of interrupt entry and exit in particular. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210720141504.420110-1-npiggin@gmail.com
|
#
69d4d6e5 |
|
20-May-2021 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Don't use 'struct ppc_inst' to reference instruction location 'struct ppc_inst' is an internal representation of an instruction, but in-memory instructions are and will remain a table of 'u32' forever. Replace all 'struct ppc_inst *' used for locating an instruction in memory by 'u32 *'. This removes a lot of undue casts to 'struct ppc_inst *'. It also helps locating ab-use of 'struct ppc_inst' dereference. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Fix ppc_inst_next(), use u32 instead of unsigned int] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7062722b087228e42cbd896e39bfdf526d6a340a.1621516826.git.christophe.leroy@csgroup.eu
|
#
60b7ed54 |
|
17-Jun-2021 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Fix crash in perf_instruction_pointer() when ppmu is not set On systems without any specific PMU driver support registered, running perf record causes Oops. The relevant portion from call trace: BUG: Kernel NULL pointer dereference on read at 0x00000040 Faulting instruction address: 0xc0021f0c Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=4K PREEMPT CMPCPRO SAF3000 DIE NOTIFICATION CPU: 0 PID: 442 Comm: null_syscall Not tainted 5.13.0-rc6-s3k-dev-01645-g7649ee3d2957 #5164 NIP: c0021f0c LR: c00e8ad8 CTR: c00d8a5c NIP perf_instruction_pointer+0x10/0x60 LR perf_prepare_sample+0x344/0x674 Call Trace: perf_prepare_sample+0x7c/0x674 (unreliable) perf_event_output_forward+0x3c/0x94 __perf_event_overflow+0x74/0x14c perf_swevent_hrtimer+0xf8/0x170 __hrtimer_run_queues.constprop.0+0x160/0x318 hrtimer_interrupt+0x148/0x3b0 timer_interrupt+0xc4/0x22c Decrementer_virt+0xb8/0xbc During perf record session, perf_instruction_pointer() is called to capture the sample IP. This function in core-book3s accesses ppmu->flags. If a platform specific PMU driver is not registered, ppmu is set to NULL and accessing its members results in a crash. Fix this crash by checking if ppmu is set. Fixes: 2ca13a4cc56c ("powerpc/perf: Use regs->nip when SIAR is zero") Cc: stable@vger.kernel.org # v5.11+ Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1623952506-1431-1-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
af31fd0c |
|
22-Mar-2021 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Expose processor pipeline stage cycles using PERF_SAMPLE_WEIGHT_STRUCT Performance Monitoring Unit (PMU) registers in powerpc provides information on cycles elapsed between different stages in the pipeline. This can be used for application tuning. On ISA v3.1 platform, this information is exposed by sampling registers. Patch adds kernel support to capture two of the cycle counters as part of perf sample using the sample type: PERF_SAMPLE_WEIGHT_STRUCT. The power PMU function 'get_mem_weight' currently uses 64 bit weight field of perf_sample_data to capture memory latency. But following the introduction of PERF_SAMPLE_WEIGHT_TYPE, weight field could contain 64-bit or 32-bit value depending on the architexture support for PERF_SAMPLE_WEIGHT_STRUCT. Patches uses WEIGHT_STRUCT to expose the pipeline stage cycles info. Hence update the ppmu functions to work for 64-bit and 32-bit weight values. If the sample type is PERF_SAMPLE_WEIGHT, use the 64-bit weight field. if the sample type is PERF_SAMPLE_WEIGHT_STRUCT, memory subsystem latency is stored in the low 32bits of perf_sample_weight structure. Also for CPU_FTR_ARCH_31, capture the two cycle counter information in two 16 bit fields of perf_sample_weight structure. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1616425047-1666-2-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
7153d4bf |
|
14-Apr-2021 |
Xiongwei Song <sxwjean@gmail.com> |
powerpc/traps: Enhance readability for trap types Define macros to list ppc interrupt types in interttupt.h, replace the reference of the trap hex values with these macros. Referred the hex numbers in arch/powerpc/kernel/exceptions-64e.S, arch/powerpc/kernel/exceptions-64s.S, arch/powerpc/kernel/head_*.S, arch/powerpc/kernel/head_booke.h and arch/powerpc/include/asm/kvm_asm.h. Signed-off-by: Xiongwei Song <sxwjean@gmail.com> [mpe: Resolve conflicts in nmi_disables_ftrace(), fix 40x build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1618398033-13025-1-git-send-email-sxwjean@me.com
|
#
2e2a441d |
|
08-Apr-2021 |
Madhavan Srinivasan <maddy@linux.ibm.com> |
powerpc/perf: Infrastructure to support checking of attr.config* Introduce code to support the checking of attr.config* for values which are reserved for a given platform. Performance Monitoring Unit (PMU) configuration registers have fields that are reserved and some specific values for bit fields are reserved. For ex., MMCRA[61:62] is Random Sampling Mode (SM) and value of 0b11 for this field is reserved. Writing non-zero or invalid values in these fields will have unknown behaviours. Patch adds a generic call-back function "check_attr_config" in "struct power_pmu", to be called in event_init to check for attr.config* values for a given platform. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210408074504.248211-1-maddy@linux.ibm.com
|
#
5ae5fbd2 |
|
25-Feb-2021 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Fix handling of privilege level checks in perf interrupt context Running "perf mem record" in powerpc platforms with selinux enabled resulted in soft lockup's. Below call-trace was seen in the logs: CPU: 58 PID: 3751 Comm: sssd_nss Not tainted 5.11.0-rc7+ #2 NIP: c000000000dff3d4 LR: c000000000dff3d0 CTR: 0000000000000000 REGS: c000007fffab7d60 TRAP: 0100 Not tainted (5.11.0-rc7+) ... NIP _raw_spin_lock_irqsave+0x94/0x120 LR _raw_spin_lock_irqsave+0x90/0x120 Call Trace: 0xc00000000fd47260 (unreliable) skb_queue_tail+0x3c/0x90 audit_log_end+0x6c/0x180 common_lsm_audit+0xb0/0xe0 slow_avc_audit+0xa4/0x110 avc_has_perm+0x1c4/0x260 selinux_perf_event_open+0x74/0xd0 security_perf_event_open+0x68/0xc0 record_and_restart+0x6e8/0x7f0 perf_event_interrupt+0x22c/0x560 performance_monitor_exception0x4c/0x60 performance_monitor_common_virt+0x1c8/0x1d0 interrupt: f00 at _raw_spin_lock_irqsave+0x38/0x120 NIP: c000000000dff378 LR: c000000000b5fbbc CTR: c0000000007d47f0 REGS: c00000000fd47860 TRAP: 0f00 Not tainted (5.11.0-rc7+) ... NIP _raw_spin_lock_irqsave+0x38/0x120 LR skb_queue_tail+0x3c/0x90 interrupt: f00 0x38 (unreliable) 0xc00000000aae6200 audit_log_end+0x6c/0x180 audit_log_exit+0x344/0xf80 __audit_syscall_exit+0x2c0/0x320 do_syscall_trace_leave+0x148/0x200 syscall_exit_prepare+0x324/0x390 system_call_common+0xfc/0x27c The above trace shows that while the CPU was handling a performance monitor exception, there was a call to security_perf_event_open() function. In powerpc core-book3s, this function is called from perf_allow_kernel() check during recording of data address in the sample via perf_get_data_addr(). Commit da97e18458fb ("perf_event: Add support for LSM and SELinux checks") introduced security enhancements to perf. As part of this commit, the new security hook for perf_event_open() was added in all places where perf paranoid check was previously used. In powerpc core-book3s code, originally had paranoid checks in perf_get_data_addr() and power_pmu_bhrb_read(). So perf_paranoid_kernel() checks were replaced with perf_allow_kernel() in these PMU helper functions as well. The intention of paranoid checks in core-book3s was to verify privilege access before capturing some of the sample data. Along with paranoid checks, perf_allow_kernel() also does a security_perf_event_open(). Since these functions are accessed while recording a sample, we end up calling selinux_perf_event_open() in PMI context. Some of the security functions use spinlock like sidtab_sid2str_put(). If a perf interrupt hits under a spin lock and if we end up in calling selinux hook functions in PMI handler, this could cause a dead lock. Since the purpose of this security hook is to control access to perf_event_open(), it is not right to call this in interrupt context. The paranoid checks in powerpc core-book3s were done at interrupt time which is also not correct. Reference commits: Commit cd1231d7035f ("powerpc/perf: Prevent kernel address leak via perf_get_data_addr()") Commit bb19af816025 ("powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer") We only allow creation of events that have already passed the privilege checks in perf_event_open(). So these paranoid checks are not needed at event time. As a fix, patch uses 'event->attr.exclude_kernel' check to prevent exposing kernel address for userspace only sampling. Fixes: cd1231d7035f ("powerpc/perf: Prevent kernel address leak via perf_get_data_addr()") Cc: stable@vger.kernel.org # v4.17+ Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1614247839-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
82d2c16b |
|
09-Feb-2021 |
Kajol Jain <kjain@linux.ibm.com> |
powerpc/perf: Adds support for programming of Thresholding in P10 Thresholding, a performance monitoring unit feature, can be used to identify marked instructions which take more than expected cycles between start event and end event. Threshold compare (thresh_cmp) bits are programmed in MMCRA register. In Power9, thresh_cmp bits were part of the event code. But in case of P10, thresh_cmp are not part of event code due to inclusion of MMCR3 bits. Patch here adds an option to use attr.config1 variable to be used to pass thresh_cmp value to be programmed in MMCRA register. A new ppmu flag called PPMU_HAS_ATTR_CONFIG1 has been added and this flag is used to notify the use of attr.config1 variable. Patch has extended the parameter list of 'compute_mmcr', to include power_pmu's 'flags' element and parameter list of get_constraint to include attr.config1 value. It also extend parameter list of power_check_constraints inorder to pass perf_event list. As stated by commit ef0e3b650f8d ("powerpc/perf: Fix Threshold Event Counter Multiplier width for P10"), constraint bits for thresh_cmp is also needed to be increased to 11 bits, which is handled as part of this patch. We added bit number 53 as part of constraint bits of thresh_cmp for power10 to make it an 11 bit field. Updated layout for p10: /* * Layout of constraint bits: * * 60 56 52 48 44 40 36 32 * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | * [ fab_match ] [ thresh_cmp ] [ thresh_ctl ] [ ] * | | * [ thresh_cmp bits for p10] thresh_sel -* * * 28 24 20 16 12 8 4 0 * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | * [ ] | [ ] | [ sample ] [ ] [6] [5] [4] [3] [2] [1] * | | | | | * BHRB IFM -* | | |*radix_scope | Count of events for each PMC. * EBB -* | | p1, p2, p3, p4, p5, p6. * L1 I/D qualifier -* | * nc - number of counters -* * * The PMC fields P1..P6, and NC, are adder fields. As we accumulate constraints * we want the low bit of each field to be added to any existing value. * * Everything else is a value field. */ Result: command#: cat /sys/devices/cpu/format/thresh_cmp config1:0-17 ex. usage: command#: perf record -I --weight -d -e cpu/event=0x67340101EC,thresh_cmp=500/ ./ebizzy -S 2 -t 1 -s 4096 1826636 records/s real 2.00 s user 2.00 s sys 0.00 s [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.038 MB perf.data (61 samples) ] Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210209095234.837356-1-kjain@linux.ibm.com
|
#
d137845c |
|
05-Feb-2021 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Record counter overflow always if SAMPLE_IP is unset While sampling for marked events, currently we record the sample only if the SIAR valid bit of Sampled Instruction Event Register (SIER) is set. SIAR_VALID bit is used for fetching the instruction address from Sampled Instruction Address Register(SIAR). But there are some usecases, where the user is interested only in the PMU stats at each counter overflow and the exact IP of the overflow event is not required. Dropping SIAR invalid samples will fail to record some of the counter overflows in such cases. Example of such usecase is dumping the PMU stats (event counts) after some regular amount of instructions/events from the userspace (ex: via ptrace). Here counter overflow is indicated to userspace via signal handler, and captured by monitoring and enabling I/O signaling on the event file descriptor. In these cases, we expect to get sample/overflow indication after each specified sample_period. Perf event attribute will not have PERF_SAMPLE_IP set in the sample_type if exact IP of the overflow event is not requested. So while profiling if SAMPLE_IP is not set, just record the counter overflow irrespective of SIAR_VALID check. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Reflow comment and if formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1612516492-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
e79b76e0 |
|
02-Feb-2021 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Expose Performance Monitor Counter SPR's as part of extended regs Currently Monitor Mode Control Registers and Sampling registers are part of extended regs. Patch adds support to include Performance Monitor Counter Registers (PMC1 to PMC6 ) as part of extended registers. PMCs are saved in the perf interrupt handler as part of per-cpu array 'pmcs' in struct cpu_hw_events. While capturing the register values for extended regs, fetch these saved PMC values. Simplified the PERF_REG_PMU_MASK_300/31 definition to include PMU SPRs MMCR0 to PMC6. Exclude the unsupported SPRs (MMCR3, SIER2, SIER3) from extended mask value for CPU_FTR_ARCH_300 in the new definition. PERF_REG_EXTENDED_MAX is used to check if any index beyond the extended registers is requested in the sample. Have one PERF_REG_EXTENDED_MAX for CPU_FTR_ARCH_300/CPU_FTR_ARCH_31 since perf_reg_validate function already checks the extended mask for the presence of any unsupported register. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1612335337-1888-3-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
91f3469a |
|
02-Feb-2021 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Include PMCs as part of per-cpu cpuhw_events struct To support capturing of PMC's as part of extended registers, the value of SPR's PMC1 to PMC6 has to be saved in the starting of PMI interrupt handler. This is needed since we are resetting the overflown PMC before creating sample and hence directly reading SPRN_PMCx in 'perf_reg_value' will be capturing the modified value. To solve this, add a per-cpu array as part of structure cpu_hw_events and use this array to capture PMC values in the perf interrupt handler. Patch also re-factor's the interrupt handler code to use this per-cpu array instead of current local array. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1612335337-1888-2-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
156b5371 |
|
30-Jan-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/perf: move perf irq/nmi handling details into traps.c This is required in order to allow more significant differences between NMI type interrupt handlers and regular asynchronous handlers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-20-npiggin@gmail.com
|
#
2a6c6b7d |
|
28-Jan-2021 |
Kan Liang <kan.liang@linux.intel.com> |
perf/core: Add PERF_SAMPLE_WEIGHT_STRUCT Current PERF_SAMPLE_WEIGHT sample type is very useful to expresses the cost of an action represented by the sample. This allows the profiler to scale the samples to be more informative to the programmer. It could also help to locate a hotspot, e.g., when profiling by memory latencies, the expensive load appear higher up in the histograms. But current PERF_SAMPLE_WEIGHT sample type is solely determined by one factor. This could be a problem, if users want two or more factors to contribute to the weight. For example, Golden Cove core PMU can provide both the instruction latency and the cache Latency information as factors for the memory profiling. For current X86 platforms, although meminfo::latency is defined as a u64, only the lower 32 bits include the valid data in practice (No memory access could last than 4G cycles). The higher 32 bits can be used to store new factors. Add a new sample type, PERF_SAMPLE_WEIGHT_STRUCT, to indicate the new sample weight structure. It shares the same space as the PERF_SAMPLE_WEIGHT sample type. Users can apply either the PERF_SAMPLE_WEIGHT sample type or the PERF_SAMPLE_WEIGHT_STRUCT sample type to retrieve the sample weight, but they cannot apply both sample types simultaneously. Currently, only X86 and PowerPC use the PERF_SAMPLE_WEIGHT sample type. - For PowerPC, there is nothing changed for the PERF_SAMPLE_WEIGHT sample type. There is no effect for the new PERF_SAMPLE_WEIGHT_STRUCT sample type. PowerPC can re-struct the weight field similarly later. - For X86, the same value will be dumped for the PERF_SAMPLE_WEIGHT sample type or the PERF_SAMPLE_WEIGHT_STRUCT sample type for now. The following patches will apply the new factors for the PERF_SAMPLE_WEIGHT_STRUCT sample type. The field in the union perf_sample_weight should be shared among different architectures. A generic name is required, but it's hard to abstract a name that applies to all architectures. For example, on X86, the fields are to store all kinds of latency. While on PowerPC, it stores MMCRA[TECX/TECM], which should not be latency. So a general name prefix 'var$NUM' is used here. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1611873611-156687-2-git-send-email-kan.liang@linux.intel.com
|
#
aa8e21c0 |
|
25-Nov-2020 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Exclude kernel samples while counting events in user space. Perf event attritube supports exclude_kernel flag to avoid sampling/profiling in supervisor state (kernel). Based on this event attr flag, Monitor Mode Control Register bit is set to freeze on supervisor state. But sometimes (due to hardware limitation), Sampled Instruction Address Register (SIAR) locks on to kernel address even when freeze on supervisor is set. Patch here adds a check to drop those samples. Cc: stable@vger.kernel.org Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1606289215-1433-1-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
91668ab7 |
|
26-Nov-2020 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: MMCR0 control for PMU registers under PMCC=00 PowerISA v3.1 introduces new control bit (PMCCEXT) for restricting access to group B PMU registers in problem state when MMCR0 PMCC=0b00. In problem state and when MMCR0 PMCC=0b00, setting the Monitor Mode Control Register bit 54 (MMCR0 PMCCEXT), will restrict read permission on Group B Performance Monitor Registers (SIER, SIAR, SDAR and MMCR1). When this bit is set to zero, group B registers will be readable. In other platforms (like power9), the older behaviour is retained where group B PMU SPRs are readable. Patch adds support for MMCR0 PMCCEXT bit in power10 by enabling this bit during boot and during the PMU event enable/disable callback functions. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1606409684-1589-8-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
f66de7ac |
|
01-Dec-2020 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Invoke per-CPU variable access with disabled interrupts The power_pmu_event_init() callback access per-cpu variable (cpu_hw_events) to check for event constraints and Branch Stack (BHRB). Current usage is to disable preemption when accessing the per-cpu variable, but this does not prevent timer callback from interrupting event_init. Fix this by using 'local_irq_save/restore' to make sure the code path is invoked with disabled interrupts. This change is tested in mambo simulator to ensure that, if a timer interrupt comes in during the per-cpu access in event_init, it will be soft masked and replayed later. For testing purpose, introduced a udelay() in power_pmu_event_init() to make sure a timer interrupt arrives while in per-cpu variable access code between local_irq_save/resore. As expected the timer interrupt was replayed later during local_irq_restore called from power_pmu_event_init. This was confirmed by adding breakpoint in mambo and checking the backtrace when timer_interrupt was hit. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1606814880-1720-1-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
f75e7d73 |
|
23-Nov-2020 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Fix crash with is_sier_available when pmu is not set On systems without any specific PMU driver support registered, running 'perf record' with —intr-regs will crash ( perf record -I <workload> ). The relevant portion from crash logs and Call Trace: Unable to handle kernel paging request for data at address 0x00000068 Faulting instruction address: 0xc00000000013eb18 Oops: Kernel access of bad area, sig: 11 [#1] CPU: 2 PID: 13435 Comm: kill Kdump: loaded Not tainted 4.18.0-193.el8.ppc64le #1 NIP: c00000000013eb18 LR: c000000000139f2c CTR: c000000000393d80 REGS: c0000004a07ab4f0 TRAP: 0300 Not tainted (4.18.0-193.el8.ppc64le) NIP [c00000000013eb18] is_sier_available+0x18/0x30 LR [c000000000139f2c] perf_reg_value+0x6c/0xb0 Call Trace: [c0000004a07ab770] [c0000004a07ab7c8] 0xc0000004a07ab7c8 (unreliable) [c0000004a07ab7a0] [c0000000003aa77c] perf_output_sample+0x60c/0xac0 [c0000004a07ab840] [c0000000003ab3f0] perf_event_output_forward+0x70/0xb0 [c0000004a07ab8c0] [c00000000039e208] __perf_event_overflow+0x88/0x1a0 [c0000004a07ab910] [c00000000039e42c] perf_swevent_hrtimer+0x10c/0x1d0 [c0000004a07abc50] [c000000000228b9c] __hrtimer_run_queues+0x17c/0x480 [c0000004a07abcf0] [c00000000022aaf4] hrtimer_interrupt+0x144/0x520 [c0000004a07abdd0] [c00000000002a864] timer_interrupt+0x104/0x2f0 [c0000004a07abe30] [c0000000000091c4] decrementer_common+0x114/0x120 When perf record session is started with "-I" option, capturing registers on each sample calls is_sier_available() to check for the SIER (Sample Instruction Event Register) availability in the platform. This function in core-book3s accesses 'ppmu->flags'. If a platform specific PMU driver is not registered, ppmu is set to NULL and accessing its members results in a crash. Fix the crash by returning false in is_sier_available() if ppmu is not set. Fixes: 333804dc3b7a ("powerpc/perf: Update perf_regs structure to include SIER") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1606185640-1720-1-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
2ca13a4c |
|
21-Oct-2020 |
Madhavan Srinivasan <maddy@linux.ibm.com> |
powerpc/perf: Use regs->nip when SIAR is zero In power10 DD1, there is an issue where the SIAR (Sampled Instruction Address Register) is not latching to the sampled address during random sampling. This results in value of 0s in the SIAR. Add a check to use regs->nip when SIAR is zero. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201021085329.384535-5-maddy@linux.ibm.com
|
#
d9f7088d |
|
21-Oct-2020 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Use the address from SIAR register to set cpumode flags While setting the processor mode for any sample, perf_get_misc_flags() expects the privilege level to differentiate the userspace and kernel address. On power10 DD1, there is an issue that causes MSR_HV MSR_PR bits of Sampled Instruction Event Register (SIER) not to be set for marked events. Hence add a check to use the address in SIAR (Sampled Instruction Address Register) to identify the privilege level. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201021085329.384535-3-maddy@linux.ibm.com
|
#
fdf13a65 |
|
21-Oct-2020 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Drop the check for SIAR_VALID In power10 DD1, there is an issue that causes the SIAR_VALID bit of the SIER (Sampled Instruction Event Register) to not be set. But the SIAR_VALID bit is used for fetching the instruction address from the SIAR (Sampled Instruction Address Register), and marked events are sampled only if the SIAR_VALID bit is set. So drop the check for SIAR_VALID and return true always incase of power10 DD1. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201021085329.384535-2-maddy@linux.ibm.com
|
#
4cb6a42e |
|
01-Oct-2020 |
Kan Liang <kan.liang@linux.intel.com> |
powerpc/perf: Support PERF_SAMPLE_DATA_PAGE_SIZE The new sample type, PERF_SAMPLE_DATA_PAGE_SIZE, requires the virtual address. Update the data->addr if the sample type is set. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201001135749.2804-4-kan.liang@linux.intel.com
|
#
b460b512 |
|
01-Jun-2020 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
powerpc/perf: Fix crashes with generic_compat_pmu & BHRB The bhrb_filter_map ("The Branch History Rolling Buffer") callback is only defined in raw CPUs' power_pmu structs. The "architected" CPUs use generic_compat_pmu, which does not have this callback, and crashes occur if a user tries to enable branch stack for an event. This add a NULL pointer check for bhrb_filter_map() which behaves as if the callback returned an error. This does not add the same check for config_bhrb() as the only caller checks for cpuhw->bhrb_users which remains zero if bhrb_filter_map==0. Fixes: be80e758d0c2 ("powerpc/perf: Add generic compat mode pmu driver") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200602025612.62707-1-aik@ozlabs.ru
|
#
17899eaf |
|
06-Aug-2020 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Fix soft lockups due to missed interrupt accounting Performance monitor interrupt handler checks if any counter has overflown and calls record_and_restart() in core-book3s which invokes perf_event_overflow() to record the sample information. Apart from creating sample, perf_event_overflow() also does the interrupt and period checks via perf_event_account_interrupt(). Currently we record information only if the SIAR (Sampled Instruction Address Register) valid bit is set (using siar_valid() check) and hence the interrupt check. But it is possible that we do sampling for some events that are not generating valid SIAR, and hence there is no chance to disable the event if interrupts are more than max_samples_per_tick. This leads to soft lockup. Fix this by adding perf_event_account_interrupt() in the invalid SIAR code path for a sampling event. ie if SIAR is invalid, just do interrupt check and don't record the sample information. Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1596717992-7321-1-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
781fa481 |
|
07-Aug-2020 |
Anju T Sudhakar <anju@linux.vnet.ibm.com> |
powerpc/perf: Add support for outputting extended regs in perf intr_regs Add support for perf extended register capability in powerpc. The capability flag PERF_PMU_CAP_EXTENDED_REGS, is used to indicate the PMU which support extended registers. The generic code define the mask of extended registers as 0 for non supported architectures. Patch adds extended regs support for power9 platform by exposing MMCR0, MMCR1 and MMCR2 registers. REG_RESERVED mask needs update to include extended regs. PERF_REG_EXTENDED_MASK, contains mask value of the supported registers, is defined at runtime in the kernel based on platform since the supported registers may differ from one processor version to another and hence the MASK value. With the patch: available registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 r31 nip msr orig_r3 ctr link xer ccr softe trap dar dsisr sier mmcra mmcr0 mmcr1 mmcr2 PERF_RECORD_SAMPLE(IP, 0x1): 4784/4784: 0 period: 1 addr: 0 ... intr regs: mask 0xffffffffffff ABI 64-bit .... r0 0xc00000000012b77c .... r1 0xc000003fe5e03930 .... r2 0xc000000001b0e000 .... r3 0xc000003fdcddf800 .... r4 0xc000003fc7880000 .... r5 0x9c422724be .... r6 0xc000003fe5e03908 .... r7 0xffffff63bddc8706 .... r8 0x9e4 .... r9 0x0 .... r10 0x1 .... r11 0x0 .... r12 0xc0000000001299c0 .... r13 0xc000003ffffc4800 .... r14 0x0 .... r15 0x7fffdd8b8b00 .... r16 0x0 .... r17 0x7fffdd8be6b8 .... r18 0x7e7076607730 .... r19 0x2f .... r20 0xc00000001fc26c68 .... r21 0xc0002041e4227e00 .... r22 0xc00000002018fb60 .... r23 0x1 .... r24 0xc000003ffec4d900 .... r25 0x80000000 .... r26 0x0 .... r27 0x1 .... r28 0x1 .... r29 0xc000000001be1260 .... r30 0x6008010 .... r31 0xc000003ffebb7218 .... nip 0xc00000000012b910 .... msr 0x9000000000009033 .... orig_r3 0xc00000000012b86c .... ctr 0xc0000000001299c0 .... link 0xc00000000012b77c .... xer 0x0 .... ccr 0x28002222 .... softe 0x1 .... trap 0xf00 .... dar 0x0 .... dsisr 0x80000000000 .... sier 0x0 .... mmcra 0x80000000000 .... mmcr0 0x82008090 .... mmcr1 0x1e000000 .... mmcr2 0x0 ... thread: perf:4784 Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Nageswara R Sastry <nasastry@in.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1596794701-23530-2-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
909adfc6 |
|
27-Jul-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/hash: Fix hash_preload running with interrupts enabled Commit 2f92447f9f96 ("powerpc/book3s64/hash: Use the pte_t address from the caller") removed the local_irq_disable from hash_preload, but it was required for more than just the page table walk: the hash pte busy bit is effectively a lock which may be taken in interrupt context, and the local update flag test must not be preempted before it's used. This solves apparent lockups with perf interrupting __hash_page_64K. If get_perf_callchain then also takes a hash fault on the same page while it is already locked, it will loop forever taking hash faults, which looks like this: cpu 0x49e: Vector: 100 (System Reset) at [c00000001a4f7d70] pc: c000000000072dc8: hash_page_mm+0x8/0x800 lr: c00000000000c5a4: do_hash_page+0x24/0x38 sp: c0002ac1cc69ac70 msr: 8000000000081033 current = 0xc0002ac1cc602e00 paca = 0xc00000001de1f280 irqmask: 0x03 irq_happened: 0x01 pid = 20118, comm = pread2_processe Linux version 5.8.0-rc6-00345-g1fad14f18bc6 49e:mon> t [c0002ac1cc69ac70] c00000000000c5a4 do_hash_page+0x24/0x38 (unreliable) --- Exception: 300 (Data Access) at c00000000008fa60 __copy_tofrom_user_power7+0x20c/0x7ac [link register ] c000000000335d10 copy_from_user_nofault+0xf0/0x150 [c0002ac1cc69af70] c00032bf9fa3c880 (unreliable) [c0002ac1cc69afa0] c000000000109df0 read_user_stack_64+0x70/0xf0 [c0002ac1cc69afd0] c000000000109fcc perf_callchain_user_64+0x15c/0x410 [c0002ac1cc69b060] c000000000109c00 perf_callchain_user+0x20/0x40 [c0002ac1cc69b080] c00000000031c6cc get_perf_callchain+0x25c/0x360 [c0002ac1cc69b120] c000000000316b50 perf_callchain+0x70/0xa0 [c0002ac1cc69b140] c000000000316ddc perf_prepare_sample+0x25c/0x790 [c0002ac1cc69b1a0] c000000000317350 perf_event_output_forward+0x40/0xb0 [c0002ac1cc69b220] c000000000306138 __perf_event_overflow+0x88/0x1a0 [c0002ac1cc69b270] c00000000010cf70 record_and_restart+0x230/0x750 [c0002ac1cc69b620] c00000000010d69c perf_event_interrupt+0x20c/0x510 [c0002ac1cc69b730] c000000000027d9c performance_monitor_exception+0x4c/0x60 [c0002ac1cc69b750] c00000000000b2f8 performance_monitor_common_virt+0x1b8/0x1c0 --- Exception: f00 (Performance Monitor) at c0000000000cb5b0 pSeries_lpar_hpte_insert+0x0/0x160 [link register ] c0000000000846f0 __hash_page_64K+0x210/0x540 [c0002ac1cc69ba50] 0000000000000000 (unreliable) [c0002ac1cc69bb00] c000000000073ae0 update_mmu_cache+0x390/0x3a0 [c0002ac1cc69bb70] c00000000037f024 wp_page_copy+0x364/0xce0 [c0002ac1cc69bc20] c00000000038272c do_wp_page+0xdc/0xa60 [c0002ac1cc69bc70] c0000000003857bc handle_mm_fault+0xb9c/0x1b60 [c0002ac1cc69bd50] c00000000006c434 __do_page_fault+0x314/0xc90 [c0002ac1cc69be20] c00000000000c5c8 handle_page_fault+0x10/0x2c --- Exception: 300 (Data Access) at 00007fff8c861fe8 SP (7ffff6b19660) is in userspace Fixes: 2f92447f9f96 ("powerpc/book3s64/hash: Use the pte_t address from the caller") Reported-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reported-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200727060947.10060-1-npiggin@gmail.com
|
#
1cade527 |
|
17-Jul-2020 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: BHRB control to disable BHRB logic when not used PowerISA v3.1 has few updates for the Branch History Rolling Buffer(BHRB). BHRB disable is controlled via Monitor Mode Control Register A (MMCRA) bit, namely "BHRB Recording Disable (BHRBRD)". This field controls whether BHRB entries are written when BHRB recording is enabled by other bits. This patch implements support for this BHRB disable bit. By setting 0b1 to this bit will disable the BHRB and by setting 0b0 to this bit will have BHRB enabled. This addresses backward compatibility (for older OS), since this bit will be cleared and hardware will be writing to BHRB by default. This patch addresses changes to set MMCRA (BHRBRD) at boot for power10 (there by the core will run faster) and enable this feature only on runtime ie, on explicit need from user. Also save/restore MMCRA in the restore path of state-loss idle state to make sure we keep BHRB disabled if it was not enabled on request at runtime. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-12-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
bfe3b194 |
|
17-Jul-2020 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Ignore the BHRB kernel address filtering for P10 Commit bb19af816025 ("powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer") added a check in bhrb_read() to filter the kernel address from BHRB buffer. This patch modified it to avoid that check for PowerISA v3.1 based processors, since PowerISA v3.1 allows only MSR[PR]=1 address to be written to BHRB buffer. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-10-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
a64e697c |
|
17-Jul-2020 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: power10 Performance Monitoring support Base enablement patch to register performance monitoring hardware support for power10. Patch introduce the raw event encoding format, defines the supported list of events, config fields for the event attributes and their corresponding bit values which are exported via sysfs. Patch also enhances the support function in isa207_common.c to include power10 pmu hardware. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-9-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
9908c826 |
|
17-Jul-2020 |
Madhavan Srinivasan <maddy@linux.ibm.com> |
powerpc/perf: Add Power10 PMU feature to DT CPU features Add Power10 feature function to DT CPU features, along with a Power10 specific init() to initialize PMU SPRs, sets the oprofile_cpu_type and cpu_features. This will enable performance monitoring unit (PMU) for Power10 in CPU features with "performance-monitor-power10". For Power ISA v3.1, BHRB disable is controlled via Monitor Mode Control Register A (MMCRA) bit, namely "BHRB Recording Disable (BHRBRD)". This patch initializes MMCRA BHRBRD to disable BHRB feature at boot for Power10. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> [mpe: Move MMCRA_BHRB_DISABLE as noted by jpn, drop CPU setup changes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-8-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
c718547e |
|
17-Jul-2020 |
Madhavan Srinivasan <maddy@linux.ibm.com> |
powerpc/perf: Add support for ISA3.1 PMU SPRs PowerISA v3.1 includes new performance monitoring unit(PMU) special purpose registers (SPRs). They are Monitor Mode Control Register 3 (MMCR3) Sampled Instruction Event Register 2 (SIER2) Sampled Instruction Event Register 3 (SIER3) MMCR3 is added for further sampling related configuration control. SIER2/SIER3 are added to provide additional information about the sampled instruction. Patch adds new PPMU flag called "PPMU_ARCH_31" to support handling of these new SPRs, updates the struct thread_struct to include these new SPRs, include MMCR3 in struct mmcr_regs. This is needed to support programming of MMCR3 SPR during event_enable/disable. Patch also adds the sysfs support for the MMCR3 SPR along with SPRN_ macros for these new pmu SPRs. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> [mpe: Rename to PPMU_ARCH_31 as noted by jpn] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-5-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
9d4fc86d |
|
17-Jul-2020 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Update Power PMU cache_events to u64 type Events of type PERF_TYPE_HW_CACHE was described for Power PMU as: int (*cache_events)[type][op][result]; where type, op, result values unpacked from the event attribute config value is used to generate the raw event code at runtime. So far the event code values which used to create these cache-related events were within 32 bit and `int` type worked. In power10, some of the event codes are of 64-bit value and hence update the Power PMU cache_events to `u64` type in `power_pmu` struct. Also propagate this change to existing all PMU driver code paths which are using ppmu->cache_events. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-4-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
78d76819 |
|
17-Jul-2020 |
Athira Rajeev <atrajeev@linux.vnet.ibm.com> |
powerpc/perf: Update cpu_hw_event to use `struct` for storing MMCR registers core-book3s currently uses array to store the MMCR registers as part of per-cpu `cpu_hw_events`. This patch does a clean up to use `struct` to store mmcr regs instead of array. This will make code easier to read and reduces chance of any subtle bug that may come in the future, say when new registers are added. Patch updates all relevant code that was using MMCR array ( cpuhw->mmcr[x]) to use newly introduced `struct`. This includes the PMU driver code for supported platforms (power5 to power9) and ISA macros for counter support functions. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-2-git-send-email-atrajeev@linux.vnet.ibm.com
|
#
c0ee37e8 |
|
17-Jun-2020 |
Christoph Hellwig <hch@lst.de> |
maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault Better describe what these functions do. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fe557319 |
|
17-Jun-2020 |
Christoph Hellwig <hch@lst.de> |
maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault Better describe what these functions do. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
94afd069 |
|
05-May-2020 |
Jordan Niethe <jniethe5@gmail.com> |
powerpc: Use a datatype for instructions Currently unsigned ints are used to represent instructions on powerpc. This has worked well as instructions have always been 4 byte words. However, ISA v3.1 introduces some changes to instructions that mean this scheme will no longer work as well. This change is Prefixed Instructions. A prefixed instruction is made up of a word prefix followed by a word suffix to make an 8 byte double word instruction. No matter the endianness of the system the prefix always comes first. Prefixed instructions are only planned for powerpc64. Introduce a ppc_inst type to represent both prefixed and word instructions on powerpc64 while keeping it possible to exclusively have word instructions on powerpc32. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: Fix compile error in emulate_spe()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-12-jniethe5@gmail.com
|
#
bbfd5e4f |
|
27-Jan-2020 |
Kan Liang <kan.liang@linux.intel.com> |
perf/core: Add new branch sample type for HW index of raw branch records The low level index is the index in the underlying hardware buffer of the most recently captured taken branch which is always saved in branch_entries[0]. It is very useful for reconstructing the call stack. For example, in Intel LBR call stack mode, the depth of reconstructed LBR call stack limits to the number of LBR registers. With the low level index information, perf tool may stitch the stacks of two samples. The reconstructed LBR call stack can break the HW limitation. Add a new branch sample type to retrieve low level index of raw branch records. The low level index is between -1 (unknown) and max depth which can be retrieved in /sys/devices/cpu/caps/branches. Only when the new branch sample type is set, the low level index information is dumped into the PERF_SAMPLE_BRANCH_STACK output. Perf tool should check the attr.branch_sample_type, and apply the corresponding format for PERF_SAMPLE_BRANCH_STACK samples. Otherwise, some user case may be broken. For example, users may parse a perf.data, which include the new branch sample type, with an old version perf tool (without the check). Users probably get incorrect information without any warning. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200127165355.27495-2-kan.liang@linux.intel.com
|
#
def0bfdb |
|
23-Jan-2020 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc: use probe_user_read() and probe_user_write() Instead of opencoding, use probe_user_read() to failessly read a user location and probe_user_write() for writing to user. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e041f5eedb23f09ab553be8a91c3de2087147320.1579800517.git.christophe.leroy@c-s.fr
|
#
da97e184 |
|
14-Oct-2019 |
Joel Fernandes (Google) <joel@joelfernandes.org> |
perf_event: Add support for LSM and SELinux checks In current mainline, the degree of access to perf_event_open(2) system call depends on the perf_event_paranoid sysctl. This has a number of limitations: 1. The sysctl is only a single value. Many types of accesses are controlled based on the single value thus making the control very limited and coarse grained. 2. The sysctl is global, so if the sysctl is changed, then that means all processes get access to perf_event_open(2) opening the door to security issues. This patch adds LSM and SELinux access checking which will be used in Android to access perf_event_open(2) for the purposes of attaching BPF programs to tracepoints, perf profiling and other operations from userspace. These operations are intended for production systems. 5 new LSM hooks are added: 1. perf_event_open: This controls access during the perf_event_open(2) syscall itself. The hook is called from all the places that the perf_event_paranoid sysctl is checked to keep it consistent with the systctl. The hook gets passed a 'type' argument which controls CPU, kernel and tracepoint accesses (in this context, CPU, kernel and tracepoint have the same semantics as the perf_event_paranoid sysctl). Additionally, I added an 'open' type which is similar to perf_event_paranoid sysctl == 3 patch carried in Android and several other distros but was rejected in mainline [1] in 2016. 2. perf_event_alloc: This allocates a new security object for the event which stores the current SID within the event. It will be useful when the perf event's FD is passed through IPC to another process which may try to read the FD. Appropriate security checks will limit access. 3. perf_event_free: Called when the event is closed. 4. perf_event_read: Called from the read(2) and mmap(2) syscalls for the event. 5. perf_event_write: Called from the ioctl(2) syscalls for the event. [1] https://lwn.net/Articles/696240/ Since Peter had suggest LSM hooks in 2016 [1], I am adding his Suggested-by tag below. To use this patch, we set the perf_event_paranoid sysctl to -1 and then apply selinux checking as appropriate (default deny everything, and then add policy rules to give access to domains that need it). In the future we can remove the perf_event_paranoid sysctl altogether. Suggested-by: Peter Zijlstra <peterz@infradead.org> Co-developed-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: James Morris <jmorris@namei.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: rostedt@goodmis.org Cc: Yonghong Song <yhs@fb.com> Cc: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: jeffv@google.com Cc: Jiri Olsa <jolsa@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: primiano@google.com Cc: Song Liu <songliubraving@fb.com> Cc: rsavitski@google.com Cc: Namhyung Kim <namhyung@kernel.org> Cc: Matthew Garrett <matthewgarrett@google.com> Link: https://lkml.kernel.org/r/20191014170308.70668-1-joel@joelfernandes.org
|
#
2874c5fd |
|
27-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
3202e35e |
|
10-May-2019 |
Ravi Bangoria <ravi.bangoria@linux.ibm.com> |
powerpc/perf: Fix MMCRA corruption by bhrb_filter Consider a scenario where user creates two events: 1st event: attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY; fd = perf_event_open(attr, 0, 1, -1, 0); This sets cpuhw->bhrb_filter to 0 and returns valid fd. 2nd event: attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL; fd = perf_event_open(attr, 0, 1, -1, 0); It overrides cpuhw->bhrb_filter to -1 and returns with error. Now if power_pmu_enable() gets called by any path other than power_pmu_add(), ppmu->config_bhrb(-1) will set MMCRA to -1. Fixes: 3925f46bb590 ("powerpc/perf: Enable branch stack sampling framework") Cc: stable@vger.kernel.org # v3.10+ Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
be80e758 |
|
04-Apr-2019 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/perf: Add generic compat mode pmu driver Most of the power processor generation performance monitoring unit (PMU) driver code is bundled in the kernel and one of those is enabled/registered based on the oprofile_cpu_type check at the boot. But things get little tricky incase of "compat" mode boot. IBM POWER System Server based processors has a compactibility mode feature, which simpily put is, Nth generation processor (lets say POWER8) will act and appear in a mode consistent with an earlier generation (N-1) processor (that is POWER7). And in this "compat" mode boot, kernel modify the "oprofile_cpu_type" to be Nth generation (POWER8). If Nth generation pmu driver is bundled (POWER8), it gets registered. Key dependency here is to have distro support for latest processor performance monitoring support. Patch here adds a generic "compat-mode" performance monitoring driver to be register in absence of powernv platform specific pmu driver. Driver supports only "cycles" and "instruction" events. "0x0001e" used as event code for "cycles" and "0x00002" used as event code for "instruction" events. New file called "generic-compat-pmu.c" is created to contain the driver specific code. And base raw event code format modeled on PPMU_ARCH_207S. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Use SPDX tag for license] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
708597da |
|
04-Apr-2019 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/perf: init pmu from core-book3s Currenty pmu driver file for each ppc64 generation processor has a __init call in itself. Refactor the code by moving the __init call to core-books.c. This also clean's up compat mode pmu driver registration. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Use SPDX tag for license] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
0c9108b0 |
|
20-Nov-2018 |
Ravi Bangoria <ravi.bangoria@linux.ibm.com> |
Powerpc/perf: Wire up PMI throttling Commit 14c63f17b1fde ("perf: Drop sample rate when sampling is too slow") introduced a way to throttle PMU interrupts if we're spending too much time just processing those. Wire up powerpc PMI handler to use this infrastructure. We have throttling of the *rate* of interrupts, but this adds throttling based on the *time taken* to process the interrupts. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
59029136 |
|
10-Jun-2018 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/perf: Add constraints for power9 l2/l3 bus events In previous generation processors, both bus events and direct events of performance monitoring unit can be individually programmabled and monitored in PMCs. But in Power9, L2/L3 bus events are always available as a "bank" of 4 events. To obtain the counts for any of the l2/l3 bus events in a given bank, the user will have to program PMC4 with corresponding l2/l3 bus event for that bank. Patch enforce two contraints incase of L2/L3 bus events. 1)Any L2/L3 event when programmed is also expected to program corresponding PMC4 event from that group. 2)PMC4 event should always been programmed first due to group constraint logic limitation For ex. consider these L3 bus events PM_L3_PF_ON_CHIP_MEM (0x460A0), PM_L3_PF_MISS_L3 (0x160A0), PM_L3_CO_MEM (0x260A0), PM_L3_PF_ON_CHIP_CACHE (0x360A0), 1) This is an INVALID group for L3 Bus event monitoring, since it is missing PMC4 event. perf stat -e "{r160A0,r260A0,r360A0}" < > And this is a VALID group for L3 Bus events: perf stat -e "{r460A0,r160A0,r260A0,r360A0}" < > 2) This is an INVALID group for L3 Bus event monitoring, since it is missing PMC4 event. perf stat -e "{r260A0,r360A0}" < > And this is a VALID group for L3 Bus events: perf stat -e "{r460A0,r260A0,r360A0}" < > 3) This is an INVALID group for L3 Bus event monitoring, since it is missing PMC4 event. perf stat -e "{r360A0}" < > And this is a VALID group for L3 Bus events: perf stat -e "{r460A0,r360A0}" < > Patch here implements group constraint logic suggested by Michael Ellerman. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
333804dc |
|
09-Dec-2018 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/perf: Update perf_regs structure to include SIER On each sample, Sample Instruction Event Register (SIER) content is saved in pt_regs. SIER does not have a entry as-is in the pt_regs but instead, SIER content is saved in the "dar" register of pt_regs. Patch adds another entry to the perf_regs structure to include the "SIER" printing which internally maps to the "dar" of pt_regs. It also check for the SIER availability in the platform and present value accordingly Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
2bf1071a |
|
05-Jul-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Remove POWER9 DD1 support POWER9 DD1 was never a product. It is no longer supported by upstream firmware, and it is not effectively supported in Linux due to lack of testing. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [mpe: Remove arch_make_huge_pte() entirely] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
788faab7 |
|
08-Jul-2018 |
Tobias Tefke <tobias.tefke@gmail.com> |
perf, tools: Use correct articles in comments Some of the comments in the perf events code use articles incorrectly, using 'a' for words beginning with a vowel sound, where 'an' should be used. Signed-off-by: Tobias Tefke <tobias.tefke@tutanota.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: jolsa@redhat.com Cc: namhyung@kernel.org Link: http://lkml.kernel.org/r/20180709105715.22938-1-tobias.tefke@tutanota.com [ Fix a few more perf related 'a event' typo fixes from all around the kernel and tooling tree. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
b58064da |
|
04-Mar-2018 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/perf: Infrastructure to support addition of blacklisted events Introduce code to support addition of blacklisted events for a processor version. Blacklisted events are events that are known to not count correctly on that CPU revision, and so should be prevented from being counted so as to avoid user confusion. A 'pointer' and 'int' variable to hold the number of events are added to 'struct power_pmu', along with a generic function to loop through the list to validate the given event. Generic function 'is_event_blacklisted' is called in power_pmu_event_init() to detect and reject early. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
cd1231d7 |
|
21-Mar-2018 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/perf: Prevent kernel address leak via perf_get_data_addr() Sampled Data Address Register (SDAR) is a 64-bit register that contains the effective address of the storage operand of an instruction that was being executed, possibly out-of-order, at or around the time that the Performance Monitor alert occurred. In certain scenario SDAR happen to contain the kernel address even for userspace only sampling. Add checks to prevent it. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
bb19af81 |
|
21-Mar-2018 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer The current Branch History Rolling Buffer (BHRB) code does not check for any privilege levels before updating the data from BHRB. This could leak kernel addresses to userspace even when profiling only with userspace privileges. Add proper checks to prevent it. Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
e1ebd0e5 |
|
21-Mar-2018 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/perf: Fix kernel address leak via sampling registers Current code in power_pmu_disable() does not clear the sampling registers like Sampling Instruction Address Register (SIAR) and Sampling Data Address Register (SDAR) after disabling the PMU. Since these are userspace readable and could contain kernel addresses, add code to explicitly clear the content of these registers. Also add a "context synchronizing instruction" to enforce no further updates to these registers as suggested by Power ISA v3.0B. From section 9.4, on page 1108: "If an mtspr instruction is executed that changes the value of a Performance Monitor register other than SIAR, SDAR, and SIER, the change is not guaranteed to have taken effect until after a subsequent context synchronizing instruction has been executed (see Chapter 11. "Synchronization Requirements for Context Alterations" on page 1133)." Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Massage change log and add ISA reference] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
edb39592 |
|
15-Mar-2018 |
Peter Zijlstra <peterz@infradead.org> |
perf: Fix sibling iteration Mark noticed that the change to sibling_list changed some iteration semantics; because previously we used group_list as list entry, sibling events would always have an empty sibling_list. But because we now use sibling_list for both list head and list entry, siblings will report as having siblings. Fix this with a custom for_each_sibling_event() iterator. Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry") Reported-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: vincent.weaver@maine.edu Cc: alexander.shishkin@linux.intel.com Cc: torvalds@linux-foundation.org Cc: alexey.budankov@linux.intel.com Cc: valery.cherepennikov@intel.com Cc: eranian@google.com Cc: acme@redhat.com Cc: linux-tip-commits@vger.kernel.org Cc: davidcc@google.com Cc: kan.liang@intel.com Cc: Dmitry.Prohorov@intel.com Cc: jolsa@redhat.com Link: https://lkml.kernel.org/r/20180315170129.GX4043@hirez.programming.kicks-ass.net
|
#
7eb709f2 |
|
15-Mar-2018 |
Peter Zijlstra <peterz@infradead.org> |
perf: Fix sibling iteration Mark noticed that the change to sibling_list changed some iteration semantics; because previously we used group_list as list entry, sibling events would always have an empty sibling_list. But because we now use sibling_list for both list head and list entry, siblings will report as having siblings. Fix this with a custom for_each_sibling_event() iterator. Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry") Reported-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: vincent.weaver@maine.edu Cc: alexander.shishkin@linux.intel.com Cc: torvalds@linux-foundation.org Cc: alexey.budankov@linux.intel.com Cc: valery.cherepennikov@intel.com Cc: eranian@google.com Cc: acme@redhat.com Cc: linux-tip-commits@vger.kernel.org Cc: davidcc@google.com Cc: kan.liang@intel.com Cc: Dmitry.Prohorov@intel.com Cc: jolsa@redhat.com Link: https://lkml.kernel.org/r/20180315170129.GX4043@hirez.programming.kicks-ass.net
|
#
8343aae6 |
|
13-Nov-2017 |
Peter Zijlstra <peterz@infradead.org> |
perf/core: Remove perf_event::group_entry Now that all the grouping is done with RB trees, we no longer need group_entry and can replace the whole thing with sibling_list. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valery Cherepennikov <valery.cherepennikov@intel.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
01417c6c |
|
19-Dec-2017 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/64: Change soft_enabled from flag to bitmask "paca->soft_enabled" is used as a flag to mask some of interrupts. Currently supported flags values and their details: soft_enabled MSR[EE] 0 0 Disabled (PMI and HMI not masked) 1 1 Enabled "paca->soft_enabled" is initialized to 1 to make the interripts as enabled. arch_local_irq_disable() will toggle the value when interrupts needs to disbled. At this point, the interrupts are not actually disabled, instead, interrupt vector has code to check for the flag and mask it when it occurs. By "mask it", it update interrupt paca->irq_happened and return. arch_local_irq_restore() is called to re-enable interrupts, which checks and replays interrupts if any occured. Now, as mentioned, current logic doesnot mask "performance monitoring interrupts" and PMIs are implemented as NMI. But this patchset depends on local_irq_* for a successful local_* update. Meaning, mask all possible interrupts during local_* update and replay them after the update. So the idea here is to reserve the "paca->soft_enabled" logic. New values and details: soft_enabled MSR[EE] 1 0 Disabled (PMI and HMI not masked) 0 1 Enabled Reason for the this change is to create foundation for a third mask value "0x2" for "soft_enabled" to add support to mask PMIs. When ->soft_enabled is set to a value "3", PMI interrupts are mask and when set to a value of "1", PMI are not mask. With this patch also extends soft_enabled as interrupt disable mask. Current flags are renamed from IRQ_[EN?DIS}ABLED to IRQS_ENABLED and IRQS_DISABLED. Patch also fixes the ptrace call to force the user to see the softe value to be alway 1. Reason being, even though userspace has no business knowing about softe, it is part of pt_regs. Like-wise in signal context. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
c2e480ba |
|
19-Dec-2017 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/64: Add #defines for paca->soft_enabled flags Two #defines IRQS_ENABLED and IRQS_DISABLED are added to be used when updating paca->soft_enabled. Replace the hardcoded values used when updating paca->soft_enabled with IRQ_(EN|DIS)ABLED #define. No logic change. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
f41d84dd |
|
12-Dec-2017 |
Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> |
powerpc/perf: Dereference BHRB entries safely It's theoretically possible that branch instructions recorded in BHRB (Branch History Rolling Buffer) entries have already been unmapped before they are processed by the kernel. Hence, trying to dereference such memory location will result in a crash. eg: Unable to handle kernel paging request for data at address 0xd000000019c41764 Faulting instruction address: 0xc000000000084a14 NIP [c000000000084a14] branch_target+0x4/0x70 LR [c0000000000eb828] record_and_restart+0x568/0x5c0 Call Trace: [c0000000000eb3b4] record_and_restart+0xf4/0x5c0 (unreliable) [c0000000000ec378] perf_event_interrupt+0x298/0x460 [c000000000027964] performance_monitor_exception+0x54/0x70 [c000000000009ba4] performance_monitor_common+0x114/0x120 Fix it by deferefencing the addresses safely. Fixes: 691231846ceb ("powerpc/perf: Fix setting of "to" addresses for BHRB") Cc: stable@vger.kernel.org # v3.10+ Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Use probe_kernel_read() which is clearer, tweak change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
5aa04b3e |
|
30-Nov-2017 |
Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> |
powerpc/perf: Fix oops when grouping different pmu events When user tries to group imc (In-Memory Collections) event with normal event, (sometime) kernel crashes with following log: Faulting instruction address: 0x00000000 [link register ] c00000000010ce88 power_check_constraints+0x128/0x980 ... c00000000010e238 power_pmu_event_init+0x268/0x6f0 c0000000002dc60c perf_try_init_event+0xdc/0x1a0 c0000000002dce88 perf_event_alloc+0x7b8/0xac0 c0000000002e92e0 SyS_perf_event_open+0x530/0xda0 c00000000000b004 system_call+0x38/0xe0 'event_base' field of 'struct hw_perf_event' is used as flags for normal hw events and used as memory address for imc events. While grouping these two types of events, collect_events() tries to interpret imc 'event_base' as a flag, which causes a corruption resulting in a crash. Consider only those events which belongs to 'perf_hw_context' in collect_events(). Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reviewed-By: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
4917fcb5 |
|
19-Sep-2017 |
Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> |
powerpc/sysrq: Fix oops whem ppmu is not registered Kernel crashes if power pmu is not registered and user tries to dump regs with 'echo p > /proc/sysrq-trigger'. Sample log: Unable to handle kernel paging request for data at address 0x00000008 Faulting instruction address: 0xc0000000000d52f0 NIP [c0000000000d52f0] perf_event_print_debug+0x10/0x230 LR [c00000000058a938] sysrq_handle_showregs+0x38/0x50 Call Trace: printk+0x38/0x4c (unreliable) __handle_sysrq+0xe4/0x270 write_sysrq_trigger+0x64/0x80 proc_reg_write+0x80/0xd0 __vfs_write+0x40/0x200 vfs_write+0xc8/0x240 SyS_write+0x60/0x110 system_call+0x58/0x6c Fixes: 5f6d0380c640 ("powerpc/perf: Define perf_event_print_debug() to print PMU register values") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
fc7ce9c7 |
|
28-Aug-2017 |
Kan Liang <kan.liang@intel.com> |
perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR For understanding how the workload maps to memory channels and hardware behavior, it's very important to collect address maps with physical addresses. For example, 3D XPoint access can only be found by filtering the physical address. Add a new sample type for physical address. perf already has a facility to collect data virtual address. This patch introduces a function to convert the virtual address to physical address. The function is quite generic and can be extended to any architecture as long as a virtual address is provided. - For kernel direct mapping addresses, virt_to_phys is used to convert the virtual addresses to physical address. - For user virtual addresses, __get_user_pages_fast is used to walk the pages tables for user physical address. - This does not work for vmalloc addresses right now. These are not resolved, but code to do that could be added. The new sample type requires collecting the virtual address. The virtual address will not be output unless SAMPLE_ADDR is applied. For security, the physical address can only be exposed to root or privileged user. Tested-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: mpe@ellerman.id.au Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
170a315f |
|
10-Apr-2017 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/perf: Support to export MMCRA[TEC*] field to userspace Threshold feature when used with MMCRA [Threshold Event Counter Event], MMCRA[Threshold Start event] and MMCRA[Threshold End event] will update MMCRA[Threashold Event Counter Exponent] and MMCRA[Threshold Event Counter Multiplier] with the corresponding threshold event count values. Patch to export MMCRA[TECX/TECM] to userspace in 'weight' field of struct perf_sample_data. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
79e96f8f |
|
10-Apr-2017 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/perf: Export memory hierarchy info to user space The LDST field and DATA_SRC in SIER identifies the memory hierarchy level (eg: L1, L2 etc), from which a data-cache miss for a marked instruction was satisfied. Use the 'perf_mem_data_src' object to export this hierarchy level to user space. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
f04d1080 |
|
20-Feb-2017 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/perf: Fix perf_get_data_addr() for power9 DD1 Power9 DD1 do not support PMU_HAS_SIER flag and sdsync in perf_get_data_addr() defaults to MMCRA_SDSYNC which is wrong. Since power9 MMCRA does not support SDSYNC bit, patch includes PPMU_NO_SIAR flag to the check and set the sdsync with MMCRA_SAMPLE_ENABLE; Fixes: 27593d72c4ad ("powerpc/perf: Use MSR to report privilege level on P9 DD1") Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
a2391b35 |
|
23-Dec-2016 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/perf: use is_kernel_addr macro in perf_get_misc_flags() Cleanup to use is_kernel_addr macro. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
356d8ce3 |
|
12-Feb-2017 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/perf: Use Instruction Counter value Since PM_INST_DISP include speculative instruction, based on the workload the dispatch count could vary considerably. Hence as an alternative, for completed instruction counting, program the PM_INST_DISP event to the MMCR* but use Instruction Counter register value. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
27593d72 |
|
17-Jan-2017 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/perf: Use MSR to report privilege level on P9 DD1 SIER and SIAR are not updated correctly for some samples, so force the use of MSR and regs->nip instead for misc_flag updates. This is done by adding a new ppmu flag and updating the use_siar logic in perf_read_regs() to use it, and dropping the PPMU_HAS_SIER flag. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Rename flag to PPMU_NO_SIAR, and also drop PPMU_HAS_SIER] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
73c1b41e |
|
21-Dec-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Cleanup state names When the state names got added a script was used to add the extra argument to the calls. The script basically converted the state constant to a string, but the cleanup to convert these strings into meaningful ones did not happen. Replace all the useless strings with 'subsys/xxx/yyy:state' strings which are used in all the other places already. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
7c98bd72 |
|
05-Sep-2016 |
Daniel Axtens <dja@axtens.net> |
powerpc/sparse: Make a bunch of things static Squash a bunch of sparse warnings by making things static. Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
57ecde42 |
|
13-Jul-2016 |
Thomas Gleixner <tglx@linutronix.de> |
powerpc/perf: Convert book3s notifier to state machine callbacks Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153334.345786236@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
027dfac6 |
|
01-Jun-2016 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Various typo fixes Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
58bffb5b |
|
03-Mar-2016 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/perf: Fix misleading comment in pmao_restore_workaround() The current comment in pmao_restore_workaround() regarding hard_irq_disable() is wrong. It should say to hard *disable* interrupts instead of *enable*. Fix it. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
8f3e5684 |
|
03-Sep-2015 |
Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> |
perf/core: Drop PERF_EVENT_TXN We currently use PERF_EVENT_TXN flag to determine if we are in the middle of a transaction. If in a transaction, we defer the schedulability checks from pmu->add() operation to the pmu->commit() operation. Now that we have "transaction types" (PERF_PMU_TXN_ADD, PERF_PMU_TXN_READ) we can use the type to determine if we are in a transaction and drop the PERF_EVENT_TXN flag. When PERF_EVENT_TXN is dropped, the cpuhw->group_flag on some architectures becomes unused, so drop that field as well. This is an extension of the Powerpc patch from Peter Zijlstra to s390, Sparc and x86 architectures. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1441336073-22750-11-git-send-email-sukadev@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
fbbe0701 |
|
03-Sep-2015 |
Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> |
perf/core: Add a 'flags' parameter to the PMU transactional interfaces Currently, the PMU interface allows reading only one counter at a time. But some PMUs like the 24x7 counters in Power, support reading several counters at once. To leveage this functionality, extend the transaction interface to support a "transaction type". The first type, PERF_PMU_TXN_ADD, refers to the existing transactions, i.e. used to _schedule_ all the events on the PMU as a group. A second transaction type, PERF_PMU_TXN_READ, will be used in a follow-on patch, by the 24x7 counters to read several counters at once. Extend the transaction interfaces to the PMU to accept a 'txn_flags' parameter and use this parameter to ignore any transactions that are not of type PERF_PMU_TXN_ADD. Thanks to Peter Zijlstra for his input. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> [peterz: s390 compile fix] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1441336073-22750-3-git-send-email-sukadev@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
f0322f7f |
|
30-Jun-2015 |
Anshuman Khandual <khandual@linux.vnet.ibm.com> |
powerpc/perf: Change type of the bhrb_users variable This patch just changes data type of bhrb_users variable from int to unsigned int because it never contains a negative value. Reported-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
72e349f1 |
|
25-May-2015 |
Anton Blanchard <anton@samba.org> |
powerpc/perf: Fix book3s kernel to userspace backtraces When we take a PMU exception or a software event we call perf_read_regs(). This overloads regs->result with a boolean that describes if we should use the sampled instruction address register (SIAR) or the regs. If the exception is in kernel, we start with the kernel regs and backtrace through the kernel stack. At this point we switch to the userspace regs and backtrace the user stack with perf_callchain_user(). Unfortunately these regs have not got the perf_read_regs() treatment, so regs->result could be anything. If it is non zero, perf_instruction_pointer() decides to use the SIAR, and we get issues like this: 0.11% qemu-system-ppc [kernel.kallsyms] [k] _raw_spin_lock_irqsave | ---_raw_spin_lock_irqsave | |--52.35%-- 0 | | | |--46.39%-- __hrtimer_start_range_ns | | kvmppc_run_core | | kvmppc_vcpu_run_hv | | kvmppc_vcpu_run | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call | | | | | |--67.08%-- _raw_spin_lock_irqsave <--- hi mum | | | | | | | --100.00%-- 0x7e714 | | | 0x7e714 Notice the bogus _raw_spin_irqsave when we transition from kernel (system_call) to userspace (0x7e714). We inserted what was in the SIAR. Add a check in regs_use_siar() to check that the regs in question are from a PMU exception. With this fix the backtrace makes sense: 0.47% qemu-system-ppc [kernel.vmlinux] [k] _raw_spin_lock_irqsave | ---_raw_spin_lock_irqsave | |--53.83%-- 0 | | | |--44.73%-- hrtimer_try_to_cancel | | kvmppc_start_thread | | kvmppc_run_core | | kvmppc_vcpu_run_hv | | kvmppc_vcpu_run | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call | | __ioctl | | 0x7e714 | | 0x7e714 Cc: stable@vger.kernel.org Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
68de8867 |
|
24-Mar-2015 |
Jan Stancek <jstancek@redhat.com> |
powerpc/perf: add missing put_cpu_var in power_pmu_event_init One path in power_pmu_event_init() calls get_cpu_var(), but is missing matching call to put_cpu_var(), which causes preemption imbalance and crash in user-space: Page fault in user mode with in_atomic() = 1 mm = c000001fefa5a280 NIP = 3fff9bf2cae0 MSR = 900000014280f032 Oops: Weird page fault, sig: 11 [#23] SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: <snip> CPU: 43 PID: 10285 Comm: a.out Tainted: G D 4.0.0-rc5+ #1 task: c000001fe82c9200 ti: c000001fe835c000 task.ti: c000001fe835c000 NIP: 00003fff9bf2cae0 LR: 00003fff9bee4898 CTR: 00003fff9bf2cae0 REGS: c000001fe835fea0 TRAP: 0401 Tainted: G D (4.0.0-rc5+) MSR: 900000014280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI> CR: 22000028 XER: 00000000 CFAR: 00003fff9bee4894 SOFTE: 1 GPR00: 00003fff9bee494c 00003fffe01c2ee0 00003fff9c084410 0000000010020068 GPR04: 0000000000000000 0000000000000002 0000000000000008 0000000000000001 GPR08: 0000000000000001 00003fff9c074a30 00003fff9bf2cae0 00003fff9bf2cd70 GPR12: 0000000052000022 00003fff9c10b700 NIP [00003fff9bf2cae0] 0x3fff9bf2cae0 LR [00003fff9bee4898] 0x3fff9bee4898 Call Trace: ---[ end trace 5d3d952b5d4185d4 ]--- BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41 in_atomic(): 1, irqs_disabled(): 0, pid: 10285, name: a.out INFO: lockdep is turned off. CPU: 43 PID: 10285 Comm: a.out Tainted: G D 4.0.0-rc5+ #1 Call Trace: [c000001fe835f990] [c00000000089c014] .dump_stack+0x98/0xd4 (unreliable) [c000001fe835fa10] [c0000000000e4138] .___might_sleep+0x1d8/0x2e0 [c000001fe835faa0] [c000000000888da8] .down_read+0x38/0x110 [c000001fe835fb30] [c0000000000bf2f4] .exit_signals+0x24/0x160 [c000001fe835fbc0] [c0000000000abde0] .do_exit+0xd0/0xe70 [c000001fe835fcb0] [c00000000001f4c4] .die+0x304/0x450 [c000001fe835fd60] [c00000000088e1f4] .do_page_fault+0x2d4/0x900 [c000001fe835fe30] [c000000000008664] handle_page_fault+0x10/0x30 note: a.out[10285] exited with preempt_count 1 Reproducer: #include <stdio.h> #include <unistd.h> #include <syscall.h> #include <sys/types.h> #include <sys/stat.h> #include <linux/perf_event.h> #include <linux/hw_breakpoint.h> static struct perf_event_attr event = { .type = PERF_TYPE_RAW, .size = sizeof(struct perf_event_attr), .sample_type = PERF_SAMPLE_BRANCH_STACK, .branch_sample_type = PERF_SAMPLE_BRANCH_ANY_RETURN, }; int main() { syscall(__NR_perf_event_open, &event, 0, -1, -1, 0); } Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
acba3c7e |
|
14-Jan-2015 |
Peter Zijlstra <peterz@infradead.org> |
perf, powerpc: Fix up flush_branch_stack() users The recent LBR rework for x86 left a stray flush_branch_stack() user in the PowerPC code, fix that up. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Anton Blanchard <anton@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Lameter <cl@linux.com> Cc: Joel Stanley <joel@jms.id.au> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Tejun Heo <tj@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
69111bac |
|
21-Oct-2014 |
Christoph Lameter <cl@linux.com> |
powerpc: Replace __get_cpu_var uses This still has not been merged and now powerpc is the only arch that does not have this change. Sorry about missing linuxppc-dev before. V2->V2 - Fix up to work against 3.18-rc1 __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. At the end of the patch set all uses of __get_cpu_var have been removed so the macro is removed too. The patch set includes passes over all arches as well. Once these operations are used throughout then specialized macros can be defined in non -x86 arches as well in order to optimize per cpu access by f.e. using a global register that may be set to the per cpu base. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> Signed-off-by: Christoph Lameter <cl@linux.com> [mpe: Fix build errors caused by set/or_softirq_pending(), and rework assignment in __set_breakpoint() to use memcpy().] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
e51df2c1 |
|
19-Aug-2014 |
Anton Blanchard <anton@samba.org> |
powerpc: Make a bunch of things static Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
23f66e2d |
|
27-Aug-2014 |
Tejun Heo <tj@kernel.org> |
Revert "powerpc: Replace __get_cpu_var uses" This reverts commit 5828f666c069af74e00db21559f1535103c9f79a due to build failure after merging with pending powerpc changes. Link: http://lkml.kernel.org/g/20140827142243.6277eaff@canb.auug.org.au Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
5828f666 |
|
16-Aug-2014 |
Christoph Lameter <cl@linux.com> |
powerpc: Replace __get_cpu_var uses __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (*this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. At the end of the patch set all uses of __get_cpu_var have been removed so the macro is removed too. The patch set includes passes over all arches as well. Once these operations are used throughout then specialized macros can be defined in non -x86 arches as well in order to optimize per cpu access by f.e. using a global register that may be set to the per cpu base. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) tj: Folded a fix patch. http://lkml.kernel.org/g/alpine.DEB.2.11.1408172143020.9652@gentwo.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
9de5cb0f |
|
23-Jul-2014 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/perf: Add per-event excludes on Power8 Power8 has a new register (MMCR2), which contains individual freeze bits for each counter. This is an improvement on previous chips as it means we can have multiple events on the PMU at the same time with different exclude_{user,kernel,hv} settings. Previously we had to ensure all events on the PMU had the same exclude settings. The core of the patch is fairly simple. We use the 207S feature flag to indicate that the PMU backend supports per-event excludes, if it's set we skip the generic logic that enforces the equality of excludes between events. We also use that flag to skip setting the freeze bits in MMCR0, the PMU backend is expected to have handled setting them in MMCR2. The complication arises with EBB. The FCxP bits in MMCR2 are accessible R/W to a task using EBB. Which means a task using EBB will be able to see that we are using MMCR2 for freezing, whereas the old logic which used MMCR0 is not user visible. The task can not see or affect exclude_kernel & exclude_hv, so we only need to consider exclude_user. The table below summarises the behaviour both before and after this commit is applied: exclude_user true false ------------------------------------ | User visible | N N Before | Can freeze | Y Y | Can unfreeze | N Y ------------------------------------ | User visible | Y Y After | Can freeze | Y Y | Can unfreeze | Y/N Y ------------------------------------ So firstly I assert that the simple visibility of the exclude_user setting in MMCR2 is a non-issue. The event belongs to the task, and was most likely created by the task. So the exclude_user setting is not privileged information in any way. Secondly, the behaviour in the exclude_user = false case is unchanged. This is important as it is the case that is actually useful, ie. the event is created with no exclude setting and the task uses MMCR2 to implement exclusion manually. For exclude_user = true there is no meaningful change to freezing the event. Previously the task could use MMCR2 to freeze the event, though it was already frozen with MMCR0. With the new code the task can use MMCR2 to freeze the event, though it was already frozen with MMCR2. The only real change is when exclude_user = true and the task tries to use MMCR2 to unfreeze the event. Previously this had no effect, because the event was already frozen in MMCR0. With the new code the task can unfreeze the event in MMCR2, but at some indeterminate time in the future the kernel will overwrite its setting and refreeze the event. Therefore my final assertion is that any task using exclude_user = true and also fiddling with MMCR2 was deeply confused before this change, and remains so after it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
8abd818f |
|
23-Jul-2014 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/perf: Pass the struct perf_events down to compute_mmcr() To support per-event exclude settings on Power8 we need access to the struct perf_events in compute_mmcr(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
79a4cb28 |
|
23-Jul-2014 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/perf: Clear all MMCR settings before calling compute_mmcr() Because we reuse cpuhw->mmcr on each call to compute_mmcr() there's a risk that we could forget to set one of the values and use whatever value was in there previously. Currently all the implementations are careful to set all the values, but it's safer to clear them all before we call compute_mmcr(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
8903461c |
|
23-Jul-2014 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/perf: Fix MMCR2 handling for EBB In the recent commit b50a6c584bb4 "Clear MMCR2 when enabling PMU", I screwed up the handling of MMCR2 for tasks using EBB. We must make sure we set MMCR2 *before* ebb_switch_in(), otherwise we overwrite the value of MMCR2 that userspace may have written. That potentially breaks a task that uses EBB and manually uses MMCR2 for event freezing. Fixes: b50a6c584bb4 ("powerpc/perf: Clear MMCR2 when enabling PMU") Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
f5602941 |
|
28-May-2014 |
Anton Blanchard <anton@samba.org> |
powerpc/perf: Never program book3s PMCs with values >= 0x80000000 We are seeing a lot of PMU warnings on POWER8: Can't find PMC that caused IRQ Looking closer, the active PMC is 0 at this point and we took a PMU exception on the transition from negative to 0. Some versions of POWER8 have an issue where they edge detect and not level detect PMC overflows. A number of places program the PMC with (0x80000000 - period_left), where period_left can be negative. We can either fix all of these or just ensure that period_left is always >= 1. This patch takes the second option. Cc: <stable@vger.kernel.org> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
b50a6c58 |
|
08-Jul-2014 |
Joel Stanley <joel@jms.id.au> |
powerpc/perf: Clear MMCR2 when enabling PMU On POWER8 when switching to a KVM guest we set bits in MMCR2 to freeze the PMU counters. Aside from on boot they are then never reset, resulting in stuck perf counters for any user in the guest or host. We now set MMCR2 to 0 whenever enabling the PMU, which provides a sane state for perf to use the PMU counters under either the guest or the host. This was manifesting as a bug with ppc64_cpu --frequency: $ sudo ppc64_cpu --frequency WARNING: couldn't run on cpu 0 WARNING: couldn't run on cpu 8 ... WARNING: couldn't run on cpu 144 WARNING: couldn't run on cpu 152 min: 18446744073.710 GHz (cpu -1) max: 0.000 GHz (cpu -1) avg: 0.000 GHz The command uses a perf counter to measure CPU cycles over a fixed amount of time, in order to approximate the frequency of the machine. The counters were returning zero once a guest was started, regardless of weather it was still running or had been shut down. By dumping the value of MMCR2, it was observed that once a guest is running MMCR2 is set to 1s - which stops counters from running: $ sudo sh -c 'echo p > /proc/sysrq-trigger' CPU: 0 PMU registers, ppmu = POWER8 n_counters = 6 PMC1: 5b635e38 PMC2: 00000000 PMC3: 00000000 PMC4: 00000000 PMC5: 1bf5a646 PMC6: 5793d378 PMC7: deadbeef PMC8: deadbeef MMCR0: 0000000080000000 MMCR1: 000000001e000000 MMCRA: 0000040000000000 MMCR2: fffffffffffffc00 EBBHR: 0000000000000000 EBBRR: 0000000000000000 BESCR: 0000000000000000 SIAR: 00000000000a51cc SDAR: c00000000fc40000 SIER: 0000000001000000 This is done unconditionally in book3s_hv_interrupts.S upon entering the guest, and the original value is only save/restored if the host has indicated it was using the PMU. This is okay, however the user of the PMU needs to ensure that it is in a defined state when it starts using it. Fixes: e05b9b9e5c10 ("powerpc/perf: Power8 PMU support") Cc: stable@vger.kernel.org Signed-off-by: Joel Stanley <joel@jms.id.au> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
4d9690dd |
|
08-Jul-2014 |
Joel Stanley <joel@jms.id.au> |
powerpc/perf: Add PPMU_ARCH_207S define Instead of separate bits for every POWER8 PMU feature, have a single one for v2.07 of the architecture. This saves us adding a MMCR2 define for a future patch. Cc: stable@vger.kernel.org Signed-off-by: Joel Stanley <joel@jms.id.au> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
76cb8a78 |
|
13-Mar-2014 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/perf: Enable BHRB access for EBB events The previous commit added constraint and register handling to allow processes using EBB (Event Based Branches) to request access to the BHRB (Branch History Rolling Buffer). With that in place we can allow processes using EBB to access the BHRB. This is achieved by setting BHRBA in MMCR0 when we enable EBB access. We must also clear BHRBA when we are disabling. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
58b5fb00 |
|
13-Mar-2014 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/perf: Reject EBB events which specify a sample_type Although we already block EBB events which request sampling using sample_period, technically it's possible for an event to set sample_type but not sample_period. Nothing terrible will happen if an EBB event does specify sample_type, but it signals a major confusion on the part of userspace, and so we do them the favor of rejecting it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
c2e37a26 |
|
13-Mar-2014 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/perf: Add lost exception workaround Some power8 revisions have a hardware bug where we can lose a PMU exception, this commit adds a workaround to detect the bad condition and rectify the situation. See the comment in the commit for a full description. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
5f6d0380 |
|
13-Mar-2014 |
Anshuman Khandual <khandual@linux.vnet.ibm.com> |
powerpc/perf: Define perf_event_print_debug() to print PMU register values Currently the sysrq ShowRegs command does not print any PMU registers as we have an empty definition for perf_event_print_debug(). This patch defines perf_event_print_debug() to print various PMU registers. Example output: CPU: 0 PMU registers, ppmu = POWER7 n_counters = 6 PMC1: 00000000 PMC2: 00000000 PMC3: 00000000 PMC4: 00000000 PMC5: 00000000 PMC6: 00000000 PMC7: deadbeef PMC8: deadbeef MMCR0: 0000000080000000 MMCR1: 0000000000000000 MMCRA: 0f00000001000000 SIAR: 0000000000000000 SDAR: 0000000000000000 SIER: 0000000000000000 Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> [mpe: Fix 32 bit build and rework formatting for compactness] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
b4d6c06c |
|
17-Dec-2013 |
Anshuman Khandual <khandual@linux.vnet.ibm.com> |
powerpc/perf: Configure BHRB filter before enabling PMU interrupts Right now the config_bhrb() PMU specific call happens after write_mmcr0(), which actually enables the PMU for event counting and interrupts. So there is a small window of time where the PMU and BHRB runs without the required HW branch filter (if any) enabled in BHRB. This can cause some of the branch samples to be collected through BHRB without any filter applied and hence affects the correctness of the results. This patch moves the BHRB config function call before enabling interrupts. Here are some data points captured via trace prints which depicts how we could get PMU interrupts with BHRB filter NOT enabled with a standard perf record command line (asking for branch record information as well). $ perf record -j any_call ls Before the patch:- ls-1962 [003] d... 2065.299590: .perf_event_interrupt: MMCRA: 40000000000 ls-1962 [003] d... 2065.299603: .perf_event_interrupt: MMCRA: 40000000000 ... All the PMU interrupts before this point did not have the requested HW branch filter enabled in the MMCRA. ls-1962 [003] d... 2065.299647: .perf_event_interrupt: MMCRA: 40040000000 ls-1962 [003] d... 2065.299662: .perf_event_interrupt: MMCRA: 40040000000 After the patch:- ls-1850 [008] d... 190.311828: .perf_event_interrupt: MMCRA: 40040000000 ls-1850 [008] d... 190.311848: .perf_event_interrupt: MMCRA: 40040000000 All the PMU interrupts have the requested HW BHRB branch filter enabled in MMCRA. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> [mpe: Fixed up whitespace and cleaned up changelog] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
b0d436c7 |
|
06-Aug-2013 |
Anton Blanchard <anton@samba.org> |
powerpc: Fix a number of sparse warnings Address some of the trivial sparse warnings in arch/powerpc. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
8d7c55d0 |
|
23-Jul-2013 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc/perf: Export PERF_EVENT_CONFIG_EBB_SHIFT to userspace We use bit 63 of the event code for userspace to request that the event be counted using EBB (Event Based Branches). Export this value, making it part of the API - though only on processors that support EBB. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
ff3d79dc |
|
09-Jun-2013 |
Anshuman Khandual <khandual@linux.vnet.ibm.com> |
powerpc/perf: BHRB filter configuration should follow the task When the task moves around the system, the corresponding cpuhw per cpu strcuture should be popullated with the BHRB filter request value so that PMU could be configured appropriately with that during the next call into power_pmu_enable(). Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
330a1eb7 |
|
28-Jun-2013 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc/perf: Core EBB support for 64-bit book3s Add support for EBB (Event Based Branches) on 64-bit book3s. See the included documentation for more details. EBBs are a feature which allows the hardware to branch directly to a specified user space address when a PMU event overflows. This can be used by programs for self-monitoring with no kernel involvement in the inner loop. Most of the logic is in the generic book3s code, primarily to avoid a proliferation of PMU callbacks. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
4ea355b5 |
|
28-Jun-2013 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc/perf: Don't enable if we have zero events In power_pmu_enable() we still enable the PMU even if we have zero events. This should have no effect but doesn't make much sense. Instead just return after telling the hypervisor that we are not using the PMCs. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> CC: <stable@vger.kernel.org> [v3.10] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
0a48843d |
|
28-Jun-2013 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc/perf: Use existing out label in power_pmu_enable() In power_pmu_enable() we can use the existing out label to reduce the number of return paths. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> CC: <stable@vger.kernel.org> [v3.10] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
7a7a41f9 |
|
28-Jun-2013 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc/perf: Freeze PMC5/6 if we're not using them On Power8 we can freeze PMC5 and 6 if we're not using them. Normally they run all the time. As noticed by Anshuman, we should unfreeze them when we disable the PMU as there are legacy tools which expect them to run all the time. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> CC: <stable@vger.kernel.org> [v3.10] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
378a6ee9 |
|
28-Jun-2013 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc/perf: Rework disable logic in pmu_disable() In pmu_disable() we disable the PMU by setting the FC (Freeze Counters) bit in MMCR0. In order to do this we have to read/modify/write MMCR0. It's possible that we read a value from MMCR0 which has PMAO (PMU Alert Occurred) set. When we write that value back it will cause an interrupt to occur. We will then end up in the PMU interrupt handler even though we are supposed to have just disabled the PMU. We can avoid this by making sure we never write PMAO back. We should not lose interrupts because when the PMU is re-enabled the overflowed values will cause another interrupt. We also reorder the clearing of SAMPLE_ENABLE so that is done after the PMU is frozen. Otherwise there is a small window between the clearing of SAMPLE_ENABLE and the setting of FC where we could take an interrupt and incorrectly see SAMPLE_ENABLE not set. This would for example change the logic in perf_read_regs(). Signed-off-by: Michael Ellerman <michael@ellerman.id.au> CC: <stable@vger.kernel.org> [v3.10] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
061d19f2 |
|
24-Jun-2013 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
powerpc: Delete __cpuinit usage from all users The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the powerpc uses of the __cpuinit macros. There are no __CPUINIT users in assembly files in powerpc. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
6772faa1 |
|
05-Jun-2013 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc/perf: Fix deadlock caused by calling printk() in PMU exception In commit bc09c21 "Fix finding overflowed PMC in interrupt" we added a printk() to the PMU exception handler. Unfortunately that is not safe. The problem is that the PMU exception may run even when interrupts are soft disabled, aka NMI context. We do this so that we can profile parts of the kernel that have interrupts soft-disabled. But by calling printk() from the exception handler, we can potentially deadlock in the printk code on logbuf_lock, eg: [c00000038ba575c0] c000000000081928 .vprintk_emit+0xa8/0x540 [c00000038ba576a0] c0000000007bcde8 .printk+0x48/0x58 [c00000038ba57710] c000000000076504 .perf_event_interrupt+0x2d4/0x490 [c00000038ba57810] c00000000001f6f8 .performance_monitor_exception+0x48/0x60 [c00000038ba57880] c0000000000032cc performance_monitor_common+0x14c/0x180 --- Exception: f01 (Performance Monitor) at c0000000007b25d4 ._raw_spin_lock_irq +0x64/0xc0 [c00000038ba57bf0] c00000000007ed90 .devkmsg_read+0xd0/0x5a0 [c00000038ba57d00] c0000000001c2934 .vfs_read+0xc4/0x1e0 [c00000038ba57d90] c0000000001c2cd8 .SyS_read+0x58/0xd0 [c00000038ba57e30] c000000000009d54 syscall_exit+0x0/0x98 --- Exception: c01 (System Call) at 00001fffffbf6f7c SP (3ffff6d4de10) is in userspace Fix it by making sure we only call printk() when we are not in NMI context. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Cc: <stable@vger.kernel.org> # 3.9 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
58a032c3 |
|
15-May-2013 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc/perf: Add missing SIER support Commit 8f61aa3 "Add support for SIER" missed updates to siar_valid() and perf_get_data_addr(). In both cases we need to check the SIER instead of mmcra. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
cbda6aa1 |
|
15-May-2013 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc/perf: Revert to original NO_SIPR logic This is a revert and then some of commit 860aad7 "Add regs_no_sipr()". This workaround was only needed on early chip versions. As before NO_SIPR becomes a static flag of the PMU struct. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
69123184 |
|
13-May-2013 |
Michael Neuling <mikey@neuling.org> |
powerpc/perf: Fix setting of "to" addresses for BHRB Currently we only set the "to" address in the branch stack when the CPU explicitly gives us a value. Unfortunately it only does this for XL form branches (eg blr, bctr, bctar) and not I and B form branches (eg b, bc). Fortunately if we read the instruction from memory we can extract the offset of a branch and calculate the target address. This adds a function power_pmu_bhrb_to() to calculate the target/to address of the corresponding I and B form branches. It handles branches in both user and kernel spaces. It also plumbs this into the perf brhb reading code. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
506e70d1 |
|
13-May-2013 |
Michael Neuling <mikey@neuling.org> |
powerpc/pmu: Fix order of interpreting BHRB target entries The current Branch History Rolling Buffer (BHRB) code misinterprets the order of entries in the hardware buffer. It assumes that a branch target address will be read _after_ its corresponding branch. In reality the branch target comes before (lower mfbhrb entry) it's corresponding branch. This is a rewrite of the code to take this into account. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
d52f2dc4 |
|
13-May-2013 |
Michael Neuling <mikey@neuling.org> |
powerpc/perf: Move BHRB code into CONFIG_PPC64 region The new Branch History Rolling buffer (BHRB) code is only useful on 64bit processors, so move it into the #ifdef CONFIG_PPC64 region. This avoids code bloat on 32bit systems. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
3925f46b |
|
22-Apr-2013 |
Anshuman Khandual <khandual@linux.vnet.ibm.com> |
powerpc/perf: Enable branch stack sampling framework Provides basic enablement for perf branch stack sampling framework on POWER8 processor based platforms. Adds new BHRB related elements into cpu_hw_event structure to represent current BHRB config, BHRB filter configuration, manage context and to hold output BHRB buffer during PMU interrupt before passing to the user space. This also enables processing of BHRB data and converts them into generic perf branch stack data format. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
8f61aa32 |
|
25-Apr-2013 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc/perf: Add support for SIER On power8 we have a new SIER (Sampled Instruction Event Register), which captures information about instructions when we have random sampling enabled. Add support for loading the SIER into pt_regs, overloading regs->dar. Also set the new NO_SIPR flag in regs->result if we don't have SIPR. Update regs_sihv/sipr() to look for SIPR/SIHV in SIER. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
860aad71 |
|
25-Apr-2013 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc/perf: Add regs_no_sipr() On power8 the presence or absence of SIPR depends on settings at runtime, so convert to using a dynamic flag for NO_SIPR. Existing backends that set NO_SIPR unconditionally set the dynamic flag obviously. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
33904054 |
|
25-Apr-2013 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc/perf: Add an accessor for regs->result Add an accessor for regs->result so we can use it to store more flags in future. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
5682c460 |
|
25-Apr-2013 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc/perf: Convert mmcra_sipr/sihv() to regs_sipr/sihv() On power8 the SIPR and SIHV are not in MMCRA, so convert the routines to take regs and change the names accordingly. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
7a786832 |
|
25-Apr-2013 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc/perf: Add an explict flag indicating presence of SLOT field In perf_ip_adjust() we potentially use the MMCRA[SLOT] field to adjust the reported IP of a sampled instruction. Currently the logic is written so that if the backend does NOT have the PPMU_ALT_SIPR flag set then we assume MMCRA[SLOT] exists. However on power8 we do not want to set ALT_SIPR (it's in a third location), and we also do not have MMCRA[SLOT]. So add a new flag which only indicates whether MMCRA[SLOT] exists. Naively we'd set it on everything except power6/7, because they set ALT_SIPR, and we've reversed the polarity of the flag. But it's more complicated than that. mpc7450 is 32-bit, and uses its own version of perf_ip_adjust() which doesn't use MMCRA[SLOT], so it doesn't need the new flag set and the behaviour is unchanged. PPC970 (and I assume power4) don't have MMCRA[SLOT], so shouldn't have the new flag set. This is a behaviour change on those cpus, though we were probably getting lucky and the bits in question were 0. power5 and power5+ set the new flag, behaviour unchanged. power6 & power7 do not set the new flag, behaviour unchanged. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
1c53a270 |
|
22-Jan-2013 |
Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> |
perf/POWER7: Make generic event translations available in sysfs Make the generic perf events in POWER7 available via sysfs. $ ls /sys/bus/event_source/devices/cpu/events branch-instructions branch-misses cache-misses cache-references cpu-cycles instructions stalled-cycles-backend stalled-cycles-frontend $ cat /sys/bus/event_source/devices/cpu/events/cache-misses event=0x400f0 This patch is based on commits that implement this functionality on x86. Eg: commit a47473939db20e3961b200eb00acf5fcf084d755 Author: Jiri Olsa <jolsa@redhat.com> Date: Wed Oct 10 14:53:11 2012 +0200 perf/x86: Make hardware event translations available in sysfs Changelog:[v2] [Jiri Osla] Drop EVENT_ID() macro since it is only used once. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anton Blanchard <anton@au1.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: linuxppc-dev@ozlabs.org Link: http://lkml.kernel.org/r/20130123062454.GD13720@us.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
f53d168c |
|
24-Jan-2013 |
sukadev@linux.vnet.ibm.com <sukadev@linux.vnet.ibm.com> |
perf/Power: PERF_EVENT_IOC_ENABLE does not reenable event perf/Power: PERF_EVENT_IOC_ENABLE does not reenable event If we disable a perf event because we exceeded the specified ->event_limit, power_pmu_stop() sets the PERF_HES_STOPPED flag on the event. If the application then re-enables the event using PERF_EVENT_IOC_ENABLE ioctl, we don't ever clear this STOPPED flag. Consequently, the user space is never notified of the event. Following message has more background and test case. http://lists.eecs.utk.edu/pipermail/ptools-perfapi/2012-October/002528.html Used the following test cases to verify that this patch works on latest PAPI. $ papi.git/src/ctests/nonthread PAPI_TOT_CYC@5000000 $ papi.git/src/ctests/overflow_single_event Changelog[v2]: - [Paul Mackerras] Also clear PERF_HES_UPTODATE flag since we are restarting the event; cleanup comments and patch description. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
e13e895f |
|
05-Nov-2012 |
Michael Neuling <mikey@neuling.org> |
powerpc/perf: Fix for PMCs not making progress On POWER7 when we have really small counts left before overflow, we can take a PMU IRQ, but the PMC gets wound back to just before the overflow. If the kernel is setting the PMC to a value just before the overflow, we can get interrupted again without the PMC making any progress (ie another buggy overflow). In this case, we can end up making no forward progress, with the PMC interrupt returning us to the same count over and over. The below detects when we are making no forward progress (ie. delta = 0) and then increases the amount left before the overflow. This stops us from locking up. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> cc: Paul Mackerras <paulus@samba.org> cc: Anton Blanchard <anton@samba.org> cc: Linux PPC dev <linuxppc-dev@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
bc09c219 |
|
05-Nov-2012 |
Michael Neuling <mikey@neuling.org> |
powerpc/perf: Fix finding overflowed PMC in interrupt If a PMC is about to overflow on a counter that's on an active perf event (ie. less than 256 from the end) and a _different_ PMC overflows just at this time (a PMC that's not on an active perf event), we currently mark the event as found, but in reality it's not as it's likely the other PMC that caused the IRQ. Since we mark it as found the second catch all for overflows doesn't run, and we don't reset the overflowing PMC ever. Hence we keep hitting that same PMC IRQ over and over and don't reset the actual overflowing counter. This is a rewrite of the perf interrupt handler for book3s to get around this. We now check to see if any of the PMCs have actually overflowed (ie >= 0x80000000). If yes, record it for active counters and just reset it for inactive counters. If it's not overflowed, then we check to see if it's one of the buggy power7 counters and if it is, record it and continue. If none of the PMCs match this, then we make note that we couldn't find the PMC that caused the IRQ. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> cc: Paul Mackerras <paulus@samba.org> cc: Anton Blanchard <anton@samba.org> cc: Linux PPC dev <linuxppc-dev@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
72523d80 |
|
17-Oct-2012 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
Revert "powerpc/perf: Use pmc_overflow() to detect rolled back events" This reverts commit 813312110bede27bffd082c25cd31730bd567beb. This revert was requested by the author of the patch as it seems to cause system hangs with some low frequency events
|
#
e6878835 |
|
18-Sep-2012 |
sukadev@linux.vnet.ibm.com <sukadev@linux.vnet.ibm.com> |
powerpc/perf: Sample only if SIAR-Valid bit is set in P7+ powerpc/perf: Sample only if SIAR-Valid bit is set in P7+ On POWER7+ two new bits (mmcra[35] and mmcra[36]) indicate whether the contents of SIAR and SDAR are valid. For marked instructions on P7+, we must save the contents of SIAR and SDAR registers only if these new bits are set. This code/check for the SIAR-Valid bit is specific to P7+, so rather than waste a CPU-feature bit use the PVR flag. Note that Carl Love proposed a similar change for oprofile: https://lkml.org/lkml/2012/6/22/309 Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
d3dbeef6 |
|
19-Aug-2012 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc: Rename 64-bit PVR constants to PVR_foo We have an old FIXME in reg.h which points out that we should standardise on PVR_foo for our PVR #defines. Currently we use PVR_ on 32-bit and PV_ on 64-bit. So do that rename and remove the FIXME. Seeing as we're touching all but one usage of __is_processor(), rename it to something less ugly and more indicative of what it does, which is simply to check the PVR version. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
81331211 |
|
07-Aug-2012 |
Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> |
powerpc/perf: Use pmc_overflow() to detect rolled back events For certain speculative events on Power7, 'perf stat' reports far higher event count than 'perf record' for the same event. As described in following commit, a performance monitor exception is raised even when the the performance events are rolled back. commit 0837e3242c73566fc1c0196b4ec61779c25ffc93 Author: Anton Blanchard <anton@samba.org> Date: Wed Mar 9 14:38:42 2011 +1100 perf_event_interrupt() records an event only when an overflow occurs. But this check for overflow is a simple 'if (val < 0)'. Because the events are rolled back, this check for overflow fails and the event is not recorded. perf_event_interrupt() later uses pmc_overflow() to detect the overflow and resets the counters and the events are lost completely. To properly detect the overflow of rolled back events, use pmc_overflow() even when recording events. To reproduce: $ cat strcpy.c #include <stdio.h> #include <string.h> main() { char buf[256]; alarm(5); while(1) strcpy(buf, "string1"); } $ perf record -e r20014 ./strcpy $ perf report -n > report.1 $ perf stat -e r20014 > report.2 # Compare report.1 and report.2 Reported-by: Maynard Johnson <mpjohn@us.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
5c093efa |
|
25-Jun-2012 |
Anton Blanchard <anton@samba.org> |
powerpc/perf: Always use pt_regs for userspace samples At the moment we always use the SIAR if the PMU supports continuous sampling. Unfortunately the SIAR and the PMU exception are not synchronised for non marked events so we can end up with callchains that dont make sense. The following patch checks the HV and PR bits for samples coming from userspace and always uses pt_regs for them. Userspace will never have interrupts off so there is no real advantage to using the SIAR for non marked events in userspace. I had experimented with a patch that did a similar thing for kernel samples but we lost a significant amount of information. I was unable to profile any of our early exception code for example. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
75382aa7 |
|
25-Jun-2012 |
Anton Blanchard <anton@samba.org> |
powerpc/perf: Move code to select SIAR or pt_regs into perf_read_regs The logic to choose whether to use the SIAR or get the information out of pt_regs is going to get more complicated, so do it once in perf_read_regs. We overload regs->result which is gross but we are already doing it with regs->dsisr. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
68b30bb9 |
|
25-Jun-2012 |
Anton Blanchard <anton@samba.org> |
powerpc/perf: Create mmcra_sihv/mmcra_sipv helpers We want to access the MMCRA_SIHV and MMCRA_SIPR bits elsewhere so create mmcra_sihv and mmcra_sipr which hide the differences between the old and new layout of the bits. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
fd0d000b |
|
02-Apr-2012 |
Robert Richter <robert.richter@amd.com> |
perf: Pass last sampling period to perf_sample_data_init() We always need to pass the last sample period to perf_sample_data_init(), otherwise the event distribution will be wrong. Thus, modifiyng the function interface with the required period as argument. So basically a pattern like this: perf_sample_data_init(&data, ~0ULL); data.period = event->hw.last_period; will now be like that: perf_sample_data_init(&data, ~0ULL, event->hw.last_period); Avoids unininitialized data.period and simplifies code. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1333390758-10893-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
1ce447b9 |
|
26-Mar-2012 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc/perf: Fix instruction address sampling on 970 and Power4 970 and Power4 don't support "continuous sampling" which means that when we aren't in marked instruction sampling mode (marked events), SIAR isn't updated with the last instruction sampled before the perf interrupt. On those processors, we must thus use the exception SRR0 value as the sampled instruction pointer. Those processors also don't support the SIPR and SIHV bits in MMCRA which means we need some kind of heuristic to decide if SIAR values represent kernel or user addresses. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
f2699491 |
|
20-Feb-2012 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc/perf: Move perf core & PMU code into a subdirectory The perf code has grown a lot since it started, and is big enough to warrant its own subdirectory. For reference it's ~60% bigger than the oprofile code. It declutters the kernel directory, makes it simpler to grep for "just perf stuff", and allows us to shorten some filenames. While we're at it, make it more obvious that we have two implementations of the core perf logic. One for (roughly) Book3S CPUs, which was the original implementation, and the other for Freescale embedded CPUs. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|