History log of /linux-master/arch/sparc/kernel/perf_event.c
Revision Date Author Comments
# 3cc208ff 03-Jan-2024 Bjorn Helgaas <bhelgaas@google.com>

sparc: Fix typos

Fix typos, most reported by "codespell arch/sparc". Only touches
comments, no code changes.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: sparclinux@vger.kernel.org
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Link: https://lore.kernel.org/r/20240103231605.1801364-9-helgaas@kernel.org


# 56cd0aef 28-May-2019 Young Xiao <92siuyang@gmail.com>

sparc: perf: fix updated event period in response to PERF_EVENT_IOC_PERIOD

The PERF_EVENT_IOC_PERIOD ioctl command can be used to change the
sample period of a running perf_event. Consequently, when calculating
the next event period, the new period will only be considered after the
previous one has overflowed.

This patch changes the calculation of the remaining event ticks so that
they are offset if the period has changed.

See commit 3581fe0ef37c ("ARM: 7556/1: perf: fix updated event period in
response to PERF_EVENT_IOC_PERIOD") for details.

Signed-off-by: Young Xiao <92siuyang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 945626db 06-Dec-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

sparc64: Use ftrace_graph_get_ret_stack() instead of curr_ret_stack

The structure of the ret_stack array on the task struct is going to
change, and accessing it directly via the curr_ret_stack index will no
longer give the ret_stack entry that holds the return address. To access
that, architectures must now use ftrace_graph_get_ret_stack() to get the
associated ret_stack that matches the saved return address.

Cc: sparclinux@vger.kernel.org
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 0f0a691f 29-Oct-2018 David S. Miller <davem@davemloft.net>

sparc64: Remvoe set_fs() from perf_callchain_user().

Ever since commit 88b0193d9418 ("perf/callchain: Force USER_DS when
invoking perf_callchain_user()") the caller does this for us.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 455adb31 12-Oct-2018 David S. Miller <davem@davemloft.net>

sparc: Throttle perf events properly.

Like x86 and arm, call perf_sample_event_took() in perf event
NMI interrupt handler.

Signed-off-by: David S. Miller <davem@davemloft.net>


# cfdc3170 12-Oct-2018 David S. Miller <davem@davemloft.net>

sparc: Fix single-pcr perf event counter management.

It is important to clear the hw->state value for non-stopped events
when they are added into the PMU. Otherwise when the event is
scheduled out, we won't read the counter because HES_UPTODATE is still
set. This breaks 'perf stat' and similar use cases, causing all the
events to show zero.

This worked for multi-pcr because we make explicit sparc_pmu_start()
calls in calculate_multiple_pcrs(). calculate_single_pcr() doesn't do
this because the idea there is to accumulate all of the counter
settings into the single pcr value. So we have to add explicit
hw->state handling there.

Like x86, we use the PERF_HES_ARCH bit to track truly stopped events
so that we don't accidently start them on a reload.

Related to all of this, sparc_pmu_start() is missing a userpage update
so add it.

Signed-off-by: David S. Miller <davem@davemloft.net>


# edb39592 15-Mar-2018 Peter Zijlstra <peterz@infradead.org>

perf: Fix sibling iteration

Mark noticed that the change to sibling_list changed some iteration
semantics; because previously we used group_list as list entry,
sibling events would always have an empty sibling_list.

But because we now use sibling_list for both list head and list entry,
siblings will report as having siblings.

Fix this with a custom for_each_sibling_event() iterator.

Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: vincent.weaver@maine.edu
Cc: alexander.shishkin@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: alexey.budankov@linux.intel.com
Cc: valery.cherepennikov@intel.com
Cc: eranian@google.com
Cc: acme@redhat.com
Cc: linux-tip-commits@vger.kernel.org
Cc: davidcc@google.com
Cc: kan.liang@intel.com
Cc: Dmitry.Prohorov@intel.com
Cc: jolsa@redhat.com
Link: https://lkml.kernel.org/r/20180315170129.GX4043@hirez.programming.kicks-ass.net


# 7eb709f2 15-Mar-2018 Peter Zijlstra <peterz@infradead.org>

perf: Fix sibling iteration

Mark noticed that the change to sibling_list changed some iteration
semantics; because previously we used group_list as list entry,
sibling events would always have an empty sibling_list.

But because we now use sibling_list for both list head and list entry,
siblings will report as having siblings.

Fix this with a custom for_each_sibling_event() iterator.

Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: vincent.weaver@maine.edu
Cc: alexander.shishkin@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: alexey.budankov@linux.intel.com
Cc: valery.cherepennikov@intel.com
Cc: eranian@google.com
Cc: acme@redhat.com
Cc: linux-tip-commits@vger.kernel.org
Cc: davidcc@google.com
Cc: kan.liang@intel.com
Cc: Dmitry.Prohorov@intel.com
Cc: jolsa@redhat.com
Link: https://lkml.kernel.org/r/20180315170129.GX4043@hirez.programming.kicks-ass.net


# 8343aae6 13-Nov-2017 Peter Zijlstra <peterz@infradead.org>

perf/core: Remove perf_event::group_entry

Now that all the grouping is done with RB trees, we no longer need
group_entry and can replace the whole thing with sibling_list.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 3b1fff08 10-May-2016 Arnaldo Carvalho de Melo <acme@redhat.com>

perf core: Add a 'nr' field to perf_event_callchain_context

We will use it to count how many addresses are in the entry->ip[] array,
excluding PERF_CONTEXT_{KERNEL,USER,etc} entries, so that we can really
return the number of entries specified by the user via the relevant
sysctl, kernel.perf_event_max_contexts, or via the per event
perf_event_attr.sample_max_stack knob.

This way we keep the perf_sample->ip_callchain->nr meaning, that is the
number of entries, be it real addresses or PERF_CONTEXT_ entries, while
honouring the max_stack knobs, i.e. the end result will be max_stack
entries if we have at least that many entries in a given stack trace.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-s8teto51tdqvlfhefndtat9r@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# cfbcf468 27-Apr-2016 Arnaldo Carvalho de Melo <acme@redhat.com>

perf core: Pass max stack as a perf_callchain_entry context

This makes perf_callchain_{user,kernel}() receive the max stack
as context for the perf_callchain_entry, instead of accessing
the global sysctl_perf_event_max_stack.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# c5dfd78e 20-Apr-2016 Arnaldo Carvalho de Melo <acme@redhat.com>

perf core: Allow setting up max frame stack depth via sysctl

The default remains 127, which is good for most cases, and not even hit
most of the time, but then for some cases, as reported by Brendan, 1024+
deep frames are appearing on the radar for things like groovy, ruby.

And in some workloads putting a _lower_ cap on this may make sense. One
that is per event still needs to be put in place tho.

The new file is:

# cat /proc/sys/kernel/perf_event_max_stack
127

Chaging it:

# echo 256 > /proc/sys/kernel/perf_event_max_stack
# cat /proc/sys/kernel/perf_event_max_stack
256

But as soon as there is some event using callchains we get:

# echo 512 > /proc/sys/kernel/perf_event_max_stack
-bash: echo: write error: Device or resource busy
#

Because we only allocate the callchain percpu data structures when there
is a user, which allows for changing the max easily, its just a matter
of having no callchain users at that point.

Reported-and-Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/20160426002928.GB16708@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 83352694 22-Dec-2015 Rob Gardner <rob.gardner@oracle.com>

sparc64: Perf should save/restore fault info

There have been several reports of random processes being killed with
a bus error or segfault during userspace stack walking in perf. One
of the root causes of this problem is an asynchronous modification to
thread_info fault_address and fault_code, which stems from a perf
counter interrupt arriving during kernel processing of a "benign"
fault, such as a TSB miss. Since perf_callchain_user() invokes
copy_from_user() to read user stacks, a fault is not only possible,
but probable. Validity checks on the stack address merely cover up the
problem and reduce its frequency.

The solution here is to save and restore fault_address and fault_code
in perf_callchain_user() so that the benign fault handler is not
disturbed by a perf interrupt.

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3f74306a 22-Dec-2015 Rob Gardner <rob.gardner@oracle.com>

sparc64: Ensure perf can access user stacks

When an interrupt (such as a perf counter interrupt) is delivered
while executing in user space, the trap entry code puts ASI_AIUS in
%asi so that copy_from_user() and copy_to_user() will access the
correct memory. But if a perf counter interrupt is delivered while the
cpu is already executing in kernel space, then the trap entry code
will put ASI_P in %asi, and this will prevent copy_from_user() from
reading any useful stack data in either of the perf_callchain_user_X
functions, and thus no user callgraph data will be collected for this
sample period. An additional problem is that a fault is guaranteed
to occur, and though it will be silently covered up, it wastes time
and could perturb state.

In perf_callchain_user(), we ensure that %asi contains ASI_AIUS
because we know for a fact that the subsequent calls to
copy_from_user() are intended to read the user's stack.

[ Use get_fs()/set_fs() -DaveM ]

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 90eec103 16-Nov-2015 Peter Zijlstra <peterz@infradead.org>

treewide: Remove old email address

There were still a number of references to my old Red Hat email
address in the kernel source. Remove these while keeping the
Red Hat copyright notices intact.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 8f3e5684 03-Sep-2015 Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

perf/core: Drop PERF_EVENT_TXN

We currently use PERF_EVENT_TXN flag to determine if we are in the middle
of a transaction. If in a transaction, we defer the schedulability checks
from pmu->add() operation to the pmu->commit() operation.

Now that we have "transaction types" (PERF_PMU_TXN_ADD, PERF_PMU_TXN_READ)
we can use the type to determine if we are in a transaction and drop the
PERF_EVENT_TXN flag.

When PERF_EVENT_TXN is dropped, the cpuhw->group_flag on some architectures
becomes unused, so drop that field as well.

This is an extension of the Powerpc patch from Peter Zijlstra to s390,
Sparc and x86 architectures.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1441336073-22750-11-git-send-email-sukadev@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# fbbe0701 03-Sep-2015 Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

perf/core: Add a 'flags' parameter to the PMU transactional interfaces

Currently, the PMU interface allows reading only one counter at a time.
But some PMUs like the 24x7 counters in Power, support reading several
counters at once. To leveage this functionality, extend the transaction
interface to support a "transaction type".

The first type, PERF_PMU_TXN_ADD, refers to the existing transactions,
i.e. used to _schedule_ all the events on the PMU as a group. A second
transaction type, PERF_PMU_TXN_READ, will be used in a follow-on patch,
by the 24x7 counters to read several counters at once.

Extend the transaction interfaces to the PMU to accept a 'txn_flags'
parameter and use this parameter to ignore any transactions that are
not of type PERF_PMU_TXN_ADD.

Thanks to Peter Zijlstra for his input.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[peterz: s390 compile fix]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1441336073-22750-3-git-send-email-sukadev@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 84558376 03-Sep-2015 Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

sparc, perf/sparc: Remove unnecessary assignment

In ->commit_txn() 'cpuc' is already initialized when it is
declared, so we can remove the duplicate assignment.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1441336073-22750-2-git-send-email-sukadev@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 2d89cd86 15-Jun-2015 David Ahern <david.ahern@oracle.com>

sparc64: perf: Use UREG_FP rather than UREG_I6

perf walks userspace callchains by following frame pointers. Use the
UREG_FP macro to make it clearer that the %fp is being used.

Signed-off-by: David Ahern <david.ahern@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b69fb769 15-Jun-2015 David Ahern <david.ahern@oracle.com>

sparc64: perf: Add sanity checking on addresses in user stack

Processes are getting killed (sigbus or segv) while walking userspace
callchains when using perf. In some instances I have seen ufp = 0x7ff
which does not seem like a proper stack address.

This patch adds a function to run validity checks against the address
before attempting the copy_from_user. The checks are copied from the
x86 version as a start point with the addition of a 4-byte alignment
check.

Signed-off-by: David Ahern <david.ahern@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c17af4dd 15-Jun-2015 David Ahern <david.ahern@oracle.com>

sparc: perf: Disable pagefaults while walking userspace stacks

Page faults generated walking userspace stacks can call schedule to switch
out the task. When collecting callchains for scheduler tracepoints this
causes a deadlock as the tracepoints can be hit with the runqueue lock held:

[ 8138.159054] WARNING: CPU: 758 PID: 12488 at /opt/dahern/linux.git/arch/sparc/kernel/nmi.c:80 perfctr_irq+0x1f8/0x2b4()

[ 8138.203152] Watchdog detected hard LOCKUP on cpu 758

[ 8138.410969] CPU: 758 PID: 12488 Comm: perf Not tainted 4.0.0-rc6+ #6
[ 8138.437146] Call Trace:
[ 8138.447193] [000000000045cdd4] warn_slowpath_common+0x7c/0xa0
[ 8138.471238] [000000000045ce90] warn_slowpath_fmt+0x30/0x40
[ 8138.494189] [0000000000983e38] perfctr_irq+0x1f8/0x2b4
[ 8138.515716] [00000000004209f4] tl0_irq15+0x14/0x20
[ 8138.535791] [00000000009839ec] _raw_spin_trylock_bh+0x68/0x108
[ 8138.560180] [0000000000980018] __schedule+0xcc/0x710
[ 8138.580981] [00000000009806dc] preempt_schedule_common+0x10/0x3c
[ 8138.606082] [000000000098077c] _cond_resched+0x34/0x44
[ 8138.627603] [0000000000565990] kmem_cache_alloc_node+0x24/0x1a0
[ 8138.652345] [0000000000450b60] tsb_grow+0xac/0x488
[ 8138.672429] [0000000000985040] do_sparc64_fault+0x4dc/0x6e4
[ 8138.695736] [0000000000407c2c] sparc64_realfault_common+0x10/0x20
[ 8138.721202] [00000000006f2e24] NG4copy_from_user+0xa4/0x3c0
[ 8138.744510] [000000000044f900] perf_callchain_user+0x5c/0x6c
[ 8138.768182] [0000000000517b5c] perf_callchain+0x16c/0x19c
[ 8138.790774] [0000000000515f84] perf_prepare_sample+0x68/0x218
[ 8138.814801] ---[ end trace 42ca6294b1ff7573 ]---

As with PowerPC (b59a1bfcc240, "powerpc/perf: Disable pagefaults during
callchain stack read") disable pagefaults while walking userspace stacks.

Signed-off-by: David Ahern <david.ahern@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# df386375 21-Apr-2015 David S. Miller <davem@davemloft.net>

sparc64: Use M7 PMC write on all chips T4 and onward.

They both work equally well, and the M7 implementation is
simpler and cheaper (less register writes).

With help from David Ahern.

Signed-off-by: David S. Miller <davem@davemloft.net>


# b5aff55d 19-Mar-2015 David Ahern <david.ahern@oracle.com>

sparc: perf: Add support M7 processor

The M7 processor has a different hypervisor group id and different PCR fast
trap values. PIC read/write functions and PCR bit fields are the same as
the T4 so those are reused.

Signed-off-by: David Ahern <david.ahern@oracle.com>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d51291cb 19-Mar-2015 David Ahern <david.ahern@oracle.com>

sparc: perf: Make counting mode actually work

Currently perf-stat (aka, counting mode) does not work:

$ perf stat ls
...
Performance counter stats for 'ls':

1.585665 task-clock (msec) # 0.580 CPUs utilized
24 context-switches # 0.015 M/sec
0 cpu-migrations # 0.000 K/sec
86 page-faults # 0.054 M/sec
<not supported> cycles
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
<not supported> instructions
<not supported> branches
<not supported> branch-misses

0.002735100 seconds time elapsed

The reason is that state is never reset (stays with PERF_HES_UPTODATE set).
Add a call to sparc_pmu_enable_event during the added_event handling.
Clean up the encoding since pmu_start calls sparc_pmu_enable_event which
does the same. Passing PERF_EF_RELOAD to sparc_pmu_start means the call
to sparc_perf_event_set_period can be removed as well.

With this patch:

$ perf stat ls
...
Performance counter stats for 'ls':

1.552890 task-clock (msec) # 0.552 CPUs utilized
24 context-switches # 0.015 M/sec
0 cpu-migrations # 0.000 K/sec
86 page-faults # 0.055 M/sec
5,748,997 cycles # 3.702 GHz
<not supported> stalled-cycles-frontend:HG
<not supported> stalled-cycles-backend:HG
1,684,362 instructions:HG # 0.29 insns per cycle
295,133 branches:HG # 190.054 M/sec
28,007 branch-misses:HG # 9.49% of all branches

0.002815665 seconds time elapsed

Signed-off-by: David Ahern <david.ahern@oracle.com>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5b0d4b55 19-Mar-2015 David Ahern <david.ahern@oracle.com>

sparc: perf: Remove redundant perf_pmu_{en|dis}able calls

perf_pmu_disable is called by core perf code before pmu->del and the
enable function is called by core perf code afterwards. No need to
call again within sparc_pmu_del.

Ditto for pmu->add and sparc_pmu_add.

Signed-off-by: David Ahern <david.ahern@oracle.com>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 05aa1651 16-Sep-2014 bob picco <bpicco@meloft.net>

sparc64: T5 PMU

The T5 (niagara5) has different PCR related HV fast trap values and a new
HV API Group. This patch utilizes these and shares when possible with niagara4.

We use the same sparc_pmu niagara4_pmu. Should there be new effort to
obtain the MCU perf statistics then this would have to be changed.

Cc: sparclinux@vger.kernel.org
Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 494fc421 16-Aug-2014 Christoph Lameter <cl@linux.com>

sparc: Replace __get_cpu_var uses

__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.

Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.

__get_cpu_var() is defined as :

#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.

This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.

At the end of the patch set all uses of __get_cpu_var have been removed so
the macro is removed too.

The patch set includes passes over all arches as well. Once these operations
are used throughout then specialized macros can be defined in non -x86
arches as well in order to optimize per cpu access by f.e. using a global
register that may be set to the per cpu base.

Transformations done to __get_cpu_var()

1. Determine the address of the percpu instance of the current processor.

DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);

Converts to

int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);

Converts to

int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processors instance of a per cpu
variable.

DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)

Converts to

int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);

Converts to

memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable

DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;

Converts to

__this_cpu_write(y, x);

6. Increment/Decrement etc of a per cpu variable

DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++

Converts to

__this_cpu_inc(y)

Cc: sparclinux@vger.kernel.org
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 8bccf5b3 11-Aug-2014 David S. Miller <davem@davemloft.net>

sparc64: Fix pcr_ops initialization and usage bugs.

Christopher reports that perf_event_print_debug() can crash in uniprocessor
builds. The crash is due to pcr_ops being NULL.

This happens because pcr_arch_init() is only invoked by smp_cpus_done() which
only executes in SMP builds.

init_hw_perf_events() is closely intertwined with pcr_ops being setup properly,
therefore:

1) Call pcr_arch_init() early on from init_hw_perf_events(), instead of
from smp_cpus_done().

2) Do not hook up a PMU type if pcr_ops is NULL after pcr_arch_init().

3) Move init_hw_perf_events to a later initcall so that it we will be
sure to invoke pcr_arch_init() after all cpus are brought up.

Finally, guard the one naked sequence of pcr_ops dereferences in
__global_pmu_self() with an appropriate NULL check.

Reported-by: Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 265c1ffa 16-May-2014 Sam Ravnborg <sam@ravnborg.org>

sparc64: fix sparse warnings in perf_event.c

Fix following sparse warnings:
kernel/perf_event.c:113:1: warning: symbol 'cpu_hw_events' was not declared. Should it be static?
kernel/perf_event.c:1156:6: warning: symbol 'perf_event_grab_pmc' was not declared. Should it be static?
kernel/perf_event.c:1172:6: warning: symbol 'perf_event_release_pmc' was not declared. Should it be static?
kernel/perf_event.c:1672:12: warning: symbol 'init_hw_perf_events' was not declared. Should it be static?
kernel/perf_event.c:1749:52: warning: incorrect type in argument 2 (different address spaces)
kernel/perf_event.c:1772:60: warning: incorrect type in argument 2 (different address spaces)
kernel/perf_event.c:1779:60: warning: incorrect type in argument 2 (different address spaces)

Define the functions static as they are not used outside this file.
Fix it so copy_from_user are supplied with pointers annotated _user

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 517ffce4 26-Oct-2012 David S. Miller <davem@davemloft.net>

sparc64: Make montmul/montsqr/mpmul usable in 32-bit threads.

The Montgomery Multiply, Montgomery Square, and Multiple-Precision
Multiply instructions work by loading a combination of the floating
point and multiple register windows worth of integer registers
with the inputs.

These values are 64-bit. But for 32-bit userland processes we only
save the low 32-bits of each integer register during a register spill.
This is because the register window save area is in the user stack and
has a fixed layout.

Therefore, the only way to use these instruction in 32-bit mode is to
perform the following sequence:

1) Load the top-32bits of a choosen integer register with a sentinel,
say "-1". This will be in the outer-most register window.

The idea is that we're trying to see if the outer-most register
window gets spilled, and thus the 64-bit values were truncated.

2) Load all the inputs for the montmul/montsqr/mpmul instruction,
down to the inner-most register window.

3) Execute the opcode.

4) Traverse back up to the outer-most register window.

5) Check the sentinel, if it's still "-1" store the results.
Otherwise retry the entire sequence.

This retry is extremely troublesome. If you're just unlucky and an
interrupt or other trap happens, it'll push that outer-most window to
the stack and clear the sentinel when we restore it.

We could retry forever and never make forward progress if interrupts
arrive at a fast enough rate (consider perf events as one example).
So we have do limited retries and fallback to software which is
extremely non-deterministic.

Luckily it's very straightforward to provide a mechanism to let
32-bit applications use a 64-bit stack. Stacks in 64-bit mode are
biased by 2047 bytes, which means that the lowest bit is set in the
actual %sp register value.

So if we see bit zero set in a 32-bit application's stack we treat
it like a 64-bit stack.

Runtime detection of such a facility is tricky, and cumbersome at
best. For example, just trying to use a biased stack and seeing if it
works is hard to recover from (the signal handler will need to use an
alt stack, plus something along the lines of longjmp). Therefore, we
add a system call to report a bitmask of arch specific features like
this in a cheap and less hairy way.

With help from Andy Polyakov.

Signed-off-by: David S. Miller <davem@davemloft.net>


# e793d8c6 16-Oct-2012 David S. Miller <davem@davemloft.net>

sparc64: Fix bit twiddling in sparc_pmu_enable_event().

There was a serious disconnect in the logic happening in
sparc_pmu_disable_event() vs. sparc_pmu_enable_event().

Event disable is implemented by programming a NOP event into the PCR.

However, event enable was not reversing this operation. Instead, it
was setting the User/Priv/Hypervisor trace enable bits.

That's not sparc_pmu_enable_event()'s job, that's what
sparc_pmu_enable() and sparc_pmu_disable() do .

The intent of sparc_pmu_enable_event() is clear, since it first clear
out the event type encoding field. So fix this by OR'ing in the event
encoding rather than the trace enable bits.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 08280e6c 14-Oct-2012 David S. Miller <davem@davemloft.net>

sparc64: Like x86 we should check current->mm during perf backtrace generation.

If the MM is not active, only report the top-level PC. Do not try to
access the address space.

Signed-off-by: David S. Miller <davem@davemloft.net>


# bab96bda 19-Aug-2012 David S. Miller <davem@davemloft.net>

sparc64: Update generic comments in perf event code to match reality.

Describe how we support two types of PMU setups, one with a single control
register and two counters stored in a single register, and another with
one control register per counter and each counter living in it's own
register.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 035ea28d 18-Aug-2012 David S. Miller <davem@davemloft.net>

sparc64: Add SPARC-T4 perf event support.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 7a37a0b8 17-Aug-2012 David S. Miller <davem@davemloft.net>

sparc64: Support perf event encoding for multi-PCR PMUs.

Signed-off-by: David S. Miller <davem@davemloft.net>


# b4f061a4 17-Aug-2012 David S. Miller <davem@davemloft.net>

sparc64: Make sparc_pmu_{enable,disable}_event() multi-pcr aware.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 5ab96841 17-Aug-2012 David S. Miller <davem@davemloft.net>

sparc64: Rework sparc_pmu_enable() so that the side effects are clearer.

When cpuc->n_events is zero, we actually don't do anything and we just
write the cpuc->pcr[0] value as-is without any modifications.

The "pcr = 0;" assignment there was just useless and confusing.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 3f1a2097 17-Aug-2012 David S. Miller <davem@davemloft.net>

sparc64: Prepare perf event layer for handling multiple PCR registers.

Make the per-cpu pcr save area an array instead of one u64.

Describe how many PCR and PIC registers the chip has in the sparc_pmu
descriptor.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 7ac2ed28 17-Aug-2012 David S. Miller <davem@davemloft.net>

sparc64: Specify user and supervisor trace PCR bits in sparc_pmu.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 5344303c 17-Aug-2012 David S. Miller <davem@davemloft.net>

sparc64: Abstract PMC read/write behind sparc_pmu.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 59660495 17-Aug-2012 David S. Miller <davem@davemloft.net>

sparc64: Allow max hw perf events to be variable.

Now specified in sparc_pmu descriptor.

Signed-off-by: David S. Miller <davem@davemloft.net>


# b38e99f5 17-Aug-2012 David S. Miller <davem@davemloft.net>

sparc64: Add perf_event abstractions for orthogonal PMUs.

Starting with SPARC-T4 we have a seperate PCR control register
for each performance counter, and there are absolutely no
restrictions on what events can run on which counters.

Add flags that we can use to elide the conflict and dependency
logic used to handle older chips.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 09d053c7 17-Aug-2012 David S. Miller <davem@davemloft.net>

sparc64: Abstract away PIC register accesses.

And, like for the PCR, allow indexing of different PIC register
numbers.

This also removes all of the non-__KERNEL__ bits from asm/perfctr.h,
nothing kernel side should include it any more.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 0bab20ba 16-Aug-2012 David S. Miller <davem@davemloft.net>

sparc64: Add 'reg_num' argument to pcr_ops methods.

SPARC-T4 and later have multiple PCR registers, one for each
PIC counter.

Signed-off-by: David S. Miller <davem@davemloft.net>


# fd0d000b 02-Apr-2012 Robert Richter <robert.richter@amd.com>

perf: Pass last sampling period to perf_sample_data_init()

We always need to pass the last sample period to
perf_sample_data_init(), otherwise the event distribution will be
wrong. Thus, modifiyng the function interface with the required period
as argument. So basically a pattern like this:

perf_sample_data_init(&data, ~0ULL);
data.period = event->hw.last_period;

will now be like that:

perf_sample_data_init(&data, ~0ULL, event->hw.last_period);

Avoids unininitialized data.period and simplifies code.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-3-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d550bbd4 28-Mar-2012 David Howells <dhowells@redhat.com>

Disintegrate asm/system.h for Sparc

Disintegrate asm/system.h for Sparc.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: sparclinux@vger.kernel.org


# 2481c5fa 09-Feb-2012 Stephane Eranian <eranian@google.com>

perf: Disable PERF_SAMPLE_BRANCH_* when not supported

PERF_SAMPLE_BRANCH_* is disabled for:

- SW events (sw counters, tracepoints)
- HW breakpoints
- ALL but Intel x86 architecture
- AMD64 processors

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-10-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 4ba991d3 27-Jul-2011 David S. Miller <davem@davemloft.net>

sparc: Detect and handle UltraSPARC-T3 cpu types.

The cpu compatible string we look for is "SPARC-T3".

As far as memset/memcpy optimizations go, we treat this chip the same
as Niagara-T2/T2+. Use cache initializing stores for memset, and use
perfetch, FPU block loads, cache initializing stores, and block stores
for copies.

We use the Niagara-T2 perf support, since T3 is a close relative in
this regard. Later we'll add support for the new events T3 can
report, plus enable T3's new "sample" mode.

For now I haven't added any new ELF hwcap flags. We probably need
to add a couple, for example:

T2 and T3 both support the population count instruction in hardware.

T3 supports VIS3 instructions, including support (finally) for
partitioned shift. One can also now move directly between float
and integer registers.

T3 supports instructions meant to help with Galois Field and other HPC
calculations, such as XOR multiply. Also there are "OP and negate"
instructions, for example "fnmul" which is multiply-and-negate.

T3 recognizes the transactional memory opcodes, however since
transactional memory isn't supported: 1) 'commit' behaves as a NOP and
2) 'chkpt' always branches 3) 'rdcps' returns all zeros and 4) 'wrcps'
behaves as a NOP.

So we'll need about 3 new elf capability flags in the end to represent
all of these things.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 60063497 26-Jul-2011 Arun Sharma <asharma@fb.com>

atomic: use <linux/atomic.h>

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 89d6c0b5 22-Apr-2011 Peter Zijlstra <peterz@infradead.org>

perf, arch: Add generic NODE cache events

Add a NODE level to the generic cache events which is used to measure
local vs remote memory accesses. Like all other cache events, an
ACCESS is HIT+MISS, if there is no way to distinguish between reads
and writes do reads only etc..

The below needs filling out for !x86 (which I filled out with
unsupported events).

I'm fairly sure ARM can leave it like that since it doesn't strike me as
an architecture that even has NUMA support. SH might have something since
it does appear to have some NUMA bits.

Sparc64, PowerPC and MIPS certainly want a good look there since they
clearly are NUMA capable.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Miller <davem@davemloft.net>
Cc: Anton Blanchard <anton@samba.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptop
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# a8b0ca17 27-Jun-2011 Peter Zijlstra <a.p.zijlstra@chello.nl>

perf: Remove the nmi parameter from the swevent and overflow interface

The nmi parameter indicated if we could do wakeups from the current
context, if not, we would set some state and self-IPI and let the
resulting interrupt do the wakeup.

For the various event classes:

- hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
the PMI-tail (ARM etc.)
- tracepoint: nmi=0; since tracepoint could be from NMI context.
- software: nmi=[0,1]; some, like the schedule thing cannot
perform wakeups, and hence need 0.

As one can see, there is very little nmi=1 usage, and the down-side of
not using it is that on some platforms some software events can have a
jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

The up-side however is that we can remove the nmi parameter and save a
bunch of conditionals in fast paths.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# cb1b8209 21-Apr-2011 Sam Ravnborg <sam@ravnborg.org>

sparc: consolidate show_cpuinfo in cpu.c

We have all the cpu related info in cpu.c - so move
the remaining functions to support /proc/cpuinfo to this file.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# 2e80a82a 17-Nov-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

perf: Dynamic pmu types

Extend the perf_pmu_register() interface to allow for named and
dynamic pmu types.

Because we need to support the existing static types we cannot use
dynamic types for everything, hence provide a type argument.

If we want to enumerate the PMUs they need a name, provide one.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101117222056.259707703@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# efc70d24 09-Dec-2010 Ingo Molnar <mingo@elte.hu>

perf, sparc: Fix CONFIG_PERF_EVENTS=y build error

Fix a typo in:

004417a6d468: perf, arch: Cleanup perf-pmu init vs lockup-detector

Which caused a build failure on Sparc, reported by Stephen Rothwell.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 004417a6 25-Nov-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

perf, arch: Cleanup perf-pmu init vs lockup-detector

The perf hardware pmu got initialized at various points in the boot,
some before early_initcall() some after (notably arch_initcall).

The problem is that the NMI lockup detector is ran from early_initcall()
and expects the hardware pmu to be present.

Sanitize this by moving all architecture hardware pmu implementations to
initialize at early_initcall() and move the lockup detector to an explicit
initcall right after that.

Cc: paulus <paulus@samba.org>
Cc: davem <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1290707759.2145.119.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b343ae51 12-Sep-2010 David S. Miller <davem@davemloft.net>

sparc64: Support RAW perf events.

Encoding is "(encoding << 16) | pic_mask"

Signed-off-by: David S. Miller <davem@davemloft.net>


# 15ac9a39 06-Sep-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

perf: Remove the sysfs bits

Neither the overcommit nor the reservation sysfs parameter were
actually working, remove them as they'll only get in the way.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# a4eaf7f1 16-Jun-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

perf: Rework the PMU methods

Replace pmu::{enable,disable,start,stop,unthrottle} with
pmu::{add,del,start,stop}, all of which take a flags argument.

The new interface extends the capability to stop a counter while
keeping it scheduled on the PMU. We replace the throttled state with
the generic stopped state.

This also allows us to efficiently stop/start counters over certain
code paths (like IRQ handlers).

It also allows scheduling a counter without it starting, allowing for
a generic frozen state (useful for rotating stopped counters).

The stopped state is implemented in two different ways, depending on
how the architecture implemented the throttled state:

1) We disable the counter:
a) the pmu has per-counter enable bits, we flip that
b) we program a NOP event, preserving the counter state

2) We store the counter state and ignore all read/overflow events

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 33696fc0 14-Jun-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

perf: Per PMU disable

Changes perf_disable() into perf_pmu_disable().

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 24cd7f54 11-Jun-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

perf: Reduce perf_disable() usage

Since the current perf_disable() usage is only an optimization,
remove it for now. This eases the removal of the __weak
hw_perf_enable() interface.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b0a873eb 11-Jun-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

perf: Register PMU implementations

Simple registration interface for struct pmu, this provides the
infrastructure for removing all the weak functions.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 51b0fe39 11-Jun-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

perf: Deconstify struct pmu

sed -ie 's/const struct pmu\>/struct pmu/g' `git grep -l "const struct pmu\>"`

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f72c1a93 30-Jun-2010 Frederic Weisbecker <fweisbec@gmail.com>

perf: Factorize callchain context handling

Store the kernel and user contexts from the generic layer instead
of archs, this gathers some repetitive code.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Borislav Petkov <bp@amd64.org>


# 56962b444 30-Jun-2010 Frederic Weisbecker <fweisbec@gmail.com>

perf: Generalize some arch callchain code

- Most archs use one callchain buffer per cpu, except x86 that needs
to deal with NMIs. Provide a default perf_callchain_buffer()
implementation that x86 overrides.

- Centralize all the kernel/user regs handling and invoke new arch
handlers from there: perf_callchain_user() / perf_callchain_kernel()
That avoid all the user_mode(), current->mm checks and so...

- Invert some parameters in perf_callchain_*() helpers: entry to the
left, regs to the right, following the traditional (dst, src).

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Borislav Petkov <bp@amd64.org>


# 70791ce9 29-Jun-2010 Frederic Weisbecker <fweisbec@gmail.com>

perf: Generalize callchain_store()

callchain_store() is the same on every archs, inline it in
perf_event.h and rename it to perf_callchain_store() to avoid
any collision.

This removes repetitive code.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Borislav Petkov <bp@amd64.org>


# b7d45c3f 23-Jun-2010 David S. Miller <davem@davemloft.net>

sparc64: Fix maybe_change_configuration() PCR setting.

Need to mask out the existing event bits before OR'ing in
the new ones.

Noticed by Peter Zijlstra.

Signed-off-by: David S. Miller <davem@davemloft.net>


# e7850595 21-May-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

perf: Convert perf_event to local_t

Since now all modification to event->count (and ->prev_count
and ->period_left) are local to a cpu, change then to local64_t so we
avoid the LOCK'ed ops.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 8d2cacbb 25-May-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

perf: Cleanup {start,commit,cancel}_txn details

Clarify some of the transactional group scheduling API details
and change it so that a successfull ->commit_txn also closes
the transaction.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1274803086.5882.1752.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# a13c3afd 22-Apr-2010 Lin Ming <ming.m.lin@intel.com>

perf, sparc: Implement group scheduling transactional APIs

Convert to the transactional PMU API and remove the duplication of
group_sched_in().

[cross build only]

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Acked-by: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1272002193.5707.65.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 667f0cee 21-Apr-2010 David S. Miller <davem@davemloft.net>

sparc64: Fix stack dumping and tracing when function graph is enabled.

Like x86, when the function graph tracer is enabled, emit the ftrace
stub as well as the program counter it will be transformed back into.

We duplicate a lot of similar stack walking logic in 3 or 4 spots, so
eventually we should consolidate things like x86 does.

Thanks to Frederic Weisbecker for pointing this out.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 9e8307ec 29-Mar-2010 David S. Miller <davem@davemloft.net>

sparc64: Properly truncate pt_regs framepointer in perf callback.

For 32-bit processes, we save the full 64-bits of the regs in pt_regs.

But unlike when the userspace actually does load and store
instructions, the top 32-bits don't get automatically truncated by the
cpu in kernel mode (because the kernel doesn't execute with PSTATE_AM
address masking enabled).

So we have to do it by hand.

Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dc1d628a 03-Mar-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

perf: Provide generic perf_sample_data initialization

This makes it easier to extend perf_sample_data and fixes a bug on arm
and sparc, which failed to set ->raw to NULL, which can cause crashes
when combined with PERF_SAMPLE_RAW.

It also optimizes PowerPC and tracepoint, because the struct
initialization is forced to zero out the whole structure.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Jean Pihet <jpihet@mvista.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Jamie Iles <jamie.iles@picochip.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: stable@kernel.org
LKML-Reference: <20100304140100.315416040@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 6e37738a 11-Feb-2010 Peter Zijlstra <peterz@infradead.org>

perf_events: Simplify code by removing cpu argument to hw_perf_group_sched_in()

Since the cpu argument to hw_perf_group_sched_in() is always
smp_processor_id(), simplify the code a little by removing this argument
and using the current cpu where needed.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1265890918.5396.3.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 3ad2f3fb 02-Feb-2010 Daniel Mack <daniel@caiaq.de>

tree-wide: Assorted spelling fixes

In particular, several occurances of funny versions of 'success',
'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
'beginning', 'desirable', 'separate' and 'necessary' are fixed.

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Joe Perches <joe@perches.com>
Cc: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# e7bef6b0 20-Jan-2010 David S. Miller <davem@davemloft.net>

sparc64: Fully support both performance counters.

Add the rest of the conflict detection and resolution logic necessary
to support more than one counter at a time on sparc64.

The structure and implementation closely mimicks that of powerpc.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 4f6dbe4a 19-Jan-2010 David S. Miller <davem@davemloft.net>

sparc64: Add perf callchain support.

Pretty straightforward, and it should be easy to add accurate
walk through of signal stack frames in userspace.

Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Jens Axboe <jens.axboe@oracle.com>


# e04ed38d 05-Jan-2010 David S. Miller <davem@davemloft.net>

sparc64: Fix Niagara2 perf event handling.

For chips like Niagara2 that have true overflow indications
in the %pcr (which we don't actually need and don't use)
the interrupt signal persists until the overflow bits are
cleared by an explicit %pcr write.

Signed-off-by: David S. Miller <davem@davemloft.net>


# de23cf3c 09-Oct-2009 David S. Miller <davem@davemloft.net>

sparc64: Fix niagara2 perf IRQ bits.

Signed-off-by: David S. Miller <davem@davemloft.net>


# d1751388 29-Sep-2009 David S. Miller <davem@davemloft.net>

sparc64: Cache per-cpu %pcr register value in perf code.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 6e804251 29-Sep-2009 David S. Miller <davem@davemloft.net>

sparc64: Fix comment typo in perf_event.c

Signed-off-by: David S. Miller <davem@davemloft.net>


# d29862f0 28-Sep-2009 David S. Miller <davem@davemloft.net>

sparc64: Minor coding style fixups in perf code.

These got introduced during the counter --> event tree-wide
renaming.

Signed-off-by: David S. Miller <davem@davemloft.net>


# a72a8a5f 28-Sep-2009 David S. Miller <davem@davemloft.net>

sparc64: Add a basic conflict engine in preparation for multi-counter support.

The hardware counter ->event_base state records and encoding of
the "struct perf_event_map" entry used for the event.

We use this to make sure that when we have more than 1 event,
both can be scheduled into the hardware at the same time.

As usual, structure of code is largely cribbed from powerpc.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 01552f76 27-Sep-2009 David S. Miller <davem@davemloft.net>

sparc64: Add initial perf event conflict resolution and checks.

Cribbed from powerpc code, as usual. :-)

Currently it is only used to validate that all counters
have the same user/kernel/hv attributes.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 7eebda60 26-Sep-2009 David S. Miller <davem@davemloft.net>

sparc: Niagara1 perf event support.

This chip is extremely limited, and many of the events supported
are approximations at best.

Signed-off-by: David S. Miller <davem@davemloft.net>


# d0b86480 26-Sep-2009 David S. Miller <davem@davemloft.net>

sparc: Add Niagara2 HW cache event support.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 28e8f9be 26-Sep-2009 David S. Miller <davem@davemloft.net>

sparc: Support all ultra3 and ultra4 derivatives.

For the generic events we support, all of these chips have
the same encodings as ultra3i.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 2ce4da2e 26-Sep-2009 David S. Miller <davem@davemloft.net>

sparc: Support HW cache events.

First supported chip for HW cache events is Ultra-IIIi.

Signed-off-by: David S. Miller <davem@davemloft.net>


# cdd6c482 20-Sep-2009 Ingo Molnar <mingo@elte.hu>

perf: Do the big rename: Performance Counters -> Performance Events

Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.

All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.

Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES

for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done

FILES=$(find . -name perf_event.*)

sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\<event\>/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES

... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )

Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>