History log of /linux-master/kernel/trace/trace_entries.h
Revision Date Author Comments
# e7186af7 13-Jul-2023 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Add back FORTIFY_SOURCE logic to kernel_stack event structure

For backward compatibility, older tooling expects to see the kernel_stack
event with a "caller" field that is a fixed size array of 8 addresses. The
code now supports more than 8 with an added "size" field that states the
real number of entries. But the "caller" field still just looks like a
fixed size to user space.

Since the tracing macros that create the user space format files also
creates the structures that those files represent, the kernel_stack event
structure had its "caller" field a fixed size of 8, but in reality, when
it is allocated on the ring buffer, it can hold more if the stack trace is
bigger that 8 functions. The copying of these entries was simply done with
a memcpy():

size = nr_entries * sizeof(unsigned long);
memcpy(entry->caller, fstack->calls, size);

The FORTIFY_SOURCE logic noticed at runtime that when the nr_entries was
larger than 8, that the memcpy() was writing more than what the structure
stated it can hold and it complained about it. This is because the
FORTIFY_SOURCE code is unaware that the amount allocated is actually
enough to hold the size. It does not expect that a fixed size field will
hold more than the fixed size.

This was originally solved by hiding the caller assignment with some
pointer arithmetic.

ptr = ring_buffer_data();
entry = ptr;

ptr += offsetof(typeof(*entry), caller);
memcpy(ptr, fstack->calls, size);

But it is considered bad form to hide from kernel hardening. Instead, make
it work nicely with FORTIFY_SOURCE by adding a new __stack_array() macro
that is specific for this one special use case. The macro will take 4
arguments: type, item, len, field (whereas the __array() macro takes just
the first three). This macro will act just like the __array() macro when
creating the code to deal with the format file that is exposed to user
space. But for the kernel, it will turn the caller field into:

type item[] __counted_by(field);

or for this instance:

unsigned long caller[] __counted_by(size);

Now the kernel code can expose the assignment of the caller to the
FORTIFY_SOURCE and everyone is happy!

Link: https://lore.kernel.org/linux-trace-kernel/20230712105235.5fc441aa@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20230713092605.2ddb9788@rorschach.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Kees Cook <keescook@chromium.org>


# a1be9ccc 08-Apr-2023 Donglin Peng <pengdonglin@sangfor.com.cn>

function_graph: Support recording and printing the return value of function

Analyzing system call failures with the function_graph tracer can be a
time-consuming process, particularly when locating the kernel function
that first returns an error in the trace logs. This change aims to
simplify the process by recording the function return value to the
'retval' member of 'ftrace_graph_ret' and printing it when outputting
the trace log.

We have introduced new trace options: funcgraph-retval and
funcgraph-retval-hex. The former controls whether to display the return
value, while the latter controls the display format.

Please note that even if a function's return type is void, a return
value will still be printed. You can simply ignore it.

This patch only establishes the fundamental infrastructure. Subsequent
patches will make this feature available on some commonly used processor
architectures.

Here is an example:

I attempted to attach the demo process to a cpu cgroup, but it failed:

echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
-bash: echo: write error: Invalid argument

The strace logs indicate that the write system call returned -EINVAL(-22):
...
write(1, "273\n", 4) = -1 EINVAL (Invalid argument)
...

To capture trace logs during a write system call, use the following
commands:

cd /sys/kernel/debug/tracing/
echo 0 > tracing_on
echo > trace
echo *sys_write > set_graph_function
echo *spin* > set_graph_notrace
echo *rcu* >> set_graph_notrace
echo *alloc* >> set_graph_notrace
echo preempt* >> set_graph_notrace
echo kfree* >> set_graph_notrace
echo $$ > set_ftrace_pid
echo function_graph > current_tracer
echo 1 > options/funcgraph-retval
echo 0 > options/funcgraph-retval-hex
echo 1 > tracing_on
echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
echo 0 > tracing_on
cat trace > ~/trace.log

To locate the root cause, search for error code -22 directly in the file
trace.log and identify the first function that returned -22. Once you
have identified this function, examine its code to determine the root
cause.

For example, in the trace log below, cpu_cgroup_can_attach
returned -22 first, so we can focus our analysis on this function to
identify the root cause.

...

1) | cgroup_migrate() {
1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
1) | cgroup_migrate_execute() {
1) | cpu_cgroup_can_attach() {
1) | cgroup_taskset_first() {
1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */
1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */
1) 4.369 us | } /* cgroup_migrate_execute = -22 */
1) 7.143 us | } /* cgroup_migrate = -22 */

...

Link: https://lkml.kernel.org/r/1fc502712c981e0e6742185ba242992170ac9da8.1680954589.git.pengdonglin@sangfor.com.cn

Tested-by: Florian Kauer <florian.kauer@linutronix.de>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# a955d7ea 22-Jun-2021 Daniel Bristot de Oliveira <bristot@redhat.com>

trace: Add timerlat tracer

The timerlat tracer aims to help the preemptive kernel developers to
found souces of wakeup latencies of real-time threads. Like cyclictest,
the tracer sets a periodic timer that wakes up a thread. The thread then
computes a *wakeup latency* value as the difference between the *current
time* and the *absolute time* that the timer was set to expire. The main
goal of timerlat is tracing in such a way to help kernel developers.

Usage

Write the ASCII text "timerlat" into the current_tracer file of the
tracing system (generally mounted at /sys/kernel/tracing).

For example:

[root@f32 ~]# cd /sys/kernel/tracing/
[root@f32 tracing]# echo timerlat > current_tracer

It is possible to follow the trace by reading the trace trace file:

[root@f32 tracing]# cat trace
# tracer: timerlat
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# || /
# |||| ACTIVATION
# TASK-PID CPU# |||| TIMESTAMP ID CONTEXT LATENCY
# | | | |||| | | | |
<idle>-0 [000] d.h1 54.029328: #1 context irq timer_latency 932 ns
<...>-867 [000] .... 54.029339: #1 context thread timer_latency 11700 ns
<idle>-0 [001] dNh1 54.029346: #1 context irq timer_latency 2833 ns
<...>-868 [001] .... 54.029353: #1 context thread timer_latency 9820 ns
<idle>-0 [000] d.h1 54.030328: #2 context irq timer_latency 769 ns
<...>-867 [000] .... 54.030330: #2 context thread timer_latency 3070 ns
<idle>-0 [001] d.h1 54.030344: #2 context irq timer_latency 935 ns
<...>-868 [001] .... 54.030347: #2 context thread timer_latency 4351 ns

The tracer creates a per-cpu kernel thread with real-time priority that
prints two lines at every activation. The first is the *timer latency*
observed at the *hardirq* context before the activation of the thread.
The second is the *timer latency* observed by the thread, which is the
same level that cyclictest reports. The ACTIVATION ID field
serves to relate the *irq* execution to its respective *thread* execution.

The irq/thread splitting is important to clarify at which context
the unexpected high value is coming from. The *irq* context can be
delayed by hardware related actions, such as SMIs, NMIs, IRQs
or by a thread masking interrupts. Once the timer happens, the delay
can also be influenced by blocking caused by threads. For example, by
postponing the scheduler execution via preempt_disable(), by the
scheduler execution, or by masking interrupts. Threads can
also be delayed by the interference from other threads and IRQs.

The timerlat can also take advantage of the osnoise: traceevents.
For example:

[root@f32 ~]# cd /sys/kernel/tracing/
[root@f32 tracing]# echo timerlat > current_tracer
[root@f32 tracing]# echo osnoise > set_event
[root@f32 tracing]# echo 25 > osnoise/stop_tracing_total_us
[root@f32 tracing]# tail -10 trace
cc1-87882 [005] d..h... 548.771078: #402268 context irq timer_latency 1585 ns
cc1-87882 [005] dNLh1.. 548.771082: irq_noise: local_timer:236 start 548.771077442 duration 4597 ns
cc1-87882 [005] dNLh2.. 548.771083: irq_noise: reschedule:253 start 548.771083017 duration 56 ns
cc1-87882 [005] dNLh2.. 548.771086: irq_noise: call_function_single:251 start 548.771083811 duration 2048 ns
cc1-87882 [005] dNLh2.. 548.771088: irq_noise: call_function_single:251 start 548.771086814 duration 1495 ns
cc1-87882 [005] dNLh2.. 548.771091: irq_noise: call_function_single:251 start 548.771089194 duration 1558 ns
cc1-87882 [005] dNLh2.. 548.771094: irq_noise: call_function_single:251 start 548.771091719 duration 1932 ns
cc1-87882 [005] dNLh2.. 548.771096: irq_noise: call_function_single:251 start 548.771094696 duration 1050 ns
cc1-87882 [005] d...3.. 548.771101: thread_noise: cc1:87882 start 548.771078243 duration 10909 ns
timerlat/5-1035 [005] ....... 548.771103: #402268 context thread timer_latency 25960 ns

For further information see: Documentation/trace/timerlat-tracer.rst

Link: https://lkml.kernel.org/r/71f18efc013e1194bcaea1e54db957de2b19ba62.1624372313.git.bristot@redhat.com

Cc: Phil Auld <pauld@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Kate Carcia <kcarcia@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Clark Willaims <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# bce29ac9 22-Jun-2021 Daniel Bristot de Oliveira <bristot@redhat.com>

trace: Add osnoise tracer

In the context of high-performance computing (HPC), the Operating System
Noise (*osnoise*) refers to the interference experienced by an application
due to activities inside the operating system. In the context of Linux,
NMIs, IRQs, SoftIRQs, and any other system thread can cause noise to the
system. Moreover, hardware-related jobs can also cause noise, for example,
via SMIs.

The osnoise tracer leverages the hwlat_detector by running a similar
loop with preemption, SoftIRQs and IRQs enabled, thus allowing all
the sources of *osnoise* during its execution. Using the same approach
of hwlat, osnoise takes note of the entry and exit point of any
source of interferences, increasing a per-cpu interference counter. The
osnoise tracer also saves an interference counter for each source of
interference. The interference counter for NMI, IRQs, SoftIRQs, and
threads is increased anytime the tool observes these interferences' entry
events. When a noise happens without any interference from the operating
system level, the hardware noise counter increases, pointing to a
hardware-related noise. In this way, osnoise can account for any
source of interference. At the end of the period, the osnoise tracer
prints the sum of all noise, the max single noise, the percentage of CPU
available for the thread, and the counters for the noise sources.

Usage

Write the ASCII text "osnoise" into the current_tracer file of the
tracing system (generally mounted at /sys/kernel/tracing).

For example::

[root@f32 ~]# cd /sys/kernel/tracing/
[root@f32 tracing]# echo osnoise > current_tracer

It is possible to follow the trace by reading the trace trace file::

[root@f32 tracing]# cat trace
# tracer: osnoise
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth MAX
# || / SINGLE Interference counters:
# |||| RUNTIME NOISE % OF CPU NOISE +-----------------------------+
# TASK-PID CPU# |||| TIMESTAMP IN US IN US AVAILABLE IN US HW NMI IRQ SIRQ THREAD
# | | | |||| | | | | | | | | | |
<...>-859 [000] .... 81.637220: 1000000 190 99.98100 9 18 0 1007 18 1
<...>-860 [001] .... 81.638154: 1000000 656 99.93440 74 23 0 1006 16 3
<...>-861 [002] .... 81.638193: 1000000 5675 99.43250 202 6 0 1013 25 21
<...>-862 [003] .... 81.638242: 1000000 125 99.98750 45 1 0 1011 23 0
<...>-863 [004] .... 81.638260: 1000000 1721 99.82790 168 7 0 1002 49 41
<...>-864 [005] .... 81.638286: 1000000 263 99.97370 57 6 0 1006 26 2
<...>-865 [006] .... 81.638302: 1000000 109 99.98910 21 3 0 1006 18 1
<...>-866 [007] .... 81.638326: 1000000 7816 99.21840 107 8 0 1016 39 19

In addition to the regular trace fields (from TASK-PID to TIMESTAMP), the
tracer prints a message at the end of each period for each CPU that is
running an osnoise/CPU thread. The osnoise specific fields report:

- The RUNTIME IN USE reports the amount of time in microseconds that
the osnoise thread kept looping reading the time.
- The NOISE IN US reports the sum of noise in microseconds observed
by the osnoise tracer during the associated runtime.
- The % OF CPU AVAILABLE reports the percentage of CPU available for
the osnoise thread during the runtime window.
- The MAX SINGLE NOISE IN US reports the maximum single noise observed
during the runtime window.
- The Interference counters display how many each of the respective
interference happened during the runtime window.

Note that the example above shows a high number of HW noise samples.
The reason being is that this sample was taken on a virtual machine,
and the host interference is detected as a hardware interference.

Tracer options

The tracer has a set of options inside the osnoise directory, they are:

- osnoise/cpus: CPUs at which a osnoise thread will execute.
- osnoise/period_us: the period of the osnoise thread.
- osnoise/runtime_us: how long an osnoise thread will look for noise.
- osnoise/stop_tracing_us: stop the system tracing if a single noise
higher than the configured value happens. Writing 0 disables this
option.
- osnoise/stop_tracing_total_us: stop the system tracing if total noise
higher than the configured value happens. Writing 0 disables this
option.
- tracing_threshold: the minimum delta between two time() reads to be
considered as noise, in us. When set to 0, the default value will
be used, which is currently 5 us.

Additional Tracing

In addition to the tracer, a set of tracepoints were added to
facilitate the identification of the osnoise source.

- osnoise:sample_threshold: printed anytime a noise is higher than
the configurable tolerance_ns.
- osnoise:nmi_noise: noise from NMI, including the duration.
- osnoise:irq_noise: noise from an IRQ, including the duration.
- osnoise:softirq_noise: noise from a SoftIRQ, including the
duration.
- osnoise:thread_noise: noise from a thread, including the duration.

Note that all the values are *net values*. For example, if while osnoise
is running, another thread preempts the osnoise thread, it will start a
thread_noise duration at the start. Then, an IRQ takes place, preempting
the thread_noise, starting a irq_noise. When the IRQ ends its execution,
it will compute its duration, and this duration will be subtracted from
the thread_noise, in such a way as to avoid the double accounting of the
IRQ execution. This logic is valid for all sources of noise.

Here is one example of the usage of these tracepoints::

osnoise/8-961 [008] d.h. 5789.857532: irq_noise: local_timer:236 start 5789.857529929 duration 1845 ns
osnoise/8-961 [008] dNh. 5789.858408: irq_noise: local_timer:236 start 5789.858404871 duration 2848 ns
migration/8-54 [008] d... 5789.858413: thread_noise: migration/8:54 start 5789.858409300 duration 3068 ns
osnoise/8-961 [008] .... 5789.858413: sample_threshold: start 5789.858404555 duration 8723 ns interferences 2

In this example, a noise sample of 8 microseconds was reported in the last
line, pointing to two interferences. Looking backward in the trace, the
two previous entries were about the migration thread running after a
timer IRQ execution. The first event is not part of the noise because
it took place one millisecond before.

It is worth noticing that the sum of the duration reported in the
tracepoints is smaller than eight us reported in the sample_threshold.
The reason roots in the overhead of the entry and exit code that happens
before and after any interference execution. This justifies the dual
approach: measuring thread and tracing.

Link: https://lkml.kernel.org/r/e649467042d60e7b62714c9c6751a56299d15119.1624372313.git.bristot@redhat.com

Cc: Phil Auld <pauld@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Kate Carcia <kcarcia@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Clark Willaims <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
[
Made the following functions static:
trace_irqentry_callback()
trace_irqexit_callback()
trace_intel_irqentry_callback()
trace_intel_irqexit_callback()

Added to include/trace.h:
osnoise_arch_register()
osnoise_arch_unregister()

Fixed define logic for LATENCY_FS_NOTIFY

Reported-by: kernel test robot <lkp@intel.com>
]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# f689e4f2 15-Apr-2021 Yordan Karadzhov (VMware) <y.karadz@gmail.com>

tracing: Define new ftrace event "func_repeats"

The event aims to consolidate the function tracing record in the cases
when a single function is called number of times consecutively.

while (cond)
do_func();

This may happen in various scenarios (busy waiting for example).
The new ftrace event can be used to show repeated function events with
a single event and save space on the ring buffer

Link: https://lkml.kernel.org/r/20210415181854.147448-3-y.karadz@gmail.com

Signed-off-by: Yordan Karadzhov (VMware) <y.karadz@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 2b5894cc 29-Oct-2020 Qiujun Huang <hqjagain@gmail.com>

tracing: Fix some typos in comments

s/detetector/detector/
s/enfoced/enforced/
s/writen/written/
s/actualy/actually/
s/bascially/basically/
s/Regarldess/Regardless/
s/zeroes/zeros/
s/followd/followed/
s/incrememented/incremented/
s/separatelly/separately/
s/accesible/accessible/
s/sythetic/synthetic/
s/enabed/enabled/
s/heurisitc/heuristic/
s/assocated/associated/
s/otherwides/otherwise/
s/specfied/specified/
s/seaching/searching/
s/hierachry/hierarchy/
s/internel/internal/
s/Thise/This/

Link: https://lkml.kernel.org/r/20201029150554.3354-1-hqjagain@gmail.com

Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 60602cb5 28-Oct-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

fgraph: Make overruns 4 bytes in graph stack structure

Inspecting the data structures of the function graph tracer, I found that
the overrun value is unsigned long, which is 8 bytes on a 64 bit machine,
and not only that, the depth is an int (4 bytes). The overrun can be simply
an unsigned int (4 bytes) and pack the ftrace_graph_ret structure better.

The depth is moved up next to the func, as it is used more often with func,
and improves cache locality.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 4649079b 09-Jun-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Make ftrace packed events have align of 1

When using trace-cmd on 5.6-rt for the function graph tracer, the output was
corrupted. It gave output like this:

funcgraph_entry: func=0xffffffff depth=38982
funcgraph_entry: func=0x1ffffffff depth=16044
funcgraph_exit: func=0xffffffff overrun=0x92539aaf00000000 calltime=0x92539c9900000072 rettime=0x100000072 depth=11084
funcgraph_exit: func=0xffffffff overrun=0x9253946e00000000 calltime=0x92539e2100000072 rettime=0x72 depth=26033702
funcgraph_entry: func=0xffffffff depth=85798
funcgraph_entry: func=0x1ffffffff depth=12044

The reason was because the tracefs/events/ftrace/funcgraph_entry/exit format
file was incorrect. The -rt kernel adds more common fields to the trace
events. Namely, common_migrate_disable and common_preempt_lazy_count. Each
is one byte in size. This changes the alignment of the normal payload. Most
events are aligned normally, but the function and function graph events are
defined with a "PACKED" macro, that packs their payload. As the offsets
displayed in the format files are now calculated by an aligned field, the
aligned field for function and function graph events should be 1, not their
normal alignment.

With aligning of the funcgraph_entry event, the format file has:

field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned char common_migrate_disable; offset:8; size:1; signed:0;
field:unsigned char common_preempt_lazy_count; offset:9; size:1; signed:0;

field:unsigned long func; offset:16; size:8; signed:0;
field:int depth; offset:24; size:4; signed:1;

But the actual alignment is:

field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned char common_migrate_disable; offset:8; size:1; signed:0;
field:unsigned char common_preempt_lazy_count; offset:9; size:1; signed:0;

field:unsigned long func; offset:12; size:8; signed:0;
field:int depth; offset:20; size:4; signed:1;

Link: https://lkml.kernel.org/r/20200609220041.2a3b527f@oasis.local.home

Cc: stable@vger.kernel.org
Fixes: 04ae87a52074e ("ftrace: Rework event_create_dir()")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b396bfde 11-Feb-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Have hwlat ts be first instance and record count of instances

The hwlat tracer runs a loop of width time during a given window. It then
reports the max latency over a given threshold and records a timestamp. But
this timestamp is the time after the width has finished, and not the time it
actually triggered.

Record the actual time when the latency was greater than the threshold as
well as the number of times it was greater in a given width per window.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# cbc3b92c 24-Sep-2014 Josef Bacik <jbacik@fb.com>

tracing: Set kernel_stack's caller size properly

I noticed when trying to use the trace-cmd python interface that reading the raw
buffer wasn't working for kernel_stack events. This is because it uses a
stubbed version of __dynamic_array that doesn't do the __data_loc trick and
encode the length of the array into the field. Instead it just shows up as a
size of 0. So change this to __array and set the len to FTRACE_STACK_ENTRIES
since this is what we actually do in practice and matches how user_stack_trace
works.

Link: http://lkml.kernel.org/r/1411589652-1318-1-git-send-email-jbacik@fb.com

Signed-off-by: Josef Bacik <jbacik@fb.com>
[ Pulled from the archeological digging of my INBOX ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 04ae87a5 24-Oct-2019 Peter Zijlstra <peterz@infradead.org>

ftrace: Rework event_create_dir()

Rework event_create_dir() to use an array of static data instead of
function pointers where possible.

The problem is that it would call the function pointer on module load
before parse_args(), possibly even before jump_labels were initialized.
Luckily the generated functions don't use jump_labels but it still seems
fragile. It also gets in the way of changing when we make the module map
executable.

The generated function are basically calling trace_define_field() with a
bunch of static arguments. So instead of a function, capture these
arguments in a static array, avoiding the function call.

Now there are a number of cases where the fields are dynamic (syscall
arguments, kprobes and uprobes), in which case a static array does not
work, for these we preserve the function call. Luckily all these cases
are not related to modules and so we can retain the function call for
them.

Also fix up all broken tracepoint definitions that now generate a
compile error.

Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191111132458.342979914@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 85acbb21 09-Feb-2019 Changbin Du <changbin.du@intel.com>

tracing: Change the function format to display function names by perf

Here is an example for this change.

$ sudo perf record -e 'ftrace:function' --filter='ip==schedule'
$ sudo perf report

The output of perf before this patch:

\# Samples: 100 of event 'ftrace:function'
\# Event count (approx.): 100
\#
\# Overhead Trace output
\# ........ ......................................
\#
51.00% ffffffff81f6aaa0 <-- ffffffff81158e8d
29.00% ffffffff81f6aaa0 <-- ffffffff8116ccb2
8.00% ffffffff81f6aaa0 <-- ffffffff81f6f2ed
4.00% ffffffff81f6aaa0 <-- ffffffff811628db
4.00% ffffffff81f6aaa0 <-- ffffffff81f6ec5b
2.00% ffffffff81f6aaa0 <-- ffffffff81f6f21a
1.00% ffffffff81f6aaa0 <-- ffffffff811b04af
1.00% ffffffff81f6aaa0 <-- ffffffff8143ce17

After this patch:

\# Samples: 36 of event 'ftrace:function'
\# Event count (approx.): 36
\#
\# Overhead Trace output
\# ........ ............................................
\#
38.89% schedule <-- schedule_hrtimeout_range_clock
27.78% schedule <-- worker_thread
13.89% schedule <-- schedule_timeout
11.11% schedule <-- smpboot_thread_fn
5.56% schedule <-- rcu_gp_kthread
2.78% schedule <-- exit_to_usermode_loop

Link: http://lkml.kernel.org/r/20190209161919.32350-1-changbin.du@gmail.com

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# bb730b58 16-Aug-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Fix SPDX format headers to use C++ style comments

The Linux kernel adopted the SPDX License format headers to ease license
compliance management, and uses the C++ '//' style comments for the SPDX
header tags. Some files in the tracing directory used the C style /* */
comments for them. To be consistent across all files, replace the /* */
C style SPDX tags with the C++ // SPDX tags.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 3dd80953 09-May-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Add trigger file for trace_markers tracefs/ftrace/print

Allow writing to the trace_markers file initiate triggers defined in
tracefs/ftrace/print/trigger file. This will allow of user space to trigger
the same type of triggers (including histograms) that the trace events use.

Had to create a ftrace_event_register() function that will become the
trace_marker print event's reg() function. This is required because of how
triggers are enabled:

event_trigger_write() {
event_trigger_regex_write() {
trigger_process_regex() {
for p in trigger_commands {
p->func(); /* trigger_snapshot_cmd->func */
event_trigger_callback() {
cmd_ops->reg() /* register_trigger() */ {
trace_event_trigger_enable_disable() {
trace_event_enable_disable() {
call->class->reg();

Without the reg() function, the trigger code will call a NULL pointer and
crash the system.

Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Karim Yaghmour <karim.yaghmour@opersys.com>
Cc: Brendan Gregg <bgregg@netflix.com>
Suggested-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 9a0fd675 15-Mar-2018 Peter Xu <peterx@redhat.com>

tracing: Fix missing tab for hwlat_detector print format

It's been missing for a while but no one is touching that up. Fix it.

Link: http://lkml.kernel.org/r/20180315060639.9578-1-peterx@redhat.com

CC: Ingo Molnar <mingo@kernel.org>
Cc:stable@vger.kernel.org
Fixes: 7b2c86250122d ("tracing: Add NMI tracing in hwlat detector")
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 51aad0ae 08-May-2017 Deepa Dinamani <deepa.kernel@gmail.com>

trace: make trace_hwlat timestamp y2038 safe

struct timespec is not y2038 safe on 32 bit machines and needs to be
replaced by struct timespec64 in order to represent times beyond year
2038 on such machines.

Fix all the timestamp representation in struct trace_hwlat and all the
corresponding implementations.

Link: http://lkml.kernel.org/r/1491613030-11599-3-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 068f530b 19-Jan-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Add the constant count for branch tracer

The unlikely/likely branch profiler now gets called even if the if statement
is a constant (always goes in one direction without a compare). Add a value
to denote this in the likely/unlikely tracer as well.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# fa32e855 06-Jul-2016 Steven Rostedt <rostedt@goodmis.org>

tracing: Add new trace_marker_raw

A new file is created:

/sys/kernel/debug/tracing/trace_marker_raw

This allows for appications to create data structures and write the binary
data directly into it, and then read the trace data out from trace_pipe_raw
into the same type of data structure. This saves on converting numbers into
ASCII that would be required by trace_marker.

Suggested-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 7b2c8625 03-Aug-2016 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Add NMI tracing in hwlat detector

As NMIs can also cause latency when interrupts are disabled, the hwlat
detectory has no way to know if the latency it detects is from an NMI or an
SMI or some other hardware glitch.

As ftrace_nmi_enter/exit() funtions are no longer used (except for sh, which
isn't supported anymore), I converted those to "arch_ftrace_nmi_enter/exit"
and use ftrace_nmi_enter/exit() to check if hwlat detector is tracing or
not, and if so, it calls into the hwlat utility.

Since the hwlat detector only has a single kthread that is spinning with
interrupts disabled, it marks what CPU it is on, and if the NMI callback
happens on that CPU, it records the time spent in that NMI. This is added to
the output that is generated by the hwlat detector as:

#3 inner/outer(us): 9/9 ts:1470836488.206734548
#4 inner/outer(us): 0/8 ts:1470836497.140808588
#5 inner/outer(us): 0/6 ts:1470836499.140825168 nmi-total:5 nmi-count:1
#6 inner/outer(us): 9/9 ts:1470836501.140841748

All time is still tracked in microseconds.

The NMI information is only shown when an NMI occurred during the sample.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e7c15cd8 22-Jun-2016 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Added hardware latency tracer

The hardware latency tracer has been in the PREEMPT_RT patch for some time.
It is used to detect possible SMIs or any other hardware interruptions that
the kernel is unaware of. Note, NMIs may also be detected, but that may be
good to note as well.

The logic is pretty simple. It simply creates a thread that spins on a
single CPU for a specified amount of time (width) within a periodic window
(window). These numbers may be adjusted by their cooresponding names in

/sys/kernel/tracing/hwlat_detector/

The defaults are window = 1000000 us (1 second)
width = 500000 us (1/2 second)

The loop consists of:

t1 = trace_clock_local();
t2 = trace_clock_local();

Where trace_clock_local() is a variant of sched_clock().

The difference of t2 - t1 is recorded as the "inner" timestamp and also the
timestamp t1 - prev_t2 is recorded as the "outer" timestamp. If either of
these differences are greater than the time denoted in
/sys/kernel/tracing/tracing_thresh then it records the event.

When this tracer is started, and tracing_thresh is zero, it changes to the
default threshold of 10 us.

The hwlat tracer in the PREEMPT_RT patch was originally written by
Jon Masters. I have modified it quite a bit and turned it into a
tracer.

Based-on-code-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# a4a551b8 29-Jun-2016 Namhyung Kim <namhyung@kernel.org>

ftrace: Reduce size of function graph entries

Currently ftrace_graph_ent{,_entry} and ftrace_graph_ret{,_entry} struct
can have padding bytes at the end due to alignment in 64-bit data type.
As these data are recorded so frequently, those paddings waste
non-negligible space. As the ring buffer maintains alignment properly
for each architecture, just to remove the extra padding using 'packed'
attribute.

ftrace_graph_ent_entry: 24 -> 20
ftrace_graph_ret_entry: 48 -> 44

Also I moved the 'overrun' field in struct ftrace_graph_ret to minimize
the padding in the middle.

Tested on x86_64 only.

Link: http://lkml.kernel.org/r/1467197808-13578-1-git-send-email-namhyung@kernel.org

Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# bbedb179 11-Mar-2015 Scott Wood <scottwood@freescale.com>

tracing: %pF is only for function pointers

Use %pS for actual addresses, otherwise you'll get bad output
on arches like ppc64 where %pF expects a function descriptor.

Link: http://lkml.kernel.org/r/1426130037-17956-22-git-send-email-scottwood@freescale.com

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 09ae7234 08-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Add trace_puts() for even faster trace_printk() tracing

The trace_printk() is extremely fast and is very handy as it can be
used in any context (including NMIs!). But it still requires scanning
the fmt string for parsing the args. Even the trace_bprintk() requires
a scan to know what args will be saved, although it doesn't copy the
format string itself.

Several times trace_printk() has no args, and wastes cpu cycles scanning
the fmt string.

Adding trace_puts() allows the developer to use an even faster
tracing method that only saves the pointer to the string in the
ring buffer without doing any format parsing at all. This will
help remove even more of the "Heisenbug" effect, when debugging.

Also fixed up the F_printk()s for the ftrace internal bprint and print events.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 01de982a 22-Mar-2012 Wolfgang Mauerer <wolfgang.mauerer@siemens.com>

tracing: Fix ftrace stack trace entries

8 hex characters tell only half the tale for 64 bit CPUs,
so use the appropriate length.

Link: http://lkml.kernel.org/r/1332411501-8059-2-git-send-email-wolfgang.mauerer@siemens.com

Cc: stable@vger.kernel.org
Signed-off-by: Wolfgang Mauerer <wolfgang.mauerer@siemens.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 02aa3162 15-Feb-2012 Jiri Olsa <jolsa@redhat.com>

ftrace: Allow to specify filter field type for ftrace events

Adding FILTER_TRACE_FN event field type for function tracepoint
event, so it can be properly recognized within filtering code.

Currently all fields of ftrace subsystem events share the common
field type FILTER_OTHER. Since the function trace fields need
special care within the filtering code we need to recognize it
properly, hence adding the FILTER_TRACE_FN event type.

Adding filter parameter to the FTRACE_ENTRY macro, to specify the
filter field type for the event.

Link: http://lkml.kernel.org/r/1329317514-8131-7-git-send-email-jolsa@redhat.com

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# ced39002 15-Feb-2012 Jiri Olsa <jolsa@redhat.com>

ftrace, perf: Add support to use function tracepoint in perf

Adding perf registration support for the ftrace function event,
so it is now possible to register it via perf interface.

The perf_event struct statically contains ftrace_ops as a handle
for function tracer. The function tracer is registered/unregistered
in open/close actions.

To be efficient, we enable/disable ftrace_ops each time the traced
process is scheduled in/out (via TRACE_REG_PERF_(ADD|DELL) handlers).
This way tracing is enabled only when the process is running.
Intentionally using this way instead of the event's hw state
PERF_HES_STOPPED, which would not disable the ftrace_ops.

It is now possible to use function trace within perf commands
like:

perf record -e ftrace:function ls
perf stat -e ftrace:function ls

Allowed only for root.

Link: http://lkml.kernel.org/r/1329317514-8131-6-git-send-email-jolsa@redhat.com

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 4a9bd3f1 14-Jul-2011 Steven Rostedt <srostedt@redhat.com>

tracing: Have dynamic size event stack traces

Currently the stack trace per event in ftace is only 8 frames.
This can be quite limiting and sometimes useless. Especially when
the "ignore frames" is wrong and we also use up stack frames for
the event processing itself.

Change this to be dynamic by adding a percpu buffer that we can
write a large stack frame into and then copy into the ring buffer.

For interrupts and NMIs that come in while another event is being
process, will only get to use the 8 frame stack. That should be enough
as the task that it interrupted will have the full stack frame anyway.

Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# 140e4f2d 03-Dec-2010 David Sharp <dhsharp@google.com>

tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup

Signed-off-by: David Sharp <dhsharp@google.com>
LKML-Reference: <1291421609-14665-6-git-send-email-dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# b595076a 01-Nov-2010 Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

tree-wide: fix comment/printk typos

"gadget", "through", "command", "maintain", "maintain", "controller", "address",
"between", "initiali[zs]e", "instead", "function", "select", "already",
"equal", "access", "management", "hierarchy", "registration", "interest",
"relative", "memory", "offset", "already",

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# eb7beb5c 15-Jul-2010 Frederic Weisbecker <fweisbec@gmail.com>

tracing: Remove special traces

Special traces type was only used by sysprof. Lets remove it now
that sysprof ftrace plugin has been dropped.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Soeren Sandmann <sandmann@daimi.au.dk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>


# 5d550467 15-Jul-2010 Frederic Weisbecker <fweisbec@gmail.com>

tracing: Remove ksym tracer

The ksym (breakpoint) ftrace plugin has been superseded by perf
tools that are much more poweful to use the cpu breakpoints.
This tracer doesn't bring more feature. It has been deprecated
for a while now, lets remove it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>


# 039ca4e7 26-May-2010 Li Zefan <lizf@cn.fujitsu.com>

tracing: Remove kmemtrace ftrace plugin

We have been resisting new ftrace plugins and removing existing
ones, and kmemtrace has been superseded by kmem trace events
and perf-kmem, so we remove it.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
[ remove kmemtrace from the makefile, handle slob too ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>


# 30dbb20e 26-May-2010 Américo Wang <xiyou.wangcong@gmail.com>

tracing: Remove boot tracer

The boot tracer is useless. It simply logs the initcalls
but in fact these initcalls are also logged through printk
while using the initcall_debug kernel parameter.

Nobody seem to be using it so far. Then just remove it.

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Chase Douglas <chase.douglas@canonical.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <20100526105753.GA5677@cr0.nay.redhat.com>
[ remove the hooks in main.c, and the headers ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>


# faa4602e 25-Mar-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

x86, perf, bts, mm: Delete the never used BTS-ptrace code

Support for the PMU's BTS features has been upstreamed in
v2.6.32, but we still have the old and disabled ptrace-BTS,
as Linus noticed it not so long ago.

It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
regard for other uses (perf) and doesn't provide the flexibility
needed for perf either.

Its users are ptrace-block-step and ptrace-bts, since ptrace-bts
was never used and ptrace-block-step can be implemented using a
much simpler approach.

So axe all 3000 lines of it. That includes the *locked_memory*()
APIs in mm/mlock.c as well.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Markus Metzger <markus.t.metzger@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20100325135413.938004390@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 24f1e32c 09-Sep-2009 Frederic Weisbecker <fweisbec@gmail.com>

hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events

This patch rebase the implementation of the breakpoints API on top of
perf events instances.

Each breakpoints are now perf events that handle the
register scheduling, thread/cpu attachment, etc..

The new layering is now made as follows:

ptrace kgdb ftrace perf syscall
\ | / /
\ | / /
/
Core breakpoint API /
/
| /
| /

Breakpoints perf events

|
|

Breakpoints PMU ---- Debug Register constraints handling
(Part of core breakpoint API)
|
|

Hardware debug registers

Reasons of this rewrite:

- Use the centralized/optimized pmu registers scheduling,
implying an easier arch integration
- More powerful register handling: perf attributes (pinned/flexible
events, exclusive/non-exclusive, tunable period, etc...)

Impact:

- New perf ABI: the hardware breakpoints counters
- Ptrace breakpoints setting remains tricky and still needs some per
thread breakpoints references.

Todo (in the order):

- Support breakpoints perf counter events for perf tools (ie: implement
perf_bpcounter_event())
- Support from perf tools

Changes in v2:

- Follow the perf "event " rename
- The ptrace regression have been fixed (ptrace breakpoint perf events
weren't released when a task ended)
- Drop the struct hw_breakpoint and store generic fields in
perf_event_attr.
- Separate core and arch specific headers, drop
asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
- Use new generic len/type for breakpoint
- Handle off case: when breakpoints api is not supported by an arch

Changes in v3:

- Fix broken CONFIG_KVM, we need to propagate the breakpoint api
changes to kvm when we exit the guest and restore the bp registers
to the host.

Changes in v4:

- Drop the hw_breakpoint_restore() stub as it is only used by KVM
- EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
module
- Restore the breakpoints unconditionally on kvm guest exit:
TIF_DEBUG_THREAD doesn't anymore cover every cases of running
breakpoints and vcpu->arch.switch_db_regs might not always be
set when the guest used debug registers.
(Waiting for a reliable optimization)

Changes in v5:

- Split-up the asm-generic/hw-breakpoint.h moving to
linux/hw_breakpoint.h into a separate patch
- Optimize the breakpoints restoring while switching from kvm guest
to host. We only want to restore the state if we have active
breakpoints to the host, otherwise we don't care about messed-up
address registers.
- Add asm/hw_breakpoint.h to Kbuild
- Fix bad breakpoint type in trace_selftest.c

Changes in v6:

- Fix wrong header inclusion in trace.h (triggered a build
error with CONFIG_FTRACE_SELFTEST

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jan Kiszka <jan.kiszka@web.de>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>


# 61613521 17-Sep-2009 Arjan van de Ven <arjan@linux.intel.com>

tracing, perf: Convert the power tracer into an event tracer

This patch converts the existing power tracer into an event tracer,
so that power events (C states and frequency changes) can be
tracked via "perf".

This also removes the perl script that was used to demo the tracer;
its functionality is being replaced entirely with timechart.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090912130542.6d314860@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 20a58a77 14-Sep-2009 Li Zefan <lizf@cn.fujitsu.com>

tracing: remove some unused macros

- remove FTRACE_ENTRY_STRUCT_ONLY()
- remove TRACE_XXX() macros

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4AADF6E6.3080606@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# c16de8fd 14-Sep-2009 Li Zefan <lizf@cn.fujitsu.com>

tracing: fix F_printk() typos

I found some typos in F_printk(), so I wrote compile-time check
for it, and triggered some compile errors and warnings.

I've fixed them on x86_32, but I have no x86_64 in my hand, so there
may still be some compile warnings on 64bits.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4AADF60B.5070407@cn.fujitsu.com>

[ tested on x86_64, and works fine ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# d7315094 12-Sep-2009 Steven Rostedt <srostedt@redhat.com>

tracing: show details of structures within the ftrace structures

Some of the internal ftrace structures use structures within. The
output of a field saying it is just a structure is useless for a format
file. A binary reader of the ring buffer needs to know more about
how the fields are broken up.

This patch adds to the ftrace structure macros new fields to
describe the structures inside a structure.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 0a1c49db 12-Sep-2009 Steven Rostedt <srostedt@redhat.com>

tracing: use macros to create internal ftrace entry ring buffer structures

The entries used by ftrace internal code (plugins) currently have their
formats manually exported to userspace. That is, the format files in
debugfs/tracing/events/ftrace/*/format are currently created by hand.
This is a maintenance nightmare, and can easily become out of sync
with what is actually shown.

This patch uses the methodology of the TRACE_EVENT macros to build
the structures so that their formats can be automated and this
will keep the structures in sync with what users can see.

This patch only changes the way the structures are created. Further
patches will build off of this to automate the format files.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>