History log of /linux-master/kernel/trace/trace_irqsoff.c
Revision Date Author Comments
# eecb91b9 17-Aug-2023 Zheng Yejian <zhengyejian1@huawei.com>

tracing: Fix memleak due to race between current_tracer and trace

Kmemleak report a leak in graph_trace_open():

unreferenced object 0xffff0040b95f4a00 (size 128):
comm "cat", pid 204981, jiffies 4301155872 (age 99771.964s)
hex dump (first 32 bytes):
e0 05 e7 b4 ab 7d 00 00 0b 00 01 00 00 00 00 00 .....}..........
f4 00 01 10 00 a0 ff ff 00 00 00 00 65 00 10 00 ............e...
backtrace:
[<000000005db27c8b>] kmem_cache_alloc_trace+0x348/0x5f0
[<000000007df90faa>] graph_trace_open+0xb0/0x344
[<00000000737524cd>] __tracing_open+0x450/0xb10
[<0000000098043327>] tracing_open+0x1a0/0x2a0
[<00000000291c3876>] do_dentry_open+0x3c0/0xdc0
[<000000004015bcd6>] vfs_open+0x98/0xd0
[<000000002b5f60c9>] do_open+0x520/0x8d0
[<00000000376c7820>] path_openat+0x1c0/0x3e0
[<00000000336a54b5>] do_filp_open+0x14c/0x324
[<000000002802df13>] do_sys_openat2+0x2c4/0x530
[<0000000094eea458>] __arm64_sys_openat+0x130/0x1c4
[<00000000a71d7881>] el0_svc_common.constprop.0+0xfc/0x394
[<00000000313647bf>] do_el0_svc+0xac/0xec
[<000000002ef1c651>] el0_svc+0x20/0x30
[<000000002fd4692a>] el0_sync_handler+0xb0/0xb4
[<000000000c309c35>] el0_sync+0x160/0x180

The root cause is descripted as follows:

__tracing_open() { // 1. File 'trace' is being opened;
...
*iter->trace = *tr->current_trace; // 2. Tracer 'function_graph' is
// currently set;
...
iter->trace->open(iter); // 3. Call graph_trace_open() here,
// and memory are allocated in it;
...
}

s_start() { // 4. The opened file is being read;
...
*iter->trace = *tr->current_trace; // 5. If tracer is switched to
// 'nop' or others, then memory
// in step 3 are leaked!!!
...
}

To fix it, in s_start(), close tracer before switching then reopen the
new tracer after switching. And some tracers like 'wakeup' may not update
'iter->private' in some cases when reopen, then it should be cleared
to avoid being mistakenly closed again.

Link: https://lore.kernel.org/linux-trace-kernel/20230817125539.1646321-1-zhengyejian1@huawei.com

Fixes: d7350c3f4569 ("tracing/core: make the read callbacks reentrants")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 36590c50 25-Jan-2021 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

tracing: Merge irqflags + preempt counter.

The state of the interrupts (irqflags) and the preemption counter are
both passed down to tracing_generic_entry_update(). Only one bit of
irqflags is actually required: The on/off state. The complete 32bit
of the preemption counter isn't needed. Just whether of the upper bits
(softirq, hardirq and NMI) are set and the preemption depth is needed.

The irqflags and the preemption counter could be evaluated early and the
information stored in an integer `trace_ctx'.
tracing_generic_entry_update() would use the upper bits as the
TRACE_FLAG_* and the lower 8bit as the disabled-preemption depth
(considering that one must be substracted from the counter in one
special cases).

The actual preemption value is not used except for the tracing record.
The `irqflags' variable is mostly used only for the tracing record. An
exception here is for instance wakeup_tracer_call() or
probe_wakeup_sched_switch() which explicilty disable interrupts and use
that `irqflags' to save (and restore) the IRQ state and to record the
state.

Struct trace_event_buffer has also the `pc' and flags' members which can
be replaced with `trace_ctx' since their actual value is not used
outside of trace recording.

This will reduce tracing_generic_entry_update() to simply assign values
to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved
to _tracing_gen_ctx_flags() which replaces preempt_count() and
local_save_flags() invocations.

As an example, ftrace_syscall_enter() may invoke:
- trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update()
- event_trigger_unlock_commit()
-> ftrace_trace_stack() -> … -> tracing_generic_entry_update()
-> ftrace_trace_userstack() -> … -> tracing_generic_entry_update()

In this case the TRACE_FLAG_* bits were evaluated three times. By using
the `trace_ctx' they are evaluated once and assigned three times.

A build with all tracers enabled on x86-64 with and without the patch:

text data bss dec hex filename
21970669 17084168 7639260 46694097 2c87ed1 vmlinux.old
21970293 17084168 7639260 46693721 2c87d59 vmlinux.new

text shrank by 379 bytes, data remained constant.

Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# da7f84cd 19-Jan-2021 Viktor Rosendahl <Viktor.Rosendahl@bmw.de>

tracing: Use pause-on-trace with the latency tracers

Eaerlier, tracing was disabled when reading the trace file. This behavior
was changed with:

commit 06e0a548bad0 ("tracing: Do not disable tracing when reading the
trace file").

This doesn't seem to work with the latency tracers.

The above mentioned commit dit not only change the behavior but also added
an option to emulate the old behavior. The idea with this patch is to
enable this pause-on-trace option when the latency tracers are used.

Link: https://lkml.kernel.org/r/20210119164344.37500-2-Viktor.Rosendahl@bmw.de

Cc: stable@vger.kernel.org
Fixes: 06e0a548bad0 ("tracing: Do not disable tracing when reading the trace file")
Signed-off-by: Viktor Rosendahl <Viktor.Rosendahl@bmw.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# d19ad077 28-Oct-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs

In preparation to have arguments of a function passed to callbacks attached
to functions as default, change the default callback prototype to receive a
struct ftrace_regs as the forth parameter instead of a pt_regs.

For callbacks that set the FL_SAVE_REGS flag in their ftrace_ops flags, they
will now need to get the pt_regs via a ftrace_get_regs() helper call. If
this is called by a callback that their ftrace_ops did not have a
FL_SAVE_REGS flag set, it that helper function will return NULL.

This will allow the ftrace_regs to hold enough just to get the parameters
and stack pointer, but without the worry that callbacks may have a pt_regs
that is not completely filled.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 1c5eb448 09-Jan-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Rename trace_buffer to array_buffer

As we are working to remove the generic "ring_buffer" name that is used by
both tracing and perf, the ring_buffer name for tracing will be renamed to
trace_buffer, and perf's ring buffer will be renamed to perf_buffer.

As there already exists a trace_buffer that is used by the trace_arrays, it
needs to be first renamed to array_buffer.

Link: https://lore.kernel.org/r/20191213153553.GE20583@krava

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# eeeb080b 12-Feb-2019 Masami Hiramatsu <mhiramat@kernel.org>

kprobes: Prohibit probing on hardirq tracers

Since kprobes breakpoint handling involves hardirq tracer,
probing these functions cause breakpoint recursion problem.

Prohibit probing on those functions.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrea Righi <righi.andrea@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/154998802073.31052.17255044712514564153.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 9acd8de6 01-Jan-2019 Changbin Du <changbin.du@intel.com>

function_graph: Support displaying relative timestamp

When function_graph is used for latency tracers, relative timestamp
is more straightforward than absolute timestamp as function trace
does. This change adds relative timestamp support to function_graph
and applies to latency tracers (wakeup and irqsoff).

Instead of:

# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 5.0.0-rc1-test
# --------------------------------------------------------------------
# latency: 521 us, #1125/1125, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
# -----------------
# | task: swapper/2-0 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: __schedule
# => ended at: _raw_spin_unlock_irq
#
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| /
# TIME CPU TASK/PID |||| DURATION FUNCTION CALLS
# | | | | |||| | | | | | |
124.974306 | 2) systemd-693 | d..1 0.000 us | __schedule();
124.974307 | 2) systemd-693 | d..1 | rcu_note_context_switch() {
124.974308 | 2) systemd-693 | d..1 0.487 us | rcu_preempt_deferred_qs();
124.974309 | 2) systemd-693 | d..1 0.451 us | rcu_qs();
124.974310 | 2) systemd-693 | d..1 2.301 us | }
[..]
124.974826 | 2) <idle>-0 | d..2 | finish_task_switch() {
124.974826 | 2) <idle>-0 | d..2 | _raw_spin_unlock_irq() {
124.974827 | 2) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq();
124.974828 | 2) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on();
<idle>-0 2d..2 552us : <stack trace>
=> __schedule
=> schedule_idle
=> do_idle
=> cpu_startup_entry
=> start_secondary
=> secondary_startup_64

Show:

# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 5.0.0-rc1-test+
# --------------------------------------------------------------------
# latency: 511 us, #1053/1053, CPU#7 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
# -----------------
# | task: swapper/7-0 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: __schedule
# => ended at: _raw_spin_unlock_irq
#
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| /
# REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS
# | | | | |||| | | | | | |
0 us | 7) sshd-1704 | d..1 0.000 us | __schedule();
1 us | 7) sshd-1704 | d..1 | rcu_note_context_switch() {
1 us | 7) sshd-1704 | d..1 0.611 us | rcu_preempt_deferred_qs();
2 us | 7) sshd-1704 | d..1 0.484 us | rcu_qs();
3 us | 7) sshd-1704 | d..1 2.599 us | }
[..]
509 us | 7) <idle>-0 | d..2 | finish_task_switch() {
510 us | 7) <idle>-0 | d..2 | _raw_spin_unlock_irq() {
510 us | 7) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq();
512 us | 7) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on();
<idle>-0 7d..2 543us : <stack trace>
=> __schedule
=> schedule_idle
=> do_idle
=> cpu_startup_entry
=> start_secondary
=> secondary_startup_64

Link: http://lkml.kernel.org/r/20190101154614.8887-2-changbin.du@gmail.com

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 688f7089 15-Nov-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

fgraph: Add new fgraph_ops structure to enable function graph hooks

Currently the registering of function graph is to pass in a entry and return
function. We need to have a way to associate those functions together where
the entry can determine to run the return hook. Having a structure that
contains both functions will facilitate the process of converting the code
to be able to do such.

This is similar to the way function hooks are enabled (it passes in
ftrace_ops). Instead of passing in the functions to use, a single structure
is passed in to the registering function.

The unregister function is now passed in the fgraph_ops handle. When we
allow more than one callback to the function graph hooks, this will let the
system know which one to remove.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 5cf99a0f 29-Nov-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing/fgraph: Fix set_graph_function from showing interrupts

The tracefs file set_graph_function is used to only function graph functions
that are listed in that file (or all functions if the file is empty). The
way this is implemented is that the function graph tracer looks at every
function, and if the current depth is zero and the function matches
something in the file then it will trace that function. When other functions
are called, the depth will be greater than zero (because the original
function will be at depth zero), and all functions will be traced where the
depth is greater than zero.

The issue is that when a function is first entered, and the handler that
checks this logic is called, the depth is set to zero. If an interrupt comes
in and a function in the interrupt handler is traced, its depth will be
greater than zero and it will automatically be traced, even if the original
function was not. But because the logic only looks at depth it may trace
interrupts when it should not be.

The recent design change of the function graph tracer to fix other bugs
caused the depth to be zero while the function graph callback handler is
being called for a longer time, widening the race of this happening. This
bug was actually there for a longer time, but because the race window was so
small it seldom happened. The Fixes tag below is for the commit that widen
the race window, because that commit belongs to a series that will also help
fix the original bug.

Cc: stable@kernel.org
Fixes: 39eb456dacb5 ("function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack")
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# bcea3f96 16-Aug-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Add SPDX License format tags to tracing files

Add the SPDX License header to ease license compliance management.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 3f1756dc 08-Aug-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: More reverting of "tracing: Centralize preemptirq tracepoints and unify their usage"

Joel Fernandes created a nice patch that cleaned up the duplicate hooks used
by lockdep and irqsoff latency tracer. It made both use tracepoints. But the
latency tracer is triggering warnings when using tracepoints to call into
the latency tracer's routines. Mainly, they can be called from NMI context.
If that happens, then the SRCU may not work properly because on some
architectures, SRCU is not safe to be called in both NMI and non-NMI
context.

This is a partial revert of the clean up patch c3bc8fd637a9 ("tracing:
Centralize preemptirq tracepoints and unify their usage") that adds back the
direct calls into the latency tracer. It also only calls the trace events
when not in NMI.

Link: http://lkml.kernel.org/r/20180809210654.622445925@goodmis.org
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Fixes: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify their usage")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# f27107fa 07-Aug-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing/irqsoff: Handle preempt_count for different configs

I was hitting the following warning:

WARNING: CPU: 0 PID: 1 at kernel/trace/trace_irqsoff.c:631 tracer_hardirqs_off+0x15/0x2a

Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc6-test+ #13
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
EIP: tracer_hardirqs_off+0x15/0x2a
Code: ff 85 c0 74 0e 8b 45 00 8b 50 04 8b 45 04 e8 35 ff ff ff 5d c3 55 64 a1 cc 37 51 c1 a9 ff ff ff 7f 89 e5 53 89 d3 89 ca 75 02 <0f> 0b e8 90 fc ff ff 85 c0 74 07 89 d8 e8 0c ff ff ff 5b 5d c3 55
EAX: 80000000 EBX: c04337f0 ECX: c04338e3 EDX: c04338e3
ESI: c04337f0 EDI: c04338e3 EBP: f2aa1d68 ESP: f2aa1d64
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210046
CR0: 80050033 CR2: 00000000 CR3: 01668000 CR4: 001406f0
Call Trace:
trace_irq_disable_rcuidle+0x63/0x6c
trace_hardirqs_off+0x26/0x30
default_send_IPI_mask_allbutself_logical+0x31/0x93
default_send_IPI_allbutself+0x37/0x48
native_send_call_func_ipi+0x4d/0x6a
smp_call_function_many+0x165/0x19d
? add_nops+0x34/0x34
? trace_hardirqs_on_caller+0x2d/0x2d
? add_nops+0x34/0x34
smp_call_function+0x1f/0x23
on_each_cpu+0x15/0x43
? trace_hardirqs_on_caller+0x2d/0x2d
? trace_hardirqs_on_caller+0x2d/0x2d
? trace_irq_disable_rcuidle+0x1/0x6c
text_poke_bp+0xa0/0xc2
? trace_hardirqs_on_caller+0x2d/0x2d
arch_jump_label_transform+0xa7/0xcb
? trace_irq_disable_rcuidle+0x5/0x6c
__jump_label_update+0x3e/0x6d
jump_label_update+0x7d/0x81
static_key_slow_inc_cpuslocked+0x58/0x6d
static_key_slow_inc+0x19/0x20
tracepoint_probe_register_prio+0x19e/0x1d1
? start_critical_timings+0x1c/0x1c
tracepoint_probe_register+0xf/0x11
irqsoff_tracer_init+0x21/0xf2
tracer_init+0x16/0x1a
trace_selftest_startup_irqsoff+0x25/0xc4
run_tracer_selftest+0xca/0x131
register_tracer+0xd5/0x172
? trace_event_define_fields_preemptirq_template+0x45/0x45
init_irqsoff_tracer+0xd/0x11
do_one_initcall+0xab/0x1e8
? rcu_read_lock_sched_held+0x3d/0x44
? trace_initcall_level+0x52/0x86
kernel_init_freeable+0x195/0x21a
? rest_init+0xb4/0xb4
kernel_init+0xd/0xe4
ret_from_fork+0x2e/0x38

It is due to running a CONFIG_PREEMPT_VOLUNTARY kernel, which would trigger
this warning every time:

WARN_ON_ONCE(preempt_count());

Because on CONFIG_PREEMPT_VOLUNTARY, preempt_count() is always zero.

This warning is to make sure preempt_count is set because
tracer_hardirqs_on() does a preempt_enable_notrace() to make the
preempt_trace() work properly, as being called by a trace event, the trace
event code disables preemption, and the tracer wants to know what the
preemption was before it was called.

Instead of enabling preemption like this, just record the preempt_count,
subtract PREEMPT_DISABLE_OFFSET from it (which is zero with !CONFIG_PREEMPT
set), and pass that value to the necessary functions, which should use the
passed in parameter instead of calling preempt_count() directly.

Fixes: da5b3ebb45277 ("tracing: irqsoff: Account for additional preempt_disable")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# da5b3ebb 05-Aug-2018 Joel Fernandes (Google) <joel@joelfernandes.org>

tracing: irqsoff: Account for additional preempt_disable

Recently we tried to make the preemptirqsoff tracer to use irqsoff
tracepoint probes. However this causes issues as reported by Masami:

[2.271078] Testing tracer preemptirqsoff: .. no entries found ..FAILED!
[2.381015] WARNING: CPU: 0 PID: 1 at /home/mhiramat/ksrc/linux/kernel/
trace/trace.c:1512 run_tracer_selftest+0xf3/0x154

This is due to the tracepoint code increasing the preempt nesting count
by calling an additional preempt_disable before calling into the
preemptoff tracer which messes up the preempt_count() check in
tracer_hardirqs_off.

To fix this, make the irqsoff tracer probes balance the additional outer
preempt_disable with a preempt_enable_notrace.

The other way to fix this is to just use SRCU for all tracepoints.
However we can't do that because we can't use NMIs from RCU context.

Link: http://lkml.kernel.org/r/20180806034049.67949-1-joel@joelfernandes.org

Fixes: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify their usage")
Fixes: e6753f23d961 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# c3bc8fd6 30-Jul-2018 Joel Fernandes (Google) <joel@joelfernandes.org>

tracing: Centralize preemptirq tracepoints and unify their usage

This patch detaches the preemptirq tracepoints from the tracers and
keeps it separate.

Advantages:
* Lockdep and irqsoff event can now run in parallel since they no longer
have their own calls.

* This unifies the usecase of adding hooks to an irqsoff and irqson
event, and a preemptoff and preempton event.
3 users of the events exist:
- Lockdep
- irqsoff and preemptoff tracers
- irqs and preempt trace events

The unification cleans up several ifdefs and makes the code in preempt
tracer and irqsoff tracers simpler. It gets rid of all the horrific
ifdeferry around PROVE_LOCKING and makes configuration of the different
users of the tracepoints more easy and understandable. It also gets rid
of the time_* function calls from the lockdep hooks used to call into
the preemptirq tracer which is not needed anymore. The negative delta in
lines of code in this patch is quite large too.

In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
as a single point for registering probes onto the tracepoints. With
this,
the web of config options for preempt/irq toggle tracepoints and its
users becomes:

PREEMPT_TRACER PREEMPTIRQ_EVENTS IRQSOFF_TRACER PROVE_LOCKING
| | \ | |
\ (selects) / \ \ (selects) /
TRACE_PREEMPT_TOGGLE ----> TRACE_IRQFLAGS
\ /
\ (depends on) /
PREEMPTIRQ_TRACEPOINTS

Other than the performance tests mentioned in the previous patch, I also
ran the locking API test suite. I verified that all tests cases are
passing.

I also injected issues by not registering lockdep probes onto the
tracepoints and I see failures to confirm that the probes are indeed
working.

This series + lockdep probes not registered (just to inject errors):
[ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/12:FAILED|FAILED| ok |
[ 0.000000] sirq-safe-A => hirqs-on/21:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + irqs-on/12:FAILED|FAILED| ok |
[ 0.000000] soft-safe-A + irqs-on/12:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + irqs-on/21:FAILED|FAILED| ok |
[ 0.000000] soft-safe-A + irqs-on/21:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
[ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |

With this series + lockdep probes registered, all locking tests pass:

[ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/12: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/21: ok | ok | ok |
[ 0.000000] hard-safe-A + irqs-on/12: ok | ok | ok |
[ 0.000000] soft-safe-A + irqs-on/12: ok | ok | ok |
[ 0.000000] hard-safe-A + irqs-on/21: ok | ok | ok |
[ 0.000000] soft-safe-A + irqs-on/21: ok | ok | ok |
[ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
[ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |

Link: http://lkml.kernel.org/r/20180730222423.196630-4-joel@joelfernandes.org

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 2b27ece6 28-Jun-2018 Joel Fernandes (Google) <joel@joelfernandes.org>

tracing/irqsoff: Split reset into separate functions

Split reset functions into seperate functions in preparation
of future patches that need to do tracer specific reset.

Link: http://lkml.kernel.org/r/20180628182149.226164-4-joel@joelfernandes.org

Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# d5915816 10-Oct-2017 Joel Fernandes <joelaf@google.com>

tracing: Add support for preempt and irq enable/disable events

Preempt and irq trace events can be used for tracing the start and
end of an atomic section which can be used by a trace viewer like
systrace to graphically view the start and end of an atomic section and
correlate them with latencies and scheduling issues.

This also serves as a prelude to using synthetic events or probes to
rewrite the preempt and irqsoff tracers, along with numerous benefits of
using trace events features for these events.
Link: http://lkml.kernel.org/r/20171006005432.14244-3-joelaf@google.com
Link: http://lkml.kernel.org/r/20171010225137.17370-1-joelaf@google.com

Cc: Peter Zilstra <peterz@infradead.org>
Cc: kernel-team@android.com
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# aaecaa0b 05-Oct-2017 Joel Fernandes <joelaf@google.com>

tracing: Prepare to add preempt and irq trace events

In preparation of adding irqsoff and preemptsoff enable and disable trace
events, move required functions and code to make it easier to add these events
in a later patch. This patch is just code movement and no functional change.
Link: http://lkml.kernel.org/r/20171006005432.14244-2-joelaf@google.com

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@android.com
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# a5a1d1c2 21-Dec-2016 Thomas Gleixner <tglx@linutronix.de>

clocksource: Use a plain u64 instead of cycle_t

There is no point in having an extra type for extra confusion. u64 is
unambiguous.

Conversion was done with the following coccinelle script:

@rem@
@@
-typedef u64 cycle_t;

@fix@
typedef cycle_t;
@@
-cycle_t
+u64

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>


# 1a414428 08-Dec-2016 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too

Currently both the wakeup and irqsoff traces do not handle set_graph_notrace
well. The ftrace infrastructure will ignore the return paths of all
functions leaving them hanging without an end:

# echo '*spin*' > set_graph_notrace
# cat trace
[...]
_raw_spin_lock() {
preempt_count_add() {
do_raw_spin_lock() {
update_rq_clock();

Where the '*spin*' functions should have looked like this:

_raw_spin_lock() {
preempt_count_add();
do_raw_spin_lock();
}
update_rq_clock();

Instead, have the wakeup and irqsoff tracers ignore the functions that are
set by the set_graph_notrace like the function_graph tracer does. Move
the logic in the function_graph tracer into a header to allow wakeup and
irqsoff tracers to use it as well.

Cc: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 741f3a69 19-Oct-2015 Dmitry Safonov <0x7f454c46@gmail.com>

tracing: Remove redundant reset per-CPU buff in irqsoff tracer

There is no reason to do it twice: from commit b6f11df26fdc28
("trace: Call tracing_reset_online_cpus before tracer->init()")
resetting of per-CPU buffers done before tracer->init() call.

tracer->init() calls {irqs,preempt,preemptirqs}off_tracer_init() and it
calls __irqsoff_tracer_init(), which resets per-CPU ringbuffer second
time.
It's slowpath, but anyway.

Link: http://lkml.kernel.org/r/1445278226-16187-1-git-send-email-0x7f454c46@gmail.com

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# cb86e053 17-Mar-2016 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Have preempt(irqs)off trace preempt disabled functions

Joel Fernandes reported that the function tracing of preempt disabled
sections was not being reported when running either the preemptirqsoff or
preemptoff tracers. This was due to the fact that the function tracer
callback for those tracers checked if irqs were disabled before tracing. But
this fails when we want to trace preempt off locations as well.

Joel explained that he wanted to see funcitons where interrupts are enabled
but preemption was disabled. The expected output he wanted:

<...>-2265 1d.h1 3419us : preempt_count_sub <-irq_exit
<...>-2265 1d..1 3419us : __do_softirq <-irq_exit
<...>-2265 1d..1 3419us : msecs_to_jiffies <-__do_softirq
<...>-2265 1d..1 3420us : irqtime_account_irq <-__do_softirq
<...>-2265 1d..1 3420us : __local_bh_disable_ip <-__do_softirq
<...>-2265 1..s1 3421us : run_timer_softirq <-__do_softirq
<...>-2265 1..s1 3421us : hrtimer_run_pending <-run_timer_softirq
<...>-2265 1..s1 3421us : _raw_spin_lock_irq <-run_timer_softirq
<...>-2265 1d.s1 3422us : preempt_count_add <-_raw_spin_lock_irq
<...>-2265 1d.s2 3422us : _raw_spin_unlock_irq <-run_timer_softirq
<...>-2265 1..s2 3422us : preempt_count_sub <-_raw_spin_unlock_irq
<...>-2265 1..s1 3423us : rcu_bh_qs <-__do_softirq
<...>-2265 1d.s1 3423us : irqtime_account_irq <-__do_softirq
<...>-2265 1d.s1 3423us : __local_bh_enable <-__do_softirq

There's a comment saying that the irq disabled check is because there's a
possible race that tracing_cpu may be set when the function is executed. But
I don't remember that race. For now, I added a check for preemption being
enabled too to not record the function, as there would be no race if that
was the case. I need to re-investigate this, as I'm now thinking that the
tracing_cpu will always be correct. But no harm in keeping the check for
now, except for the slight performance hit.

Link: http://lkml.kernel.org/r/1457770386-88717-1-git-send-email-agnel.joel@gmail.com

Fixes: 5e6d2b9cfa3a "tracing: Use one prologue for the preempt irqs off tracer function tracers"
Cc: stable@vget.kernel.org # 2.6.37+
Reported-by: Joel Fernandes <agnel.joel@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 79851821 29-Sep-2015 Yaowei Bai <bywxiaobai@163.com>

tracing: report_latency() in trace_irqsoff.c can return boolean

This patch makes report_latency return bool due to this
particular function only using either one or zero as its
return value.

No functional change.

Link: http://lkml.kernel.org/r/1443537816-5788-3-git-send-email-bywxiaobai@163.com

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 983f938a 30-Sep-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Move trace_flags from global to a trace_array field

In preparation to make trace options per instance, the global trace_flags
needs to be moved from being a global variable to a field within the trace
instance trace_array structure.

There's still more work to do, as there's some functions that use
trace_flags without passing in a way to get to the current_trace array. For
those, the global_trace is used directly (from trace.c). This includes
setting and clearing the trace_flags. This means that when a new instance is
created, it just gets the trace_flags of the global_trace and will not be
able to modify them. Depending on the functions that have access to the
trace_array, the flags of an instance may not affect parts of its trace,
where the global_trace is used. These will be fixed in future changes.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 8179e8a1 29-Sep-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Do not create function tracer options when not compiled in

When the function tracer is not compiled in, do not create the option files
for it.

Fix up both the sched_wakeup and irqsoff tracers to handle the change.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 729358da 29-Sep-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Only create function graph options when it is compiled in

Do not create fuction graph tracer options when function graph tracer is not
even compiled in.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 03905582 28-Sep-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Move "display-graph" option to main options

In order to facilitate making all tracer options visible even when the
tracer is not active, we need to get rid of duplicate options. Any option
that is shared between multiple tracers really should be a main option.

As the wakeup and irqsoff tracers both use the "display-graph" option, and
use it exactly the same way, move that option from the tracer options to the
main options and consolidate them.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 3efb5f21 20-Jan-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Remove unneeded includes of debugfs.h and fs.h

The creation of tracing files and directories is for the most part
encapsulated in helper functions in trace.c. Other files do not need to
include debugfs.h or fs.h, as they may have needed to in the past.

Remove them from the files that do not need them.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 02f2f764 14-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Allow irq/preempt tracers to be used by instances

The irqsoff, preemptoff and preemptirqsoff tracers can now be used by
instances. But they may only be used by one instance at a time (including
the top level directory). This allows multiple tracers to run while the
irqsoff (and friends) tracer is running simultaneously.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 6d9b3fa5 14-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Move tracing_max_latency into trace_array

In preparation for letting the latency tracers be used by instances,
remove the global tracing_max_latency variable and add a max_latency
field to the trace_array that the latency tracers will now use.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 4104d326 10-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Remove global function list and call function directly

Instead of having a list of global functions that are called,
as only one global function is allow to be enabled at a time, there's
no reason to have a list.

Instead, simply have all the users of the global ops, use the global ops
directly, instead of registering their own ftrace_ops. Just switch what
function is used before enabling the function tracer.

This removes a lot of code as well as the complexity involved with it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# bf6065b5 10-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Pass trace_array to flag_changed callback

As options (flags) may affect instances instead of being global
the flag_changed() callbacks need to receive the trace_array descriptor
of the instance they will be modifying.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 8c1a49ae 10-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Pass trace_array to set_flag callback

As options (flags) may affect instances instead of being global
the set_flag() callbacks need to receive the trace_array descriptor
of the instance they will be modifying.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 285c00ad 08-Feb-2014 Andi Kleen <ak@linux.intel.com>

asmlinkage: Make trace_hardirqs_on/off_caller visible

These functions are called from assembler, and thus need to be
__visible.

Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1391845930-28580-12-git-send-email-ak@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 10246fa3 01-Jul-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Use flag buffer_disabled for irqsoff tracer

If the ring buffer is disabled and the irqsoff tracer records a trace it
will clear out its buffer and lose the data it had previously recorded.

Currently there's a callback when writing to the tracing_of file, but if
tracing is disabled via the function tracer trigger, it will not inform
the irqsoff tracer to stop recording.

By using the "mirror" flag (buffer_disabled) in the trace_array, that keeps
track of the status of the trace_array's buffer, it gives the irqsoff
tracer a fast way to know if it should record a new trace or not.
The flag may be a little behind the real state of the buffer, but it
should not affect the trace too much. It's more important for the irqsoff
tracer to be fast.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 328df475 13-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Add function-trace option to disable function tracing of latency tracers

Currently, the only way to stop the latency tracers from doing function
tracing is to fully disable the function tracer from the proc file
system:

echo 0 > /proc/sys/kernel/ftrace_enabled

This is a big hammer approach as it disables function tracing for
all users. This includes kprobes, perf, stack tracer, etc.

Instead, create a function-trace option that the latency tracers can
check to determine if it should enable function tracing or not.
This option can be set or cleared even while the tracer is active
and the tracers will disable or enable function tracing depending
on how the option was set.

Instead of using the proc file, disable latency function tracing with

echo 0 > /debug/tracing/options/function-trace

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 12883efb 05-Mar-2013 Steven Rostedt (Red Hat) <srostedt@redhat.com>

tracing: Consolidate max_tr into main trace_array structure

Currently, the way the latency tracers and snapshot feature works
is to have a separate trace_array called "max_tr" that holds the
snapshot buffer. For latency tracers, this snapshot buffer is used
to swap the running buffer with this buffer to save the current max
latency.

The only items needed for the max_tr is really just a copy of the buffer
itself, the per_cpu data pointers, the time_start timestamp that states
when the max latency was triggered, and the cpu that the max latency
was triggered on. All other fields in trace_array are unused by the
max_tr, making the max_tr mostly bloat.

This change removes the max_tr completely, and adds a new structure
called trace_buffer, that holds the buffer pointer, the per_cpu data
pointers, the time_start timestamp, and the cpu where the latency occurred.

The trace_array, now has two trace_buffers, one for the normal trace and
one for the max trace or snapshot. By doing this, not only do we remove
the bloat from the max_trace but the instances of traces can now use
their own snapshot feature and not have just the top level global_trace have
the snapshot feature and latency tracers for itself.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# a7603ff4 06-Aug-2012 Steven Rostedt <srostedt@redhat.com>

tracing: Replace the static global per_cpu arrays with allocated per_cpu

The global and max-tr currently use static per_cpu arrays for the CPU data
descriptors. But in order to get new allocated trace_arrays, they need to
be allocated per_cpu arrays. Instead of using the static arrays, switch
the global and max-tr to use allocated data.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 2b6080f2 11-May-2012 Steven Rostedt <srostedt@redhat.com>

tracing: Encapsulate global_trace and remove dependencies on global vars

The global_trace variable in kernel/trace/trace.c has been kept 'static' and
local to that file so that it would not be used too much outside of that
file. This has paid off, even though there were lots of changes to make
the trace_array structure more generic (not depending on global_trace).

Removal of a lot of direct usages of global_trace is needed to be able to
create more trace_arrays such that we can add multiple buffers.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 613f04a0 14-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Prevent buffer overwrite disabled for latency tracers

The latency tracers require the buffers to be in overwrite mode,
otherwise they get screwed up. Force the buffers to stay in overwrite
mode when latency tracers are enabled.

Added a flag_changed() method to the tracer structure to allow
the tracers to see what flags are being changed, and also be able
to prevent the change from happing.

Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 6d49e352 06-Dec-2012 Nadia Yvette Chambers <nyc@holomorphy.com>

propagate name change to comments in kernel source

I've legally changed my name with New York State, the US Social Security
Administration, et al. This patch propagates the name change and change
in initials and login to comments in the kernel source as well.

Signed-off-by: Nadia Yvette Chambers <nyc@holomorphy.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# f43c738b 02-Oct-2012 Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>

tracing: Change tracer's integer flags to bool

print_max and use_max_tr in struct tracer are "int" variables and
used like flags. This is wasteful, so change the type to "bool".

Link: http://lkml.kernel.org/r/20121002082710.9807.86393.stgit@falsita

Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 6f415672 04-Oct-2012 Steven Rostedt <srostedt@redhat.com>

tracing: Allow tracers to start at core initcall

There's times during debugging that it is helpful to see traces of early
boot functions. But the tracers are initialized at device_initcall()
which is quite late during the boot process. Setting the kernel command
line parameter ftrace=function will not show anything until the function
tracer is initialized. This prevents being able to trace functions before
device_initcall().

There's no reason that the tracers need to be initialized so late in the
boot process. Move them up to core_initcall() as they still need to come
after early_initcall() which initializes the tracing buffers.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 4740974a 20-Jul-2012 Steven Rostedt <srostedt@redhat.com>

ftrace: Add default recursion protection for function tracing

As more users of the function tracer utility are being added, they do
not always add the necessary recursion protection. To protect from
function recursion due to tracing, if the callback ftrace_ops does not
specifically specify that it protects against recursion (by setting
the FTRACE_OPS_FL_RECURSION_SAFE flag), the list operation will be
called by the mcount trampoline which adds recursion protection.

If the flag is set, then the function will be called directly with no
extra protection.

Note, the list operation is called if more than one function callback
is registered, or if the arch does not support all of the function
tracer features.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# a1e2e31d 08-Aug-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Return pt_regs to function trace callback

Return as the 4th paramater to the function tracer callback the pt_regs.

Later patches that implement regs passing for the architectures will require
having the ftrace_ops set the SAVE_REGS flag, which will tell the arch
to take the time to pass a full set of pt_regs to the ftrace_ops callback
function. If the arch does not support it then it should pass NULL.

If an arch can pass full regs, then it should define:
ARCH_SUPPORTS_FTRACE_SAVE_REGS to 1

Link: http://lkml.kernel.org/r/20120702201821.019966811@goodmis.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 2f5f6ad9 08-Aug-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Pass ftrace_ops as third parameter to function trace callback

Currently the function trace callback receives only the ip and parent_ip
of the function that it traced. It would be more powerful to also return
the ops that registered the function as well. This allows the same function
to act differently depending on what ftrace_ops registered it.

Link: http://lkml.kernel.org/r/20120612225424.267254552@goodmis.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 7e9a49ef 07-Nov-2011 Jiri Olsa <jolsa@redhat.com>

tracing/latency: Fix header output for latency tracers

In case the the graph tracer (CONFIG_FUNCTION_GRAPH_TRACER) or even the
function tracer (CONFIG_FUNCTION_TRACER) are not set, the latency tracers
do not display proper latency header.

The involved/fixed latency tracers are:
wakeup_rt
wakeup
preemptirqsoff
preemptoff
irqsoff

The patch adds proper handling of tracer configuration options for latency
tracers, and displaying correct header info accordingly.

* The current output (for wakeup tracer) with both graph and function
tracers disabled is:

# tracer: wakeup
#
<idle>-0 0d.h5 1us+: 0:120:R + [000] 7: 0:R watchdog/0
<idle>-0 0d.h5 3us+: ttwu_do_activate.clone.1 <-try_to_wake_up
...

* The fixed output is:

# tracer: wakeup
#
# wakeup latency trace v1.1.5 on 3.1.0-tip+
# --------------------------------------------------------------------
# latency: 55 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
# -----------------
# | task: migration/0-6 (uid:0 nice:0 policy:1 rt_prio:99)
# -----------------
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
cat-1129 0d..4 1us : 1129:120:R + [000] 6: 0:R migration/0
cat-1129 0d..4 2us+: ttwu_do_activate.clone.1 <-try_to_wake_up

* The current output (for wakeup tracer) with only function
tracer enabled is:

# tracer: wakeup
#
cat-1140 0d..4 1us+: 1140:120:R + [000] 6: 0:R migration/0
cat-1140 0d..4 2us : ttwu_do_activate.clone.1 <-try_to_wake_up

* The fixed output is:
# tracer: wakeup
#
# wakeup latency trace v1.1.5 on 3.1.0-tip+
# --------------------------------------------------------------------
# latency: 207 us, #109/109, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
# -----------------
# | task: watchdog/1-12 (uid:0 nice:0 policy:1 rt_prio:99)
# -----------------
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
<idle>-0 1d.h5 1us+: 0:120:R + [001] 12: 0:R watchdog/1
<idle>-0 1d.h5 3us : ttwu_do_activate.clone.1 <-try_to_wake_up

Link: http://lkml.kernel.org/r/20111107150849.GE1807@m.brq.redhat.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e36de1de 22-Sep-2011 Steven Rostedt <srostedt@redhat.com>

tracing: Fix preemptirqsoff tracer to not stop at preempt off

If irqs are disabled when preemption count reaches zero, the
preemptirqsoff tracer should not flag that as the end.

When interrupts are enabled and preemption count is not zero
the preemptirqsoff correctly continues its tracing.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 5389f6fa 25-Jul-2009 Thomas Gleixner <tglx@linutronix.de>

locking, tracing: Annotate tracing locks as raw

The tracing locks can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.

In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 321e68b0 03-Jun-2011 Jiri Olsa <jolsa@redhat.com>

tracing, function_graph: Remove dependency of abstime and duration fields on latency

The display of absolute time and duration fields is based on the
latency field. This was added during the irqsoff/wakeup tracers
graph support changes.

It's causing confusion in what fields will be displayed for the
function_graph tracer itself. So I'm removing this depency, and
adding absolute time and duration fields to the preemptirqsoff
preemptoff irqsoff wakeup tracers.

With following commands:
# echo function_graph > ./current_tracer
# cat trace

This is what it looked like before:
# tracer: function_graph
#
# TIME CPU DURATION FUNCTION CALLS
# | | | | | | | |
0) 0.068 us | } /* page_add_file_rmap */
0) | _raw_spin_unlock() {
...

This is what it looks like now:
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
0) 0.068 us | } /* add_preempt_count */
0) 0.993 us | } /* vfsmount_lock_local_lock */
...

For preemptirqsoff preemptoff irqsoff wakeup tracers,
this is what it looked like before:
SNIP
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> lock-depth
# |||| /
# CPU TASK/PID ||||| DURATION FUNCTION CALLS
# | | | ||||| | | | | | |
1) <idle>-0 | d..1 0.000 us | acpi_idle_enter_simple();
...

This is what it looks like now:
SNIP
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| /
# TIME CPU TASK/PID |||| DURATION FUNCTION CALLS
# | | | | |||| | | | | | |
19.847735 | 1) <idle>-0 | d..1 0.000 us | acpi_idle_enter_simple();
...

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1307113131-10045-2-git-send-email-jolsa@redhat.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# b848914c 04-May-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Implement separate user function filtering

ftrace_ops that are registered to trace functions can now be
agnostic to each other in respect to what functions they trace.
Each ops has their own hash of the functions they want to trace
and a hash to what they do not want to trace. A empty hash for
the functions they want to trace denotes all functions should
be traced that are not in the notrace hash.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# 2ce802f6 19-Jan-2011 Tejun Heo <tj@kernel.org>

lockdep: Move early boot local IRQ enable/disable status to init/main.c

During early boot, local IRQ is disabled until IRQ subsystem is
properly initialized. During this time, no one should enable
local IRQ and some operations which usually are not allowed with
IRQ disabled, e.g. operations which might sleep or require
communications with other processors, are allowed.

lockdep tracked this with early_boot_irqs_off/on() callbacks.
As other subsystems need this information too, move it to
init/main.c and make it generally available. While at it,
toggle the boolean to early_boot_irqs_disabled instead of
enabled so that it can be initialized with %false and %true
indicates the exceptional condition.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20110120110635.GB6036@htj.dyndns.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 5e6d2b9c 05-Oct-2010 Steven Rostedt <srostedt@redhat.com>

tracing: Use one prologue for the preempt irqs off tracer function tracers

The preempt and irqsoff tracers have three types of function tracers.
Normal function tracer, function graph entry, and function graph return.
Each of these use a complex dance to prevent recursion and whether
to trace the data or not (depending if interrupts are enabled or not).

This patch moves the duplicate code into a single routine, to
prevent future mistakes with modifying duplicate complex code.

Cc: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 0a772620 23-Sep-2010 Jiri Olsa <jolsa@redhat.com>

tracing: Make graph related irqs/preemptsoff functions global

Move trace_graph_function() and print_graph_headers_flags() functions
to the trace_function_graph.c to be globaly available.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1285243253-7372-3-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# ef710e10 30-Jun-2010 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

tracing: Shrink max latency ringbuffer if unnecessary

Documentation/trace/ftrace.txt says

buffer_size_kb:

This sets or displays the number of kilobytes each CPU
buffer can hold. The tracer buffers are the same size
for each CPU. The displayed number is the size of the
CPU buffer and not total size of all buffers. The
trace buffers are allocated in pages (blocks of memory
that the kernel uses for allocation, usually 4 KB in size).
If the last page allocated has room for more bytes
than requested, the rest of the page will be used,
making the actual allocation bigger than requested.
( Note, the size may not be a multiple of the page size
due to buffer management overhead. )

This can only be updated when the current_tracer
is set to "nop".

But it's incorrect. currently total memory consumption is
'buffer_size_kb x CPUs x 2'.

Why two times difference is there? because ftrace implicitly allocate
the buffer for max latency too.

That makes sad result when admin want to use large buffer. (If admin
want full logging and makes detail analysis). example, If admin
have 24 CPUs machine and write 200MB to buffer_size_kb, the system
consume ~10GB memory (200MB x 24 x 2). umm.. 5GB memory waste is
usually unacceptable.

Fortunatelly, almost all users don't use max latency feature.
The max latency buffer can be disabled easily.

This patch shrink buffer size of the max latency buffer if
unnecessary.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
LKML-Reference: <20100701104554.DA2D.A69D9226@jp.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 62b915f1 02-Apr-2010 Jiri Olsa <jolsa@redhat.com>

tracing: Add graph output support for irqsoff tracer

Add function graph output to irqsoff tracer.

The graph output is enabled by setting new 'display-graph' trace option.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1270227683-14631-4-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# cc51a0fc 11-Dec-2009 Steven Rostedt <srostedt@redhat.com>

tracing: Add stack trace to irqsoff tracer

The irqsoff and friends tracers help in finding causes of latency in the
kernel. The also work with the function tracer to show what was happening
when interrupts or preemption are disabled. But the function tracer has
a bit of an overhead and can cause exagerated readings.

Currently, when tracing with /proc/sys/kernel/ftrace_enabled = 0, where the
function tracer is disabled, the information that is provided can end up
being useless. For example, a 2 and a half millisecond latency only showed:

# tracer: preemptirqsoff
#
# preemptirqsoff latency trace v1.1.5 on 2.6.32
# --------------------------------------------------------------------
# latency: 2463 us, #4/4, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: -4242 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: _spin_lock_irqsave
# => ended at: remove_wait_queue
#
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| /_--=> lock-depth
# |||||/ delay
# cmd pid |||||| time | caller
# \ / |||||| \ | /
hackbenc-4242 2d.... 0us!: trace_hardirqs_off <-_spin_lock_irqsave
hackbenc-4242 2...1. 2463us+: _spin_unlock_irqrestore <-remove_wait_queue
hackbenc-4242 2...1. 2466us : trace_preempt_on <-remove_wait_queue

The above lets us know that hackbench with pid 2463 grabbed a spin lock
somewhere and enabled preemption at remove_wait_queue. This helps a little
but where this actually happened is not informative.

This patch adds the stack dump to the end of the irqsoff tracer. This provides
the following output:

hackbenc-4242 2d.... 0us!: trace_hardirqs_off <-_spin_lock_irqsave
hackbenc-4242 2...1. 2463us+: _spin_unlock_irqrestore <-remove_wait_queue
hackbenc-4242 2...1. 2466us : trace_preempt_on <-remove_wait_queue
hackbenc-4242 2...1. 2467us : <stack trace>
=> sub_preempt_count
=> _spin_unlock_irqrestore
=> remove_wait_queue
=> free_poll_entry
=> poll_freewait
=> do_sys_poll
=> sys_poll
=> system_call_fastpath

Now we see that the culprit of this latency was the free_poll_entry code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# b5130b1e 12-Sep-2009 Carsten Emde <Carsten.Emde@osadl.org>

tracing: do not update tracing_max_latency when tracer is stopped

The state of the function pair tracing_stop()/tracing_start() is
correctly considered when tracer data are updated. However, the global
and externally accessible variable tracing_max_latency is always updated
- even when tracing is stopped.

The update should only occur, if tracing was not stopped.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 41dfba43 12-Sep-2009 Carsten Emde <Carsten.Emde@osadl.org>

tracing: remove unused local variables in tracer probe functions

When the nsecs_to_usecs() conversion in probe_wakeup_sched_switch() and
check_critical_timing() was moved to a later stage in order to avoid
unnecessary computing, it was overlooked to remove the original
variables, assignments and comments..

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 2f26ebd5 01-Sep-2009 Steven Rostedt <srostedt@redhat.com>

tracing: use timestamp to determine start of latency traces

Currently the latency tracers reset the ring buffer. Unfortunately
if a commit is in process (due to a trace event), this can corrupt
the ring buffer. When this happens, the ring buffer will detect
the corruption and then permanently disable the ring buffer.

The bug does not crash the system, but it does prevent further tracing
after the bug is hit.

Instead of reseting the trace buffers, the timestamp of the start of
the trace is used instead. The buffers will still contain the previous
data, but the output will not count any data that is before the
timestamp of the trace.

Note, this only affects the static trace output (trace) and not the
runtime trace output (trace_pipe). The runtime trace output does not
make sense for the latency tracers anyway.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e9d25fe6 04-Mar-2009 Steven Rostedt <srostedt@redhat.com>

tracing: have latency tracers set the latency format

The latency tracers (irqsoff, preemptoff, preemptirqsoff, and wakeup)
are pretty useless with the default output format. This patch makes them
automatically enable the latency format when they are selected. They
also record the state of the latency option, and if it was not enabled
when selected, they disable it on reset.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 73d8b8bc 16-Feb-2009 Wenji Huang <wenji.huang@oracle.com>

tracing: fix typing mistake in hint message and comments

Impact: cleanup

Fix incorrect hint message in code and typos in comments.

Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 7be42151 04-Feb-2009 Arnaldo Carvalho de Melo <acme@redhat.com>

trace: Remove unused trace_array_cpu parameter

Impact: cleanup

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 94523e81 22-Jan-2009 Steven Rostedt <srostedt@redhat.com>

trace: remove internal irqsoff disabling for trace output

Impact: cleanup of duplicate features

The trace output disables the ring buffer and prevents tracing to
occur. The code in irqsoff to do the same thing is no longer needed.
This patch removes it.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 1092307d 15-Jan-2009 Steven Rostedt <srostedt@redhat.com>

trace: set max latency variable to zero on default

Impact: trace max latencies on start of latency tracing

This patch sets the max latency to zero whenever one of the
irq variant tracers or the wakeup tracer is set to current tracer.

Most developers expect to see output when starting up a latency
tracer. But since the max_latency is already set to max, and
it takes a latency greater than max_latency to be recorded, there
is no trace. This is not the expected behavior and has even confused
myself.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 745b1626 15-Jan-2009 Steven Rostedt <srostedt@redhat.com>

trace: set max latency variable to zero on default

Impact: trace max latencies on start of latency tracing

This patch sets the max latency to zero whenever one of the
irq variant tracers or the wakeup tracer is set to current tracer.

Most developers expect to see output when starting up a latency
tracer. But since the max_latency is already set to max, and
it takes a latency greater than max_latency to be recorded, there
is no trace. This is not the expected behavior and has even confused
myself.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 1c80025a 15-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com>

tracing/ftrace: change the type of the init() callback

Impact: extend the ->init() method with the ability to fail

This bring a way to know if the initialization of a tracer successed.
A tracer must return 0 on success and a traditional error (ie:
-ENOMEM) if it fails.

If a tracer fails to init, it is free to print a detailed warn. The
tracing api will not and switch to a new tracer will just return the
error from the init callback.

Note: this will be used for the return tracer.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# c76f0694 07-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: remove trace array ctrl

Impact: remove obsolete variable in trace_array structure

With the new start / stop method of ftrace, the ctrl variable
in the trace_array structure is now obsolete. Remove it.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# bbf5b1a0 07-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: remove ctrl_update method

Impact: Remove the ctrl_update tracer method

With the new quick start/stop method of tracing, the ctrl_update
method is out of date.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 45193170 07-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: irqsoff tracer incorrect reset

Impact: fix to irqsoff tracer output

In converting to the new start / stop ftrace handling, the irqsoff
tracer start called the irqsoff reset function. irqsoff tracer is
not the same as the other traces, and it resets the buffers while
searching for the longest latency.

The reset that the irqsoff stop method calls disables the function
tracing. That means that, by starting the tracer, the function
tracer is disabled incorrectly.

This patch simply removes the call to reset which keeps the function
tracing enabled. Reset is not needed for the irqsoff stop method.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 9036990d 05-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: restructure tracing start/stop infrastructure

Impact: change where tracing is started up and stopped

Currently, when a new tracer is selected via echo'ing a tracer name into
the current_tracer file, the startup is only done if tracing_enabled is
set to one. If tracing_enabled is changed to zero (by echo'ing 0 into
the tracing_enabled file) a full shutdown is performed.

The full startup and shutdown of a tracer can be expensive and the
user can lose out traces when echo'ing in 0 to the tracing_enabled file,
because the process takes too long. There can also be places that
the user would like to start and stop the tracer several times and
doing the full startup and shutdown of a tracer might be too expensive.

This patch performs the full startup and shutdown when a tracer is
selected. It also adds a way to do a quick start or stop of a tracer.
The quick version is just a flag that prevents the tracing from
taking place, but the overhead of the code is still there.

For example, the startup of a tracer may enable tracepoints, or enable
the function tracer. The stop and start will just set a flag to
have the tracer ignore the calls when the tracepoint or function trace
is called. The overhead of the tracer may still be present when
the tracer is stopped, but no tracing will occur. Setting the tracer
to the 'nop' tracer (or any other tracer) will perform the shutdown
of the tracer which will disable the tracepoint or disable the
function tracer.

The tracing_enabled file will simply start or stop tracing.

This change is all internal. The end result for the user should be the same
as before. If tracing_enabled is not set, no trace will happen.
If tracing_enabled is set, then the trace will happen. The tracing_enabled
variable is static between tracers. Enabling tracing_enabled and
going to another tracer will keep tracing_enabled enabled. Same
is true with disabling tracing_enabled.

This patch will now provide a fast start/stop method to the users
for enabling or disabling tracing.

Note: There were two methods to the struct tracer that were never
used: The methods start and stop. These were to be used as a hook
to the reading of the trace output, but ended up not being
necessary. These two methods are now used to enable the start
and stop of each tracer, in case the tracer needs to do more than
just not write into the buffer. For example, the irqsoff tracer
must stop recording max latencies when tracing is stopped.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 606576ce 06-Oct-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: rename FTRACE to FUNCTION_TRACER

Due to confusion between the ftrace infrastructure and the gcc profiling
tracer "ftrace", this patch renames the config options from FTRACE to
FUNCTION_TRACER. The other two names that are offspring from FTRACE
DYNAMIC_FTRACE and FTRACE_MCOUNT_RECORD will stay the same.

This patch was generated mostly by script, and partially by hand.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 6450c1d3 02-Oct-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: move pc counter in irqtrace

The assigning of the pc counter is in the wrong spot in the
check_critical_timing function. The pc variable is used in the
out jump.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 38697053 01-Oct-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: preempt disable over interrupt disable

With the new ring buffer infrastructure in ftrace, I'm trying to make
ftrace a little more light weight.

This patch converts a lot of the local_irq_save/restore into
preempt_disable/enable. The original preempt count in a lot of cases
has to be sent in as a parameter so that it can be recorded correctly.
Some places were recording it incorrectly before anyway.

This is also laying the ground work to make ftrace a little bit
more reentrant, and remove all locking. The function tracers must
still protect from reentrancy.

Note: All the function tracers must be careful when using preempt_disable.
It must do the following:

resched = need_resched();
preempt_disable_notrace();
[...]
if (resched)
preempt_enable_no_resched_notrace();
else
preempt_enable_notrace();

The reason is that if this function traces schedule() itself, the
preempt_enable_notrace() will cause a schedule, which will lead
us into a recursive failure.

If we needed to reschedule before calling preempt_disable, we
should have already scheduled. Since we did not, this is most
likely that we should not and are probably inside a schedule
function.

If resched was not set, we still need to catch the need resched
flag being set when preemption was off and the if case at the
end will catch that for us.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 3928a8a2 29-Sep-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: make work with new ring buffer

This patch ports ftrace over to the new ring buffer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 1fe37104 26-Jul-2008 Ingo Molnar <mingo@elte.hu>

ftrace: fix modular build

fix:

ERROR: "start_critical_timings" [drivers/acpi/processor.ko] undefined!
ERROR: "stop_critical_timings" [drivers/acpi/processor.ko] undefined!

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 1e01cb0c 15-Jul-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: only trace preempt off with preempt tracer

When PREEMPT_TRACER and IRQSOFF_TRACER are both configured and irqsoff
tracer is running, the preempt_off sections might also be traced.

Thanks to Andrew Morton for pointing out my mistake of spin_lock disabling
interrupts while he was reviewing ftrace.txt. Seems that my example I used
actually hit this bug.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# da89a7a2 21-May-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: remove printks from irqsoff trace

Printing out new max latencies was fine for the old RT tracer. But for
mainline it is a bit messy. We also need to test if the run queue
is locked before we can do the print. This means that we may not be
printing out latencies if the run queue is locked on another CPU.
This produces inconsistencies in the output.

This patch simply removes the print altogether.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: pq@iki.fi
Cc: proski@gnu.org
Cc: sandmann@redhat.com
Cc: a.p.zijlstra@chello.nl
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# c5f888ca 12-May-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: irqsoff use raw_smp_processor_id

This patch changes the use of __get_cpu_var to explicitly calling
raw_smp_processor_id and using the per_cpu() macro. On some debug
configurations, the use of __get_cpu_var may cause ftrace to trigger
and this can cause problems with the irqsoff tracing.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 4fe8c304 12-May-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: printk and trace irqsoff and wakeups

printk called from wakeup critical timings and irqs off can
cause deadlocks since printk might do a wakeup itself. If the
call to printk happens with the runqueue lock held, it can
deadlock.

This patch protects the printk from being called in trace irqs off
with a test to see if the runqueue for the current CPU is locked.
If it is locked, the printk is skipped.

The wakeup always holds the runqueue lock, so the printk is
simply removed.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# e309b41d 12-May-2008 Ingo Molnar <mingo@elte.hu>

ftrace: remove notrace

now that we have a kbuild method for notrace, no need to pollute the
C code with the annotations.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 9ff9cdb2 12-May-2008 Ingo Molnar <mingo@elte.hu>

ftrace: cleanups

clean up recent code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 6fb44b71 12-May-2008 Steven Rostedt <srostedt@srostedt@redhat.com>

ftrace: add trace_function api for other tracers to use

A new check was added in the ftrace function that wont trace if the CPU
trace buffer is disabled. Unfortunately, other tracers used ftrace() to
write to the buffer after they disabled it. The new disable check makes
these calls into a nop.

This patch changes the __ftrace that is called without the check into a
new api for the other tracers to use, called "trace_function". The other
tracers use this interface instead when the trace CPU buffer is already
disabled.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 750ed1a4 12-May-2008 Ingo Molnar <mingo@elte.hu>

ftrace: timestamp syncing, prepare

rename and uninline now() to ftrace_now().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# c7aafc54 12-May-2008 Ingo Molnar <mingo@elte.hu>

ftrace: cleanups

factor out code and clean it up.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 60a11774 12-May-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: add self-tests

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 89b2f978 12-May-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: fix updates to max trace

This patch fixes some bugs to the updating of the max trace that
was caused by implementing the new buffering.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 361943ad 12-May-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: irqs off smp_processor_id() fix

The irqsoff function tracer did a __get_cpu_var to determine
if it should trace the function or not. The problem is that
__get_cpu_var can preempt between getting the CPU and reading
the cpu variable. This means that the cpu variable that is
being read is not from the cpu being run on.

At worst, this can give a false positive, where we trace the
function when we should not. It will never give a false negative
since we only want to trace when interrupts are disabled
and we never preempt when they are.

This fix adds a check after reading the irq flags to only
trace if the interrupts are actually disabled. It also changes
the reading of the cpu variable to use a raw_smp_processor_id
since we now don't care if we preempt. We still catch that fact.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 6cd8a4bb 12-May-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: trace preempt off critical timings

Add preempt off timings. A lot of kernel core code is taken from the RT patch
latency trace that was written by Ingo Molnar.

This adds "preemptoff" and "preemptirqsoff" to /debugfs/tracing/available_tracers

Now instead of just tracing irqs off, preemption off can be selected
to be recorded.

When this is selected, it shares the same files as irqs off timings.
One can either trace preemption off, irqs off, or one or the other off.

By echoing "preemptoff" into /debugfs/tracing/current_tracer, recording
of preempt off only is performed. "irqsoff" will only record the time
irqs are disabled, but "preemptirqsoff" will take the total time irqs
or preemption are disabled. Runtime switching of these options is now
supported by simpling echoing in the appropriate trace name into
/debugfs/tracing/current_tracer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 81d68a96 12-May-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: trace irq disabled critical timings

This patch adds latency tracing for critical timings
(how long interrupts are disabled for).

"irqsoff" is added to /debugfs/tracing/available_tracers

Note:
tracing_max_latency
also holds the max latency for irqsoff (in usecs).
(default to large number so one must start latency tracing)

tracing_thresh
threshold (in usecs) to always print out if irqs off
is detected to be longer than stated here.
If irq_thresh is non-zero, then max_irq_latency
is ignored.

Here's an example of a trace with ftrace_enabled = 0

=======
preemption latency trace v1.1.5 on 2.6.24-rc7
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--------------------------------------------------------------------
latency: 100 us, #3/3, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
-----------------
| task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
-----------------
=> started at: _spin_lock_irqsave+0x2a/0xb7
=> ended at: _spin_unlock_irqrestore+0x32/0x5f

_------=> CPU#
/ _-----=> irqs-off
| / _----=> need-resched
|| / _---=> hardirq/softirq
||| / _--=> preempt-depth
|||| /
||||| delay
cmd pid ||||| time | caller
\ / ||||| \ | /
swapper-0 1d.s3 0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
swapper-0 1d.s3 100us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
swapper-0 1d.s3 100us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)

vim:ft=help
=======

And this is a trace with ftrace_enabled == 1

=======
preemption latency trace v1.1.5 on 2.6.24-rc7
--------------------------------------------------------------------
latency: 102 us, #12/12, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
-----------------
| task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
-----------------
=> started at: _spin_lock_irqsave+0x2a/0xb7
=> ended at: _spin_unlock_irqrestore+0x32/0x5f

_------=> CPU#
/ _-----=> irqs-off
| / _----=> need-resched
|| / _---=> hardirq/softirq
||| / _--=> preempt-depth
|||| /
||||| delay
cmd pid ||||| time | caller
\ / ||||| \ | /
swapper-0 1dNs3 0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
swapper-0 1dNs3 46us : e1000_read_phy_reg+0x16/0x225 [e1000] (e1000_update_stats+0x5e2/0x64c [e1000])
swapper-0 1dNs3 46us : e1000_swfw_sync_acquire+0x10/0x99 [e1000] (e1000_read_phy_reg+0x49/0x225 [e1000])
swapper-0 1dNs3 46us : e1000_get_hw_eeprom_semaphore+0x12/0xa6 [e1000] (e1000_swfw_sync_acquire+0x36/0x99 [e1000])
swapper-0 1dNs3 47us : __const_udelay+0x9/0x47 (e1000_read_phy_reg+0x116/0x225 [e1000])
swapper-0 1dNs3 47us+: __delay+0x9/0x50 (__const_udelay+0x45/0x47)
swapper-0 1dNs3 97us : preempt_schedule+0xc/0x84 (__delay+0x4e/0x50)
swapper-0 1dNs3 98us : e1000_swfw_sync_release+0xc/0x55 [e1000] (e1000_read_phy_reg+0x211/0x225 [e1000])
swapper-0 1dNs3 99us+: e1000_put_hw_eeprom_semaphore+0x9/0x35 [e1000] (e1000_swfw_sync_release+0x50/0x55 [e1000])
swapper-0 1dNs3 101us : _spin_unlock_irqrestore+0xe/0x5f (e1000_update_stats+0x641/0x64c [e1000])
swapper-0 1dNs3 102us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
swapper-0 1dNs3 102us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)

vim:ft=help
=======

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>