History log of /linux-master/kernel/trace/ftrace.c
Revision Date Author Comments
# d1530413 22-Feb-2024 Randy Dunlap <rdunlap@infradead.org>

ftrace: Fix most kernel-doc warnings

Reduce the number of kernel-doc warnings from 52 down to 10, i.e.,
fix 42 kernel-doc warnings by (a) using the Returns: format for
function return values or (b) using "@var:" instead of "@var -"
for function parameter descriptions.

Fix one return values list so that it is formatted correctly when
rendered for output.

Spell "non-zero" with a hyphen in several places.

Link: https://lore.kernel.org/linux-trace-kernel/20240223054833.15471-1-rdunlap@infradead.org

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/oe-kbuild-all/202312180518.X6fRyDSN-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# bdbddb10 13-Feb-2024 Petr Pavlu <petr.pavlu@suse.com>

tracing: Fix HAVE_DYNAMIC_FTRACE_WITH_REGS ifdef

Commit a8b9cf62ade1 ("ftrace: Fix DIRECT_CALLS to use SAVE_REGS by
default") attempted to fix an issue with direct trampolines on x86, see
its description for details. However, it wrongly referenced the
HAVE_DYNAMIC_FTRACE_WITH_REGS config option and the problem is still
present.

Add the missing "CONFIG_" prefix for the logic to work as intended.

Link: https://lore.kernel.org/linux-trace-kernel/20240213132434.22537-1-petr.pavlu@suse.com

Fixes: a8b9cf62ade1 ("ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# a8b9cf62 09-Jan-2024 Masami Hiramatsu (Google) <mhiramat@kernel.org>

ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default

The commit 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS
and !WITH_REGS") changed DIRECT_CALLS to use SAVE_ARGS when there
are multiple ftrace_ops at the same function, but since the x86 only
support to jump to direct_call from ftrace_regs_caller, when we set
the function tracer on the same target function on x86, ftrace-direct
does not work as below (this actually works on arm64.)

At first, insmod ftrace-direct.ko to put a direct_call on
'wake_up_process()'.

# insmod kernel/samples/ftrace/ftrace-direct.ko
# less trace
...
<idle>-0 [006] ..s1. 564.686958: my_direct_func: waking up rcu_preempt-17
<idle>-0 [007] ..s1. 564.687836: my_direct_func: waking up kcompactd0-63
<idle>-0 [006] ..s1. 564.690926: my_direct_func: waking up rcu_preempt-17
<idle>-0 [006] ..s1. 564.696872: my_direct_func: waking up rcu_preempt-17
<idle>-0 [007] ..s1. 565.191982: my_direct_func: waking up kcompactd0-63

Setup a function filter to the 'wake_up_process' too, and enable it.

# cd /sys/kernel/tracing/
# echo wake_up_process > set_ftrace_filter
# echo function > current_tracer
# less trace
...
<idle>-0 [006] ..s3. 686.180972: wake_up_process <-call_timer_fn
<idle>-0 [006] ..s3. 686.186919: wake_up_process <-call_timer_fn
<idle>-0 [002] ..s3. 686.264049: wake_up_process <-call_timer_fn
<idle>-0 [002] d.h6. 686.515216: wake_up_process <-kick_pool
<idle>-0 [002] d.h6. 686.691386: wake_up_process <-kick_pool

Then, only function tracer is shown on x86.
But if you enable 'kprobe on ftrace' event (which uses SAVE_REGS flag)
on the same function, it is shown again.

# echo 'p wake_up_process' >> dynamic_events
# echo 1 > events/kprobes/p_wake_up_process_0/enable
# echo > trace
# less trace
...
<idle>-0 [006] ..s2. 2710.345919: p_wake_up_process_0: (wake_up_process+0x4/0x20)
<idle>-0 [006] ..s3. 2710.345923: wake_up_process <-call_timer_fn
<idle>-0 [006] ..s1. 2710.345928: my_direct_func: waking up rcu_preempt-17
<idle>-0 [006] ..s2. 2710.349931: p_wake_up_process_0: (wake_up_process+0x4/0x20)
<idle>-0 [006] ..s3. 2710.349934: wake_up_process <-call_timer_fn
<idle>-0 [006] ..s1. 2710.349937: my_direct_func: waking up rcu_preempt-17

To fix this issue, use SAVE_REGS flag for multiple ftrace_ops flag of
direct_call by default.

Link: https://lore.kernel.org/linux-trace-kernel/170484558617.178953.1590516949390270842.stgit@devnote2

Fixes: 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGS")
Cc: stable@vger.kernel.org
Cc: Florent Revest <revest@chromium.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# d05cb470 29-Dec-2023 Steven Rostedt (Google) <rostedt@goodmis.org>

ftrace: Fix modification of direct_function hash while in use

Masami Hiramatsu reported a memory leak in register_ftrace_direct() where
if the number of new entries are added is large enough to cause two
allocations in the loop:

for (i = 0; i < size; i++) {
hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
new = ftrace_add_rec_direct(entry->ip, addr, &free_hash);
if (!new)
goto out_remove;
entry->direct = addr;
}
}

Where ftrace_add_rec_direct() has:

if (ftrace_hash_empty(direct_functions) ||
direct_functions->count > 2 * (1 << direct_functions->size_bits)) {
struct ftrace_hash *new_hash;
int size = ftrace_hash_empty(direct_functions) ? 0 :
direct_functions->count + 1;

if (size < 32)
size = 32;

new_hash = dup_hash(direct_functions, size);
if (!new_hash)
return NULL;

*free_hash = direct_functions;
direct_functions = new_hash;
}

The "*free_hash = direct_functions;" can happen twice, losing the previous
allocation of direct_functions.

But this also exposed a more serious bug.

The modification of direct_functions above is not safe. As
direct_functions can be referenced at any time to find what direct caller
it should call, the time between:

new_hash = dup_hash(direct_functions, size);
and
direct_functions = new_hash;

can have a race with another CPU (or even this one if it gets interrupted),
and the entries being moved to the new hash are not referenced.

That's because the "dup_hash()" is really misnamed and is really a
"move_hash()". It moves the entries from the old hash to the new one.

Now even if that was changed, this code is not proper as direct_functions
should not be updated until the end. That is the best way to handle
function reference changes, and is the way other parts of ftrace handles
this.

The following is done:

1. Change add_hash_entry() to return the entry it created and inserted
into the hash, and not just return success or not.

2. Replace ftrace_add_rec_direct() with add_hash_entry(), and remove
the former.

3. Allocate a "new_hash" at the start that is made for holding both the
new hash entries as well as the existing entries in direct_functions.

4. Copy (not move) the direct_function entries over to the new_hash.

5. Copy the entries of the added hash to the new_hash.

6. If everything succeeds, then use rcu_pointer_assign() to update the
direct_functions with the new_hash.

This simplifies the code and fixes both the memory leak as well as the
race condition mentioned above.

Link: https://lore.kernel.org/all/170368070504.42064.8960569647118388081.stgit@devnote2/
Link: https://lore.kernel.org/linux-trace-kernel/20231229115134.08dd5174@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: 763e34e74bb7d ("ftrace: Add register_ftrace_direct()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 2a30dbcb 09-Aug-2023 Ruan Jinjie <ruanjinjie@huawei.com>

ftrace: Use LIST_HEAD to initialize clear_hash

Use LIST_HEAD() to initialize clear_hash instead of open-coding it.

Link: https://lore.kernel.org/linux-trace-kernel/20230809071551.913041-1-ruanjinjie@huawei.com

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 13511489 03-Aug-2023 Levi Yun <ppbuk5246@gmail.com>

ftrace: Use within_module to check rec->ip within specified module.

within_module_core && within_module_init condition is same to
within module but it's more readable.

Use within_module instead of former condition to check rec->ip
within specified module area or not.

Link: https://lore.kernel.org/linux-trace-kernel/20230803205236.32201-1-ppbuk5246@gmail.com

Signed-off-by: Levi Yun <ppbuk5246@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 26efd79c 12-Jul-2023 Zheng Yejian <zhengyejian1@huawei.com>

ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()

As comments in ftrace_process_locs(), there may be NULL pointers in
mcount_loc section:
> Some architecture linkers will pad between
> the different mcount_loc sections of different
> object files to satisfy alignments.
> Skip any NULL pointers.

After commit 20e5227e9f55 ("ftrace: allow NULL pointers in mcount_loc"),
NULL pointers will be accounted when allocating ftrace pages but skipped
before adding into ftrace pages, this may result in some pages not being
used. Then after commit 706c81f87f84 ("ftrace: Remove extra helper
functions"), warning may occur at:
WARN_ON(pg->next);

To fix it, only warn for case that no pointers skipped but pages not used
up, then free those unused pages after releasing ftrace_lock.

Link: https://lore.kernel.org/linux-trace-kernel/20230712060452.3175675-1-zhengyejian1@huawei.com

Cc: stable@vger.kernel.org
Fixes: 706c81f87f84 ("ftrace: Remove extra helper functions")
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 83f74441 11-Jun-2023 Jiri Olsa <jolsa@kernel.org>

ftrace: Show all functions with addresses in available_filter_functions_addrs

Adding new available_filter_functions_addrs file that shows all available
functions (same as available_filter_functions) together with addresses,
like:

# cat available_filter_functions_addrs | head
ffffffff81000770 __traceiter_initcall_level
ffffffff810007c0 __traceiter_initcall_start
ffffffff81000810 __traceiter_initcall_finish
ffffffff81000860 trace_initcall_finish_cb
...

Note displayed address is the patch-site address and can differ from
/proc/kallsyms address.

It's useful to have address avilable for traceable symbols, so we don't
need to allways cross check kallsyms with available_filter_functions
(or the other way around) and have all the data in single file.

For backwards compatibility reasons we can't change the existing
available_filter_functions file output, but we need to add new file.

The problem is that we need to do 2 passes:

- through available_filter_functions and find out if the function is traceable
- through /proc/kallsyms to get the address for traceable function

Having available_filter_functions symbols together with addresses allow
us to skip the kallsyms step and we are ok with the address in
available_filter_functions_addr not being the function entry, because
kprobe_multi uses fprobe and that handles both entry and patch-site
address properly.

We have 2 interfaces how to create kprobe_multi link:

a) passing symbols to kernel

1) user gathers symbols and need to ensure that they are
trace-able -> pass through available_filter_functions file

2) kernel takes those symbols and translates them to addresses
through kallsyms api

3) addresses are passed to fprobe/ftrace through:

register_fprobe_ips
-> ftrace_set_filter_ips

b) passing addresses to kernel

1) user gathers symbols and needs to ensure that they are
trace-able -> pass through available_filter_functions file

2) user takes those symbols and translates them to addresses
through /proc/kallsyms

3) addresses are passed to the kernel and kernel calls:

register_fprobe_ips
-> ftrace_set_filter_ips

The new available_filter_functions_addrs file helps us with option b),
because we can make 'b 1' and 'b 2' in one step - while filtering traceable
functions, we get the address directly.

Link: https://lore.kernel.org/linux-trace-kernel/20230611130029.1202298-1-jolsa@kernel.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Tested-by: Jackie Liu <liuyun01@kylinos.cn> # x86
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# d0c2d66f 17-May-2023 Azeem Shaikh <azeemshaikh38@gmail.com>

ftrace: Replace all non-returning strlcpy with strscpy

strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89

Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230517145323.1522010-1-azeemshaikh38@gmail.com


# 6ce2c04f 02-May-2023 Steven Rostedt (Google) <rostedt@goodmis.org>

ftrace: Add MODIFIED flag to show if IPMODIFY or direct was attached

If a function had ever had IPMODIFY or DIRECT attached to it, where this
is how live kernel patching and BPF overrides work, mark them and display
an "M" in the enabled_functions and touched_functions files. This can be
used for debugging. If a function had been modified and later there's a bug
in the code related to that function, this can be used to know if the cause
is possibly from a live kernel patch or a BPF program that changed the
behavior of the code.

Also update the documentation on the enabled_functions and
touched_functions output, as it was missing direct callers and CALL_OPS.
And include this new modify attribute.

Link: https://lore.kernel.org/linux-trace-kernel/20230502213233.004e3ae4@gandalf.local.home

Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# e11b521a 24-Jan-2023 Steven Rostedt (Google) <rostedt@goodmis.org>

ftrace: Show a list of all functions that have ever been enabled

When debugging a crash that appears to be related to ftrace, but not for
sure, it is useful to know if a function was ever enabled by ftrace or
not. It could be that a BPF program was attached to it, or possibly a live
patch.

We are having crashes in the field where this information is not always
known. But having ftrace set a flag if a function has ever been attached
since boot up helps tremendously in trying to know if a crash had to do
with something using ftrace.

For analyzing crashes, the use of a kdump image can have access to the
flags. When looking at issues where the kernel did not panic, the
touched_functions file can simply be used.

Link: https://lore.kernel.org/linux-trace-kernel/20230124095653.6fd1640e@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Chris Li <chriscli@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 3703bd54 08-Mar-2023 Zhen Lei <thunder.leizhen@huawei.com>

kallsyms: Delete an unused parameter related to {module_}kallsyms_on_each_symbol()

The parameter 'struct module *' in the hook function associated with
{module_}kallsyms_on_each_symbol() is no longer used. Delete it.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>


# 60c89718 21-Mar-2023 Florent Revest <revest@chromium.org>

ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGS

Direct called trampolines can be called in two ways:
- either from the ftrace callsite. In this case, they do not access any
struct ftrace_regs nor pt_regs
- Or, if a ftrace ops is also attached, from the end of a ftrace
trampoline. In this case, the call_direct_funcs ops is in charge of
setting the direct call trampoline's address in a struct ftrace_regs

Since:

commit 9705bc709604 ("ftrace: pass fregs to arch_ftrace_set_direct_caller()")

The later case no longer requires a full pt_regs. It only needs a struct
ftrace_regs so DIRECT_CALLS can work with both WITH_ARGS or WITH_REGS.
With architectures like arm64 already abandoning WITH_REGS in favor of
WITH_ARGS, it's important to have DIRECT_CALLS work WITH_ARGS only.

Link: https://lkml.kernel.org/r/20230321140424.345218-7-revest@chromium.org

Signed-off-by: Florent Revest <revest@chromium.org>
Co-developed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# dbaccb61 21-Mar-2023 Florent Revest <revest@chromium.org>

ftrace: Store direct called addresses in their ops

All direct calls are now registered using the register_ftrace_direct API
so each ops can jump to only one direct-called trampoline.

By storing the direct called trampoline address directly in the ops we
can save one hashmap lookup in the direct call ops and implement arm64
direct calls on top of call ops.

Link: https://lkml.kernel.org/r/20230321140424.345218-6-revest@chromium.org

Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# da8bdfbd 21-Mar-2023 Florent Revest <revest@chromium.org>

ftrace: Rename _ftrace_direct_multi APIs to _ftrace_direct APIs

Now that the original _ftrace_direct APIs are gone, the "_multi"
suffixes only add confusion.

Link: https://lkml.kernel.org/r/20230321140424.345218-5-revest@chromium.org

Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 8788ca16 21-Mar-2023 Florent Revest <revest@chromium.org>

ftrace: Remove the legacy _ftrace_direct API

This API relies on a single global ops, used for all direct calls
registered with it. However, to implement arm64 direct calls, we need
each ops to point to a single direct call trampoline.

Link: https://lkml.kernel.org/r/20230321140424.345218-4-revest@chromium.org

Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 59495740 21-Mar-2023 Florent Revest <revest@chromium.org>

ftrace: Let unregister_ftrace_direct_multi() call ftrace_free_filter()

A common pattern when using the ftrace_direct_multi API is to unregister
the ops and also immediately free its filter. We've noticed it's very
easy for users to miss calling ftrace_free_filter().

This adds a "free_filters" argument to unregister_ftrace_direct_multi()
to both remind the user they should free filters and also to make their
life easier.

Link: https://lkml.kernel.org/r/20230321140424.345218-2-revest@chromium.org

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 2a2d8c51 29-Mar-2023 Zheng Yejian <zhengyejian1@huawei.com>

ftrace: Fix issue that 'direct->addr' not restored in modify_ftrace_direct()

Syzkaller report a WARNING: "WARN_ON(!direct)" in modify_ftrace_direct().

Root cause is 'direct->addr' was changed from 'old_addr' to 'new_addr' but
not restored if error happened on calling ftrace_modify_direct_caller().
Then it can no longer find 'direct' by that 'old_addr'.

To fix it, restore 'direct->addr' to 'old_addr' explicitly in error path.

Link: https://lore.kernel.org/linux-trace-kernel/20230330025223.1046087-1-zhengyejian1@huawei.com

Cc: stable@vger.kernel.org
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: <ast@kernel.org>
Cc: <daniel@iogearbox.net>
Fixes: 8a141dd7f706 ("ftrace: Fix modify_ftrace_direct.")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 87325655 11-Mar-2023 Tom Rix <trix@redhat.com>

ftrace: Set direct_ops storage-class-specifier to static

smatch reports this warning
kernel/trace/ftrace.c:2594:19: warning:
symbol 'direct_ops' was not declared. Should it be static?

The variable direct_ops is only used in ftrace.c, so it should be static

Link: https://lore.kernel.org/linux-trace-kernel/20230311135113.711824-1-trix@redhat.com

Signed-off-by: Tom Rix <trix@redhat.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# ee92fa44 09-Mar-2023 Chen Zhongjin <chenzhongjin@huawei.com>

ftrace: Fix invalid address access in lookup_rec() when index is 0

KASAN reported follow problem:

BUG: KASAN: use-after-free in lookup_rec
Read of size 8 at addr ffff000199270ff0 by task modprobe
CPU: 2 Comm: modprobe
Call trace:
kasan_report
__asan_load8
lookup_rec
ftrace_location
arch_check_ftrace_location
check_kprobe_address_safe
register_kprobe

When checking pg->records[pg->index - 1].ip in lookup_rec(), it can get a
pg which is newly added to ftrace_pages_start in ftrace_process_locs().
Before the first pg->index++, index is 0 and accessing pg->records[-1].ip
will cause this problem.

Don't check the ip when pg->index is 0.

Link: https://lore.kernel.org/linux-trace-kernel/20230309080230.36064-1-chenzhongjin@huawei.com

Cc: stable@vger.kernel.org
Fixes: 9644302e3315 ("ftrace: Speed up search by skipping pages by address")
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# cbad0fb2 23-Jan-2023 Mark Rutland <mark.rutland@arm.com>

ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS

Architectures without dynamic ftrace trampolines incur an overhead when
multiple ftrace_ops are enabled with distinct filters. in these cases,
each call site calls a common trampoline which uses
ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
so incurs an overhead relative to the size of this list (including RCU
protection overhead).

Architectures with dynamic ftrace trampolines avoid this overhead for
call sites which have a single associated ftrace_ops. In these cases,
the dynamic trampoline is customized to branch directly to the relevant
ftrace function, avoiding the list overhead.

On some architectures it's impractical and/or undesirable to implement
dynamic ftrace trampolines. For example, arm64 has limited branch ranges
and cannot always directly branch from a call site to an arbitrary
address (e.g. from a kernel text address to an arbitrary module
address). Calls from modules to core kernel text can be indirected via
PLTs (allocated at module load time) to address this, but the same is
not possible from calls from core kernel text.

Using an indirect branch from a call site to an arbitrary trampoline is
possible, but requires several more instructions in the function
prologue (or immediately before it), and/or comes with far more complex
requirements for patching.

Instead, this patch adds a new option, where an architecture can
associate each call site with a pointer to an ftrace_ops, placed at a
fixed offset from the call site. A shared trampoline can recover this
pointer and call ftrace_ops::func() without needing to go via
ftrace_ops_list_func(), avoiding the associated overhead.

This avoids issues with branch range limitations, and avoids the need to
allocate and manipulate dynamic trampolines, making it far simpler to
implement and maintain, while having similar performance
characteristics.

Note that this allows for dynamic ftrace_ops to be invoked directly from
an architecture's ftrace_caller trampoline, whereas existing code forces
the use of ftrace_ops_get_list_func(), which is in part necessary to
permit the ftrace_ops to be freed once unregistered *and* to avoid
branch/address-generation range limitation on some architectures (e.g.
where ops->func is a module address, and may be outside of the direct
branch range for callsites within the main kernel image).

The CALL_OPS approach avoids this problems and is safe as:

* The existing synchronization in ftrace_shutdown() using
ftrace_shutdown() using synchronize_rcu_tasks_rude() (and
synchronize_rcu_tasks()) ensures that no tasks hold a stale reference
to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
or while invoking ftrace_ops::func), when that ftrace_ops is
unregistered.

Arguably this could also be relied upon for the existing scheme,
permitting dynamic ftrace_ops to be invoked directly when ops->func is
in range, but this will require additional logic to handle branch
range limitations, and is not handled by this patch.

* Each callsite's ftrace_ops pointer literal can hold any valid kernel
address, and is updated atomically. As an architecture's ftrace_caller
trampoline will atomically load the ops pointer then dereference
ops->func, there is no risk of invoking ops->func with a mismatches
ops pointer, and updates to the ops pointer do not require special
care.

A subsequent patch will implement architectures support for arm64. There
should be no functional change as a result of this patch alone.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 07cc2c93 16-Jan-2023 Zhen Lei <thunder.leizhen@huawei.com>

livepatch: Improve the search performance of module_kallsyms_on_each_symbol()

Currently we traverse all symbols of all modules to find the specified
function for the specified module. But in reality, we just need to find
the given module and then traverse all the symbols in it.

Let's add a new parameter 'const char *modname' to function
module_kallsyms_on_each_symbol(), then we can compare the module names
directly in this function and call hook 'fn' after matching. If 'modname'
is NULL, the symbols of all modules are still traversed for compatibility
with other usage cases.

Phase1: mod1-->mod2..(subsequent modules do not need to be compared)
|
Phase2: -->f1-->f2-->f3

Assuming that there are m modules, each module has n symbols on average,
then the time complexity is reduced from O(m * n) to O(m) + O(n).

Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20230116101009.23694-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 8be9fbd5 02-Jan-2023 Mark Rutland <mark.rutland@arm.com>

ftrace: Export ftrace_free_filter() to modules

Setting filters on an ftrace ops results in some memory being allocated
for the filter hashes, which must be freed before the ops can be freed.
This can be done by removing every individual element of the hash by
calling ftrace_set_filter_ip() or ftrace_set_filter_ips() with `remove`
set, but this is somewhat error prone as it's easy to forget to remove
an element.

Make it easier to clean this up by exporting ftrace_free_filter(), which
can be used to clean up all of the filter hashes after an ftrace_ops has
been unregistered.

Using this, fix the ftrace-direct* samples to free hashes prior to being
unloaded. All other code either removes individual filters explicitly or
is built-in and already calls ftrace_free_filter().

Link: https://lkml.kernel.org/r/20230103124912.2948963-3-mark.rutland@arm.com

Cc: stable@vger.kernel.org
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: e1067a07cfbc ("ftrace/samples: Add module to test multi direct modify interface")
Fixes: 5fae941b9a6f ("ftrace/samples: Add multi direct interface test module")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# d0b24b4e 15-Nov-2022 Guilherme G. Piccoli <gpiccoli@igalia.com>

ftrace: Prevent RCU stall on PREEMPT_VOLUNTARY kernels

The function match_records() may take a while due to a large
number of string comparisons, so when in PREEMPT_VOLUNTARY
kernels we could face RCU stalls due to that.

Add a cond_resched() to prevent that.

Link: https://lkml.kernel.org/r/20221115204847.593616-1-gpiccoli@igalia.com

Cc: Mark Rutland <mark.rutland@arm.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org> # from RCU CPU stall warning perspective
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# bd604f3d 22-Nov-2022 Steven Rostedt (Google) <rostedt@goodmis.org>

ftrace: Avoid needless updates of the ftrace function call

Song Shuai reported:

The list func (ftrace_ops_list_func) will be patched first
before the transition between old and new calls are set,
which fixed the race described in this commit `59338f75`.

While ftrace_trace_function changes from the list func to a
ftrace_ops func, like unregistering the klp_ops to leave the only
global_ops in ftrace_ops_list, the ftrace_[regs]_call will be
replaced with the list func although it already exists. So there
should be a condition to avoid this.

And suggested using another variable to keep track of what the ftrace
function is set to. But this could be simplified by using a helper
function that does the same with a static variable.

Link: https://lore.kernel.org/lkml/20221026132039.2236233-1-suagrfillet@gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20221122180905.737b6f52@gandalf.local.home

Reported-by: Song Shuai <suagrfillet@gmail.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 78a01feb 25-Oct-2022 Zheng Yejian <zhengyejian1@huawei.com>

ftrace: Clean comments related to FTRACE_OPS_FL_PER_CPU

Commit b3a88803ac5b ("ftrace: Kill FTRACE_OPS_FL_PER_CPU") didn't
completely remove the comments related to FTRACE_OPS_FL_PER_CPU.

Link: https://lkml.kernel.org/r/20221025153923.1995973-1-zhengyejian1@huawei.com

Fixes: b3a88803ac5b ("ftrace: Kill FTRACE_OPS_FL_PER_CPU")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 9705bc70 03-Nov-2022 Mark Rutland <mark.rutland@arm.com>

ftrace: pass fregs to arch_ftrace_set_direct_caller()

In subsequent patches we'll arrange for architectures to have an
ftrace_regs which is entirely distinct from pt_regs. In preparation for
this, we need to minimize the use of pt_regs to where strictly
necessary in the core ftrace code.

This patch changes the prototype of arch_ftrace_set_direct_caller() to
take ftrace_regs rather than pt_regs, and moves the extraction of the
pt_regs into arch_ftrace_set_direct_caller().

On x86, arch_ftrace_set_direct_caller() can be used even when
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n, and <linux/ftrace.h> defines
struct ftrace_regs. Due to this, it's necessary to define
arch_ftrace_set_direct_caller() as a macro to avoid using an incomplete
type. I've also moved the body of arch_ftrace_set_direct_caller() after
the CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y defineidion of struct
ftrace_regs.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>


# 3640bf85 25-Oct-2022 Jiri Olsa <jolsa@kernel.org>

ftrace: Add support to resolve module symbols in ftrace_lookup_symbols

Currently ftrace_lookup_symbols iterates only over core symbols,
adding module_kallsyms_on_each_symbol call to check on modules
symbols as well.

Also removing 'args.found == args.cnt' condition, because it's
already checked in kallsyms_callback function.

Also removing 'err < 0' check, because both *kallsyms_on_each_symbol
functions do not return error.

Reported-by: Martynas Pumputis <m@lambda.lt>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221025134148.3300700-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 19ba6c8a 15-Nov-2022 Xiu Jianfeng <xiujianfeng@huawei.com>

ftrace: Fix null pointer dereference in ftrace_add_mod()

The @ftrace_mod is allocated by kzalloc(), so both the members {prev,next}
of @ftrace_mode->list are NULL, it's not a valid state to call list_del().
If kstrdup() for @ftrace_mod->{func|module} fails, it goes to @out_free
tag and calls free_ftrace_mod() to destroy @ftrace_mod, then list_del()
will write prev->next and next->prev, where null pointer dereference
happens.

BUG: kernel NULL pointer dereference, address: 0000000000000008
Oops: 0002 [#1] PREEMPT SMP NOPTI
Call Trace:
<TASK>
ftrace_mod_callback+0x20d/0x220
? do_filp_open+0xd9/0x140
ftrace_process_regex.isra.51+0xbf/0x130
ftrace_regex_write.isra.52.part.53+0x6e/0x90
vfs_write+0xee/0x3a0
? __audit_filter_op+0xb1/0x100
? auditd_test_task+0x38/0x50
ksys_write+0xa5/0xe0
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Kernel panic - not syncing: Fatal exception

So call INIT_LIST_HEAD() to initialize the list member to fix this issue.

Link: https://lkml.kernel.org/r/20221116015207.30858-1-xiujianfeng@huawei.com

Cc: stable@vger.kernel.org
Fixes: 673feb9d76ab ("ftrace: Add :mod: caching infrastructure to trace_array")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# bcea02b0 09-Nov-2022 Wang Wensheng <wangwensheng4@huawei.com>

ftrace: Optimize the allocation for mcount entries

If we can't allocate this size, try something smaller with half of the
size. Its order should be decreased by one instead of divided by two.

Link: https://lkml.kernel.org/r/20221109094434.84046-3-wangwensheng4@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Fixes: a79008755497d ("ftrace: Allocate the mcount record pages as groups")
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 08948cae 09-Nov-2022 Wang Wensheng <wangwensheng4@huawei.com>

ftrace: Fix the possible incorrect kernel message

If the number of mcount entries is an integer multiple of
ENTRIES_PER_PAGE, the page count showing on the console would be wrong.

Link: https://lkml.kernel.org/r/20221109094434.84046-2-wangwensheng4@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Fixes: 5821e1b74f0d0 ("function tracing: fix wrong pos computing when read buffer has been fulfilled")
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 0e792b89 02-Nov-2022 Li Huafei <lihuafei1@huawei.com>

ftrace: Fix use-after-free for dynamic ftrace_ops

KASAN reported a use-after-free with ftrace ops [1]. It was found from
vmcore that perf had registered two ops with the same content
successively, both dynamic. After unregistering the second ops, a
use-after-free occurred.

In ftrace_shutdown(), when the second ops is unregistered, the
FTRACE_UPDATE_CALLS command is not set because there is another enabled
ops with the same content. Also, both ops are dynamic and the ftrace
callback function is ftrace_ops_list_func, so the
FTRACE_UPDATE_TRACE_FUNC command will not be set. Eventually the value
of 'command' will be 0 and ftrace_shutdown() will skip the rcu
synchronization.

However, ftrace may be activated. When the ops is released, another CPU
may be accessing the ops. Add the missing synchronization to fix this
problem.

[1]
BUG: KASAN: use-after-free in __ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline]
BUG: KASAN: use-after-free in ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049
Read of size 8 at addr ffff56551965bbc8 by task syz-executor.2/14468

CPU: 1 PID: 14468 Comm: syz-executor.2 Not tainted 5.10.0 #7
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x40c arch/arm64/kernel/stacktrace.c:132
show_stack+0x30/0x40 arch/arm64/kernel/stacktrace.c:196
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b4/0x248 lib/dump_stack.c:118
print_address_description.constprop.0+0x28/0x48c mm/kasan/report.c:387
__kasan_report mm/kasan/report.c:547 [inline]
kasan_report+0x118/0x210 mm/kasan/report.c:564
check_memory_region_inline mm/kasan/generic.c:187 [inline]
__asan_load8+0x98/0xc0 mm/kasan/generic.c:253
__ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline]
ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049
ftrace_graph_call+0x0/0x4
__might_sleep+0x8/0x100 include/linux/perf_event.h:1170
__might_fault mm/memory.c:5183 [inline]
__might_fault+0x58/0x70 mm/memory.c:5171
do_strncpy_from_user lib/strncpy_from_user.c:41 [inline]
strncpy_from_user+0x1f4/0x4b0 lib/strncpy_from_user.c:139
getname_flags+0xb0/0x31c fs/namei.c:149
getname+0x2c/0x40 fs/namei.c:209
[...]

Allocated by task 14445:
kasan_save_stack+0x24/0x50 mm/kasan/common.c:48
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc mm/kasan/common.c:479 [inline]
__kasan_kmalloc.constprop.0+0x110/0x13c mm/kasan/common.c:449
kasan_kmalloc+0xc/0x14 mm/kasan/common.c:493
kmem_cache_alloc_trace+0x440/0x924 mm/slub.c:2950
kmalloc include/linux/slab.h:563 [inline]
kzalloc include/linux/slab.h:675 [inline]
perf_event_alloc.part.0+0xb4/0x1350 kernel/events/core.c:11230
perf_event_alloc kernel/events/core.c:11733 [inline]
__do_sys_perf_event_open kernel/events/core.c:11831 [inline]
__se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723
__arm64_sys_perf_event_open+0x6c/0x80 kernel/events/core.c:11723
[...]

Freed by task 14445:
kasan_save_stack+0x24/0x50 mm/kasan/common.c:48
kasan_set_track+0x24/0x34 mm/kasan/common.c:56
kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:358
__kasan_slab_free.part.0+0x11c/0x1b0 mm/kasan/common.c:437
__kasan_slab_free mm/kasan/common.c:445 [inline]
kasan_slab_free+0x2c/0x40 mm/kasan/common.c:446
slab_free_hook mm/slub.c:1569 [inline]
slab_free_freelist_hook mm/slub.c:1608 [inline]
slab_free mm/slub.c:3179 [inline]
kfree+0x12c/0xc10 mm/slub.c:4176
perf_event_alloc.part.0+0xa0c/0x1350 kernel/events/core.c:11434
perf_event_alloc kernel/events/core.c:11733 [inline]
__do_sys_perf_event_open kernel/events/core.c:11831 [inline]
__se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723
[...]

Link: https://lore.kernel.org/linux-trace-kernel/20221103031010.166498-1-lihuafei1@huawei.com

Fixes: edb096e00724f ("ftrace: Fix memleak when unregistering dynamic ops when tracing disabled")
Cc: stable@vger.kernel.org
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 30f7d1ca 10-Oct-2022 Zheng Yejian <zhengyejian1@huawei.com>

ftrace: Fix char print issue in print_ip_ins()

When ftrace bug happened, following log shows every hex data in
problematic ip address:
actual: ffffffe8:6b:ffffffd9:01:21

But so many 'f's seem a little confusing, and that is because format
'%x' being used to print signed chars in array 'ins'. As suggested
by Joe, change to use format "%*phC" to print array 'ins'.

After this patch, the log is like:
actual: e8:6b:d9:01:21

Link: https://lkml.kernel.org/r/20221011120352.1878494-1-zhengyejian1@huawei.com

Fixes: 6c14133d2d3f ("ftrace: Do not blindly read the ip address in ftrace_bug()")
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# cf04f2d5 04-Oct-2022 Steven Rostedt (Google) <rostedt@goodmis.org>

ftrace: Still disable enabled records marked as disabled

Weak functions started causing havoc as they showed up in the
"available_filter_functions" and this confused people as to why some
functions marked as "notrace" were listed, but when enabled they did
nothing. This was because weak functions can still have fentry calls, and
these addresses get added to the "available_filter_functions" file.
kallsyms is what converts those addresses to names, and since the weak
functions are not listed in kallsyms, it would just pick the function
before that.

To solve this, there was a trick to detect weak functions listed, and
these records would be marked as DISABLED so that they do not get enabled
and are mostly ignored. As the processing of the list of all functions to
figure out what is weak or not can take a long time, this process is put
off into a kernel thread and run in parallel with the rest of start up.

Now the issue happens whet function tracing is enabled via the kernel
command line. As it starts very early in boot up, it can be enabled before
the records that are weak are marked to be disabled. This causes an issue
in the accounting, as the weak records are enabled by the command line
function tracing, but after boot up, they are not disabled.

The ftrace records have several accounting flags and a ref count. The
DISABLED flag is just one. If the record is enabled before it is marked
DISABLED it will get an ENABLED flag and also have its ref counter
incremented. After it is marked for DISABLED, neither the ENABLED flag nor
the ref counter is cleared. There's sanity checks on the records that are
performed after an ftrace function is registered or unregistered, and this
detected that there were records marked as ENABLED with ref counter that
should not have been.

Note, the module loading code uses the DISABLED flag as well to keep its
functions from being modified while its being loaded and some of these
flags may get set in this process. So changing the verification code to
ignore DISABLED records is a no go, as it still needs to verify that the
module records are working too.

Also, the weak functions still are calling a trampoline. Even though they
should never be called, it is dangerous to leave these weak functions
calling a trampoline that is freed, so they should still be set back to
nops.

There's two places that need to not skip records that have the ENABLED
and the DISABLED flags set. That is where the ftrace_ops is processed and
sets the records ref counts, and then later when the function itself is to
be updated, and the ENABLED flag gets removed. Add a helper function
"skip_record()" that returns true if the record has the DISABLED flag set
but not the ENABLED flag.

Link: https://lkml.kernel.org/r/20221005003809.27d2b97b@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes: b39181f7c6907 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 9d2ce78d 26-Sep-2022 Song Liu <song@kernel.org>

ftrace: Fix recursive locking direct_mutex in ftrace_modify_direct_caller

Naveen reported recursive locking of direct_mutex with sample
ftrace-direct-modify.ko:

[ 74.762406] WARNING: possible recursive locking detected
[ 74.762887] 6.0.0-rc6+ #33 Not tainted
[ 74.763216] --------------------------------------------
[ 74.763672] event-sample-fn/1084 is trying to acquire lock:
[ 74.764152] ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \
register_ftrace_function+0x1f/0x180
[ 74.764922]
[ 74.764922] but task is already holding lock:
[ 74.765421] ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \
modify_ftrace_direct+0x34/0x1f0
[ 74.766142]
[ 74.766142] other info that might help us debug this:
[ 74.766701] Possible unsafe locking scenario:
[ 74.766701]
[ 74.767216] CPU0
[ 74.767437] ----
[ 74.767656] lock(direct_mutex);
[ 74.767952] lock(direct_mutex);
[ 74.768245]
[ 74.768245] *** DEADLOCK ***
[ 74.768245]
[ 74.768750] May be due to missing lock nesting notation
[ 74.768750]
[ 74.769332] 1 lock held by event-sample-fn/1084:
[ 74.769731] #0: ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \
modify_ftrace_direct+0x34/0x1f0
[ 74.770496]
[ 74.770496] stack backtrace:
[ 74.770884] CPU: 4 PID: 1084 Comm: event-sample-fn Not tainted ...
[ 74.771498] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
[ 74.772474] Call Trace:
[ 74.772696] <TASK>
[ 74.772896] dump_stack_lvl+0x44/0x5b
[ 74.773223] __lock_acquire.cold.74+0xac/0x2b7
[ 74.773616] lock_acquire+0xd2/0x310
[ 74.773936] ? register_ftrace_function+0x1f/0x180
[ 74.774357] ? lock_is_held_type+0xd8/0x130
[ 74.774744] ? my_tramp2+0x11/0x11 [ftrace_direct_modify]
[ 74.775213] __mutex_lock+0x99/0x1010
[ 74.775536] ? register_ftrace_function+0x1f/0x180
[ 74.775954] ? slab_free_freelist_hook.isra.43+0x115/0x160
[ 74.776424] ? ftrace_set_hash+0x195/0x220
[ 74.776779] ? register_ftrace_function+0x1f/0x180
[ 74.777194] ? kfree+0x3e1/0x440
[ 74.777482] ? my_tramp2+0x11/0x11 [ftrace_direct_modify]
[ 74.777941] ? __schedule+0xb40/0xb40
[ 74.778258] ? register_ftrace_function+0x1f/0x180
[ 74.778672] ? my_tramp1+0xf/0xf [ftrace_direct_modify]
[ 74.779128] register_ftrace_function+0x1f/0x180
[ 74.779527] ? ftrace_set_filter_ip+0x33/0x70
[ 74.779910] ? __schedule+0xb40/0xb40
[ 74.780231] ? my_tramp1+0xf/0xf [ftrace_direct_modify]
[ 74.780678] ? my_tramp2+0x11/0x11 [ftrace_direct_modify]
[ 74.781147] ftrace_modify_direct_caller+0x5b/0x90
[ 74.781563] ? 0xffffffffa0201000
[ 74.781859] ? my_tramp1+0xf/0xf [ftrace_direct_modify]
[ 74.782309] modify_ftrace_direct+0x1b2/0x1f0
[ 74.782690] ? __schedule+0xb40/0xb40
[ 74.783014] ? simple_thread+0x2a/0xb0 [ftrace_direct_modify]
[ 74.783508] ? __schedule+0xb40/0xb40
[ 74.783832] ? my_tramp2+0x11/0x11 [ftrace_direct_modify]
[ 74.784294] simple_thread+0x76/0xb0 [ftrace_direct_modify]
[ 74.784766] kthread+0xf5/0x120
[ 74.785052] ? kthread_complete_and_exit+0x20/0x20
[ 74.785464] ret_from_fork+0x22/0x30
[ 74.785781] </TASK>

Fix this by using register_ftrace_function_nolock in
ftrace_modify_direct_caller.

Link: https://lkml.kernel.org/r/20220927004146.1215303-1-song@kernel.org

Fixes: 53cd885bc5c3 ("ftrace: Allow IPMODIFY and DIRECT ops on the same function")
Reported-and-tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 0ce0638e 26-Sep-2022 Zheng Yejian <zhengyejian1@huawei.com>

ftrace: Properly unset FTRACE_HASH_FL_MOD

When executing following commands like what document said, but the log
"#### all functions enabled ####" was not shown as expect:
1. Set a 'mod' filter:
$ echo 'write*:mod:ext3' > /sys/kernel/tracing/set_ftrace_filter
2. Invert above filter:
$ echo '!write*:mod:ext3' >> /sys/kernel/tracing/set_ftrace_filter
3. Read the file:
$ cat /sys/kernel/tracing/set_ftrace_filter

By some debugging, I found that flag FTRACE_HASH_FL_MOD was not unset
after inversion like above step 2 and then result of ftrace_hash_empty()
is incorrect.

Link: https://lkml.kernel.org/r/20220926152008.2239274-1-zhengyejian1@huawei.com

Cc: <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 8c08f0d5c6fb ("ftrace: Have cached module filters be an active filter")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 9d68c19c 26-Sep-2022 Jiri Olsa <jolsa@kernel.org>

ftrace: Keep the resolved addr in kallsyms_callback

Keeping the resolved 'addr' in kallsyms_callback, instead of taking
ftrace_location value, because we depend on symbol address in the
cookie related code.

With CONFIG_X86_KERNEL_IBT option the ftrace_location value differs
from symbol address, which screwes the symbol address cookies matching.

There are 2 users of this function:
- bpf_kprobe_multi_link_attach
for which this fix is for

- get_ftrace_locations
which is used by register_fprobe_syms

this function needs to get symbols resolved to addresses,
but does not need 'ftrace location addresses' at this point
there's another ftrace location translation in the path done
by ftrace_set_filter_ips call:

register_fprobe_syms
addrs = get_ftrace_locations

register_fprobe_ips(addrs)
...
ftrace_set_filter_ips
...
__ftrace_match_addr
ip = ftrace_location(ip);
...

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220926153340.1621984-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 123d6455 01-Aug-2022 Wang Jingjin <wangjingjin1@huawei.com>

ftrace: Fix build warning for ops_references_rec() not used

The change that made IPMODIFY and DIRECT ops work together needed access
to the ops_references_ip() function, which it pulled out of the module
only code. But now if both CONFIG_MODULES and
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is not set, we get the below
warning:

‘ops_references_rec’ defined but not used.

Since ops_references_rec() only calls ops_references_ip() replace the
usage of ops_references_rec() with ops_references_ip() and encompass the
function with an #ifdef of DIRECT_CALLS || MODULES being defined.

Link: https://lkml.kernel.org/r/20220801084745.1187987-1-wangjingjin1@huawei.com

Fixes: 53cd885bc5c3 ("ftrace: Allow IPMODIFY and DIRECT ops on the same function")
Signed-off-by: Wang Jingjin <wangjingjin1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# c3b0f72e 17-Aug-2022 Yang Jihong <yangjihong1@huawei.com>

ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead

ftrace_startup does not remove ops from ftrace_ops_list when
ftrace_startup_enable fails:

register_ftrace_function
ftrace_startup
__register_ftrace_function
...
add_ftrace_ops(&ftrace_ops_list, ops)
...
...
ftrace_startup_enable // if ftrace failed to modify, ftrace_disabled is set to 1
...
return 0 // ops is in the ftrace_ops_list.

When ftrace_disabled = 1, unregister_ftrace_function simply returns without doing anything:
unregister_ftrace_function
ftrace_shutdown
if (unlikely(ftrace_disabled))
return -ENODEV; // return here, __unregister_ftrace_function is not executed,
// as a result, ops is still in the ftrace_ops_list
__unregister_ftrace_function
...

If ops is dynamically allocated, it will be free later, in this case,
is_ftrace_trampoline accesses NULL pointer:

is_ftrace_trampoline
ftrace_ops_trampoline
do_for_each_ftrace_op(op, ftrace_ops_list) // OOPS! op may be NULL!

Syzkaller reports as follows:
[ 1203.506103] BUG: kernel NULL pointer dereference, address: 000000000000010b
[ 1203.508039] #PF: supervisor read access in kernel mode
[ 1203.508798] #PF: error_code(0x0000) - not-present page
[ 1203.509558] PGD 800000011660b067 P4D 800000011660b067 PUD 130fb8067 PMD 0
[ 1203.510560] Oops: 0000 [#1] SMP KASAN PTI
[ 1203.511189] CPU: 6 PID: 29532 Comm: syz-executor.2 Tainted: G B W 5.10.0 #8
[ 1203.512324] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 1203.513895] RIP: 0010:is_ftrace_trampoline+0x26/0xb0
[ 1203.514644] Code: ff eb d3 90 41 55 41 54 49 89 fc 55 53 e8 f2 00 fd ff 48 8b 1d 3b 35 5d 03 e8 e6 00 fd ff 48 8d bb 90 00 00 00 e8 2a 81 26 00 <48> 8b ab 90 00 00 00 48 85 ed 74 1d e8 c9 00 fd ff 48 8d bb 98 00
[ 1203.518838] RSP: 0018:ffffc900012cf960 EFLAGS: 00010246
[ 1203.520092] RAX: 0000000000000000 RBX: 000000000000007b RCX: ffffffff8a331866
[ 1203.521469] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000000010b
[ 1203.522583] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff8df18b07
[ 1203.523550] R10: fffffbfff1be3160 R11: 0000000000000001 R12: 0000000000478399
[ 1203.524596] R13: 0000000000000000 R14: ffff888145088000 R15: 0000000000000008
[ 1203.525634] FS: 00007f429f5f4700(0000) GS:ffff8881daf00000(0000) knlGS:0000000000000000
[ 1203.526801] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1203.527626] CR2: 000000000000010b CR3: 0000000170e1e001 CR4: 00000000003706e0
[ 1203.528611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1203.529605] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Therefore, when ftrace_startup_enable fails, we need to rollback registration
process and remove ops from ftrace_ops_list.

Link: https://lkml.kernel.org/r/20220818032659.56209-1-yangjihong1@huawei.com

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 53cd885b 19-Jul-2022 Song Liu <song@kernel.org>

ftrace: Allow IPMODIFY and DIRECT ops on the same function

IPMODIFY (livepatch) and DIRECT (bpf trampoline) ops are both important
users of ftrace. It is necessary to allow them work on the same function
at the same time.

First, DIRECT ops no longer specify IPMODIFY flag. Instead, DIRECT flag is
handled together with IPMODIFY flag in __ftrace_hash_update_ipmodify().

Then, a callback function, ops_func, is added to ftrace_ops. This is used
by ftrace core code to understand whether the DIRECT ops can share with an
IPMODIFY ops. To share with IPMODIFY ops, the DIRECT ops need to implement
the callback function and adjust the direct trampoline accordingly.

If DIRECT ops is attached before the IPMODIFY ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_PEER on the DIRECT ops before registering the
IPMODIFY ops.

If IPMODIFY ops is attached before the DIRECT ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_SELF in __ftrace_hash_update_ipmodify. Owner of the
DIRECT ops may return 0 if the DIRECT trampoline can share with IPMODIFY,
so error code otherwise. The error code is propagated to
register_ftrace_direct_multi so that onwer of the DIRECT trampoline can
handle it properly.

For more details, please refer to comment before enum ftrace_ops_cmd.

Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/20220602193706.2607681-2-song@kernel.org/
Link: https://lore.kernel.org/all/20220718055449.3960512-1-song@kernel.org/
Link: https://lore.kernel.org/bpf/20220720002126.803253-3-song@kernel.org


# f96f644a 19-Jul-2022 Song Liu <song@kernel.org>

ftrace: Add modify_ftrace_direct_multi_nolock

This is similar to modify_ftrace_direct_multi, but does not acquire
direct_mutex. This is useful when direct_mutex is already locked by the
user.

Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/bpf/20220720002126.803253-2-song@kernel.org


# eb1b2985 15-Jun-2022 Jiri Olsa <jolsa@kernel.org>

ftrace: Keep address offset in ftrace_lookup_symbols

We want to store the resolved address on the same index as
the symbol string, because that's the user (bpf kprobe link)
code assumption.

Also making sure we don't store duplicates that might be
present in kallsyms.

Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Fixes: bed0d9a50dac ("ftrace: Add ftrace_lookup_symbols function")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220615112118.497303-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# b39181f7 26-May-2022 Steven Rostedt (Google) <rostedt@goodmis.org>

ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function

If an unused weak function was traced, it's call to fentry will still
exist, which gets added into the __mcount_loc table. Ftrace will use
kallsyms to retrieve the name for each location in __mcount_loc to display
it in the available_filter_functions and used to enable functions via the
name matching in set_ftrace_filter/notrace. Enabling these functions do
nothing but enable an unused call to ftrace_caller. If a traced weak
function is overridden, the symbol of the function would be used for it,
which will either created duplicate names, or if the previous function was
not traced, it would be incorrectly be listed in available_filter_functions
as a function that can be traced.

This became an issue with BPF[1] as there are tooling that enables the
direct callers via ftrace but then checks to see if the functions were
actually enabled. The case of one function that was marked notrace, but
was followed by an unused weak function that was traced. The unused
function's call to fentry was added to the __mcount_loc section, and
kallsyms retrieved the untraced function's symbol as the weak function was
overridden. Since the untraced function would not get traced, the BPF
check would detect this and fail.

The real fix would be to fix kallsyms to not show addresses of weak
functions as the function before it. But that would require adding code in
the build to add function size to kallsyms so that it can know when the
function ends instead of just using the start of the next known symbol.

In the mean time, this is a work around. Add a FTRACE_MCOUNT_MAX_OFFSET
macro that if defined, ftrace will ignore any function that has its call
to fentry/mcount that has an offset from the symbol that is greater than
FTRACE_MCOUNT_MAX_OFFSET.

If CONFIG_HAVE_FENTRY is defined for x86, define FTRACE_MCOUNT_MAX_OFFSET
to zero (unless IBT is enabled), which will have ftrace ignore all locations
that are not at the start of the function (or one after the ENDBR
instruction).

A worker thread is added at boot up to scan all the ftrace record entries,
and will mark any that fail the FTRACE_MCOUNT_MAX_OFFSET test as disabled.
They will still appear in the available_filter_functions file as:

__ftrace_invalid_address___<invalid-offset>

(showing the offset that caused it to be invalid).

This is required for tools that use libtracefs (like trace-cmd does) that
scan the available_filter_functions and enable set_ftrace_filter and
set_ftrace_notrace using indexes of the function listed in the file (this
is a speedup, as enabling thousands of files via names is an O(n^2)
operation and can take minutes to complete, where the indexing takes less
than a second).

The invalid functions cannot be removed from available_filter_functions as
the names there correspond to the ftrace records in the array that manages
them (and the indexing depends on this).

[1] https://lore.kernel.org/all/20220412094923.0abe90955e5db486b7bca279@kernel.org/

Link: https://lkml.kernel.org/r/20220526141912.794c2786@gandalf.local.home

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 7d54c15c 24-May-2022 Song Liu <song@kernel.org>

ftrace: Clean up hash direct_functions on register failures

We see the following GPF when register_ftrace_direct fails:

[ ] general protection fault, probably for non-canonical address \
0x200000000000010: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
[...]
[ ] RIP: 0010:ftrace_find_rec_direct+0x53/0x70
[ ] Code: 48 c1 e0 03 48 03 42 08 48 8b 10 31 c0 48 85 d2 74 [...]
[ ] RSP: 0018:ffffc9000138bc10 EFLAGS: 00010206
[ ] RAX: 0000000000000000 RBX: ffffffff813e0df0 RCX: 000000000000003b
[ ] RDX: 0200000000000000 RSI: 000000000000000c RDI: ffffffff813e0df0
[ ] RBP: ffffffffa00a3000 R08: ffffffff81180ce0 R09: 0000000000000001
[ ] R10: ffffc9000138bc18 R11: 0000000000000001 R12: ffffffff813e0df0
[ ] R13: ffffffff813e0df0 R14: ffff888171b56400 R15: 0000000000000000
[ ] FS: 00007fa9420c7780(0000) GS:ffff888ff6a00000(0000) knlGS:000000000
[ ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ ] CR2: 000000000770d000 CR3: 0000000107d50003 CR4: 0000000000370ee0
[ ] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ ] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ ] Call Trace:
[ ] <TASK>
[ ] register_ftrace_direct+0x54/0x290
[ ] ? render_sigset_t+0xa0/0xa0
[ ] bpf_trampoline_update+0x3f5/0x4a0
[ ] ? 0xffffffffa00a3000
[ ] bpf_trampoline_link_prog+0xa9/0x140
[ ] bpf_tracing_prog_attach+0x1dc/0x450
[ ] bpf_raw_tracepoint_open+0x9a/0x1e0
[ ] ? find_held_lock+0x2d/0x90
[ ] ? lock_release+0x150/0x430
[ ] __sys_bpf+0xbd6/0x2700
[ ] ? lock_is_held_type+0xd8/0x130
[ ] __x64_sys_bpf+0x1c/0x20
[ ] do_syscall_64+0x3a/0x80
[ ] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ ] RIP: 0033:0x7fa9421defa9
[ ] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 9 f8 [...]
[ ] RSP: 002b:00007ffed743bd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[ ] RAX: ffffffffffffffda RBX: 00000000069d2480 RCX: 00007fa9421defa9
[ ] RDX: 0000000000000078 RSI: 00007ffed743bd80 RDI: 0000000000000011
[ ] RBP: 00007ffed743be00 R08: 0000000000bb7270 R09: 0000000000000000
[ ] R10: 00000000069da210 R11: 0000000000000246 R12: 0000000000000001
[ ] R13: 00007ffed743c4b0 R14: 00000000069d2480 R15: 0000000000000001
[ ] </TASK>
[ ] Modules linked in: klp_vm(OK)
[ ] ---[ end trace 0000000000000000 ]---

One way to trigger this is:
1. load a livepatch that patches kernel function xxx;
2. run bpftrace -e 'kfunc:xxx {}', this will fail (expected for now);
3. repeat #2 => gpf.

This is because the entry is added to direct_functions, but not removed.
Fix this by remove the entry from direct_functions when
register_ftrace_direct fails.

Also remove the last trailing space from ftrace.c, so we don't have to
worry about it anymore.

Link: https://lkml.kernel.org/r/20220524170839.900849-1-song@kernel.org

Cc: stable@vger.kernel.org
Fixes: 763e34e74bb7 ("ftrace: Add register_ftrace_direct()")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 50c69781 21-May-2022 Julia Lawall <Julia.Lawall@inria.fr>

ftrace: Fix typo in comment

Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Link: https://lkml.kernel.org/r/20220521111145.81697-81-Julia.Lawall@inria.fr

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 3a2bfec0 17-May-2022 Li kunyu <kunyu@nfschina.com>

ftrace: Remove return value of ftrace_arch_modify_*()

All instances of the function ftrace_arch_modify_prepare() and
ftrace_arch_modify_post_process() return zero. There's no point in
checking their return value. Just have them be void functions.

Link: https://lkml.kernel.org/r/20220518023639.4065-1-kunyu@nfschina.com

Signed-off-by: Li kunyu <kunyu@nfschina.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 2889c658 19-Jan-2022 Yuntao Wang <ytcoode@gmail.com>

ftrace: Deal with error return code of the ftrace_process_locs() function

The ftrace_process_locs() function may return -ENOMEM error code, which
should be handled by the callers.

Link: https://lkml.kernel.org/r/20220120065949.1813231-1-ytcoode@gmail.com

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# e4931b82 14-Jan-2022 Yuntao Wang <ytcoode@gmail.com>

tracing: Use trace_create_file() to simplify creation of tracefs entries

Creating tracefs entries with tracefs_create_file() followed by pr_warn()
is tedious and repetitive, we can use trace_create_file() to simplify
this process and make the code more readable.

Link: https://lkml.kernel.org/r/20220114131052.534382-1-ytcoode@gmail.com

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# ba27d855 27-Apr-2022 Jakob Koschel <jakobkoschel@gmail.com>

tracing: Remove check of list iterator against head past the loop body

When list_for_each_entry() completes the iteration over the whole list
without breaking the loop, the iterator value will be a bogus pointer
computed based on the head element.

While it is safe to use the pointer to determine if it was computed
based on the head element, either with list_entry_is_head() or
&pos->member == head, using the iterator variable after the loop should
be avoided.

In preparation to limit the scope of a list iterator to the list
traversal loop, use a dedicated pointer to point to the found element [1].

Link: https://lkml.kernel.org/r/20220427170734.819891-5-jakobkoschel@gmail.com

Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 8fd7c214 21-Apr-2022 Luis Chamberlain <mcgrof@kernel.org>

ftrace: fix building with SYSCTL=y but DYNAMIC_FTRACE=n

Ok so hopefully this is the last of it. 0day picked up a build
failure [0] when SYSCTL=y but DYNAMIC_FTRACE=n. This can be fixed
by just declaring an empty routine for the calls moved just
recently.

[0] https://lkml.kernel.org/r/202204161203.6dSlgKJX-lkp@intel.com

Reported-by: kernel test robot <lkp@intel.com>
Fixes: f8b7d2b4c192 ("ftrace: fix building with SYSCTL=n but DYNAMIC_FTRACE=y")
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>


# f8b7d2b4 15-Apr-2022 Luis Chamberlain <mcgrof@kernel.org>

ftrace: fix building with SYSCTL=n but DYNAMIC_FTRACE=y

One can enable dyanmic tracing but disable sysctls.
When this is doen we get the compile kernel warning:

CC kernel/trace/ftrace.o
kernel/trace/ftrace.c:3086:13: warning: ‘ftrace_shutdown_sysctl’ defined
but not used [-Wunused-function]
3086 | static void ftrace_shutdown_sysctl(void)
| ^~~~~~~~~~~~~~~~~~~~~~
kernel/trace/ftrace.c:3068:13: warning: ‘ftrace_startup_sysctl’ defined
but not used [-Wunused-function]
3068 | static void ftrace_startup_sysctl(void)

When CONFIG_DYNAMIC_FTRACE=n the ftrace_startup_sysctl() and
routines ftrace_shutdown_sysctl() still compiles, so these
are actually more just used for when SYSCTL=y.

Fix this then by just moving these routines to when sysctls
are enabled.

Fixes: 7cde53da38a3 ("ftrace: move sysctl_ftrace_enabled to ftrace.c")
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>


# 5d79fa0d 07-Apr-2022 YueHaibing <yuehaibing@huawei.com>

ftrace: Fix build warning

If CONFIG_SYSCTL and CONFIG_DYNAMIC_FTRACE is n, build warns:

kernel/trace/ftrace.c:7912:13: error: ‘is_permanent_ops_registered’ defined but not used [-Werror=unused-function]
static bool is_permanent_ops_registered(void)
^~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/trace/ftrace.c:89:12: error: ‘last_ftrace_enabled’ defined but not used [-Werror=unused-variable]
static int last_ftrace_enabled;
^~~~~~~~~~~~~~~~~~~

Move is_permanent_ops_registered() to ifdef block and mark last_ftrace_enabled as
__maybe_unused to fix this.

Fixes: 7cde53da38a3 ("ftrace: move sysctl_ftrace_enabled to ftrace.c")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>


# 8e4e83b2 23-Feb-2022 Wei Xiao <xiaowei66@huawei.com>

ftrace: move sysctl_ftrace_enabled to ftrace.c

This moves ftrace_enabled to trace/ftrace.c.

We move sysctls to places where features actually belong to improve
the readability of the code and reduce the risk of code merge conflicts.
At the same time, the proc-sysctl maintainers do not want to know what
sysctl knobs you wish to add for your owner piece of code, we just care
about the core logic.

Signed-off-by: Wei Xiao <xiaowei66@huawei.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>


# bed0d9a5 10-May-2022 Jiri Olsa <jolsa@kernel.org>

ftrace: Add ftrace_lookup_symbols function

Adding ftrace_lookup_symbols function that resolves array of symbols
with single pass over kallsyms.

The user provides array of string pointers with count and pointer to
allocated array for resolved values.

int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt,
unsigned long *addrs)

It iterates all kallsyms symbols and tries to loop up each in provided
symbols array with bsearch. The symbols array needs to be sorted by
name for this reason.

We also check each symbol to pass ftrace_location, because this API
will be used for fprobe symbols resolving.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220510122616.2652285-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 9c2136be 11-May-2022 Delyan Kratunov <delyank@fb.com>

sched/tracing: Append prev_state to tp args instead

Commit fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting
sched_switch event, 2022-01-20) added a new prev_state argument to the
sched_switch tracepoint, before the prev task_struct pointer.

This reordering of arguments broke BPF programs that use the raw
tracepoint (e.g. tp_btf programs). The type of the second argument has
changed and existing programs that assume a task_struct* argument
(e.g. for bpf_task_storage access) will now fail to verify.

If we instead append the new argument to the end, all existing programs
would continue to work and can conditionally extract the prev_state
argument on supported kernel versions.

Fixes: fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting sched_switch event, 2022-01-20)
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/c8a6930dfdd58a4a5755fc01732675472979732b.camel@fb.com


# aebfd125 08-Mar-2022 Peter Zijlstra <peterz@infradead.org>

x86/ibt,ftrace: Search for __fentry__ location

Currently a lot of ftrace code assumes __fentry__ is at sym+0. However
with Intel IBT enabled the first instruction of a function will most
likely be ENDBR.

Change ftrace_location() to not only return the __fentry__ location
when called for the __fentry__ location, but also when called for the
sym+0 location.

Then audit/update all callsites of this function to consistently use
these new semantics.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.227581603@infradead.org


# 4f554e95 15-Mar-2022 Jiri Olsa <jolsa@redhat.com>

ftrace: Add ftrace_set_filter_ips function

Adding ftrace_set_filter_ips function to be able to set filter on
multiple ip addresses at once.

With the kprobe multi attach interface we have cases where we need to
initialize ftrace_ops object with thousands of functions, so having
single function diving into ftrace_hash_move_and_update_ops with
ftrace_lock is faster.

The functions ips are passed as unsigned long array with count.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/164735282673.1084943.18310504594134769804.stgit@devnote2


# 380af29b 10-Mar-2022 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Add snapshot at end of kernel boot up

Add ftrace_boot_snapshot kernel parameter that will take a snapshot at the
end of boot up just before switching over to user space (it happens during
the kernel freeing of init memory).

This is useful when there's interesting data that can be collected from
kernel start up, but gets overridden by user space start up code. With
this option, the ring buffer content from the boot up traces gets saved in
the snapshot at the end of boot up. This trace can be read from:

/sys/kernel/tracing/snapshot

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# fa2c3254 20-Jan-2022 Valentin Schneider <valentin.schneider@arm.com>

sched/tracing: Don't re-read p->state when emitting sched_switch event

As of commit

c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")

the following sequence becomes possible:

p->__state = TASK_INTERRUPTIBLE;
__schedule()
deactivate_task(p);
ttwu()
READ !p->on_rq
p->__state=TASK_WAKING
trace_sched_switch()
__trace_sched_switch_state()
task_state_index()
return 0;

TASK_WAKING isn't in TASK_REPORT, so the task appears as TASK_RUNNING in
the trace event.

Prevent this by pushing the value read from __schedule() down the trace
event.

Reported-by: Abhijeet Dharmapurikar <adharmap@quicinc.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20220120162520.570782-2-valentin.schneider@arm.com


# 78cbc651 06-Mar-2022 Jiapeng Chong <jiapeng.chong@linux.alibaba.com>

ftrace: Fix some W=1 warnings in kernel doc comments

Clean up the following clang-w1 warning:

kernel/trace/ftrace.c:7827: warning: Function parameter or member 'ops'
not described in 'unregister_ftrace_function'.

kernel/trace/ftrace.c:7805: warning: Function parameter or member 'ops'
not described in 'register_ftrace_function'.

Link: https://lkml.kernel.org/r/20220307004303.26399-1-jiapeng.chong@linux.alibaba.com

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# ab2f993c 14-Feb-2022 Nathan Chancellor <nathan@kernel.org>

ftrace: Remove unused ftrace_startup_enable() stub

When building with clang + CONFIG_DYNAMIC_FTRACE=n + W=1, there is a
warning:

kernel/trace/ftrace.c:7194:20: error: unused function 'ftrace_startup_enable' [-Werror,-Wunused-function]
static inline void ftrace_startup_enable(int command) { }
^
1 error generated.

Clang warns on instances of static inline functions in .c files with W=1
after commit 6863f5643dd7 ("kbuild: allow Clang to find unused static
inline functions for W=1 build").

The ftrace_startup_enable() stub has been unused since
commit e1effa0144a1 ("ftrace: Annotate the ops operation on update"),
where its use outside of the CONFIG_DYNAMIC_TRACE section was replaced
by ftrace_startup_all(). Remove it to resolve the warning.

Link: https://lkml.kernel.org/r/20220214192847.488166-1-nathan@kernel.org

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 6b9b6413 22-Jan-2022 Steven Rostedt (Google) <rostedt@goodmis.org>

ftrace: Fix assuming build time sort works for s390

To speed up the boot process, as mcount_loc needs to be sorted for ftrace
to work properly, sorting it at build time is more efficient than boot up
and can save milliseconds of time. Unfortunately, this change broke s390
as it will modify the mcount_loc location after the sorting takes place
and will put back the unsorted locations. Since the sorting is skipped at
boot up if it is believed that it was sorted at run time, ftrace can crash
as its algorithms are dependent on the list being sorted.

Add a new config BUILDTIME_MCOUNT_SORT that is set when
BUILDTIME_TABLE_SORT but not if S390 is set. Use this config to determine
if sorting should take place at boot up.

Link: https://lore.kernel.org/all/yt9dee51ctfn.fsf@linux.ibm.com/

Fixes: 72b3942a173c ("scripts: ftrace - move the sort-processing in ftrace_init")
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 8147dc78 06-Dec-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Add test to make sure compiled time sorts work

Now that ftrace function pointers are sorted at compile time, add a test
that makes sure they are sorted at run time. This test is only run if it is
configured in.

Link: https://lkml.kernel.org/r/20211206151858.4d21a24d@gandalf.local.home

Cc: Yinan Liu <yinan@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 72b3942a 12-Dec-2021 Yinan Liu <yinan@linux.alibaba.com>

scripts: ftrace - move the sort-processing in ftrace_init

When the kernel starts, the initialization of ftrace takes
up a portion of the time (approximately 6~8ms) to sort mcount
addresses. We can save this time by moving mcount-sorting to
compile time.

Link: https://lkml.kernel.org/r/20211212113358.34208-2-yinan@linux.alibaba.com

Signed-off-by: Yinan Liu <yinan@linux.alibaba.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# fea3ffa4 06-Dec-2021 Jiri Olsa <jolsa@redhat.com>

ftrace: Add cleanup to unregister_ftrace_direct_multi

Adding ops cleanup to unregister_ftrace_direct_multi,
so it can be reused in another register call.

Link: https://lkml.kernel.org/r/20211206182032.87248-3-jolsa@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Fixes: f64dd4627ec6 ("ftrace: Add multi direct register/unregister interface")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 7d5b7cad 06-Dec-2021 Jiri Olsa <jolsa@redhat.com>

ftrace: Use direct_ops hash in unregister_ftrace_direct

Now when we have *direct_multi interface the direct_functions
hash is no longer owned just by direct_ops. It's also used by
any other ftrace_ops passed to *direct_multi interface.

Thus to find out that we are unregistering the last function
from direct_ops, we need to check directly direct_ops's hash.

Link: https://lkml.kernel.org/r/20211206182032.87248-2-jolsa@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Fixes: f64dd4627ec6 ("ftrace: Add multi direct register/unregister interface")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 2e6e9058 08-Nov-2021 Jiri Olsa <jolsa@redhat.com>

ftrace/direct: Fix lockup in modify_ftrace_direct_multi

We can't call unregister_ftrace_function under ftrace_lock.

Link: https://lkml.kernel.org/r/20211109114217.1645296-1-jolsa@kernel.org

Fixes: ed29271894aa ("ftrace/direct: Do not disable when switching direct callers")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# a20deb3a 08-Nov-2021 Kefeng Wang <wangkefeng.wang@huawei.com>

sections: move and rename core_kernel_data() to is_kernel_core_data()

Move core_kernel_data() into sections.h and rename it to
is_kernel_core_data(), also make it return bool value, then update all the
callers.

Link: https://lkml.kernel.org/r/20210930071143.63410-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ce5e4803 26-Oct-2021 王贇 <yun.wang@linux.alibaba.com>

ftrace: disable preemption when recursion locked

As the documentation explained, ftrace_test_recursion_trylock()
and ftrace_test_recursion_unlock() were supposed to disable and
enable preemption properly, however currently this work is done
outside of the function, which could be missing by mistake.

And since the internal using of trace_test_and_set_recursion()
and trace_clear_recursion() also require preemption disabled, we
can just merge the logical.

This patch will make sure the preemption has been disabled when
trace_test_and_set_recursion() return bit >= 0, and
trace_clear_recursion() will enable the preemption if previously
enabled.

Link: https://lkml.kernel.org/r/13bde807-779c-aa4c-0672-20515ae365ea@linux.alibaba.com

CC: Petr Mladek <pmladek@suse.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jisheng Zhang <jszhang@kernel.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Miroslav Benes <mbenes@suse.cz>
Reported-by: Abaci <abaci@linux.alibaba.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
[ Removed extra line in comment - SDR ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 172f7ba9 19-Oct-2021 chongjiapeng <jiapeng.chong@linux.alibaba.com>

ftrace: Make ftrace_profile_pages_init static

This symbol is not used outside of ftrace.c, so marks it static.

Fixes the following sparse warning:

kernel/trace/ftrace.c:579:5: warning: symbol 'ftrace_profile_pages_init'
was not declared. Should it be static?

Link: https://lkml.kernel.org/r/1634640534-18280-1-git-send-email-jiapeng.chong@linux.alibaba.com

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Fixes: cafb168a1c92 ("tracing: make the function profiler per cpu")
Signed-off-by: chongjiapeng <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# ed292718 14-Oct-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace/direct: Do not disable when switching direct callers

Currently to switch a set of "multi" direct trampolines from one
trampoline to another, a full shutdown of the current set needs to be
done, followed by an update to what trampoline the direct callers would
call, and then re-enabling the callers. This leaves a time when the
functions will not be calling anything, and events may be missed.

Instead, use a trick to allow all the functions with direct trampolines
attached will always call either the new or old trampoline while the
switch is happening. To do this, first attach a "dummy" callback via
ftrace to all the functions that the current direct trampoline is attached
to. This will cause the functions to call the "list func" instead of the
direct trampoline. The list function will call the direct trampoline
"helper" that will set the function it should call as it returns back to
the ftrace trampoline.

At this moment, the direct caller descriptor can safely update the direct
call trampoline. The list function will pick either the new or old
function (depending on the memory coherency model of the architecture).

Now removing the dummy function from each of the locations of the direct
trampoline caller, will put back the direct call, but now to the new
trampoline.

A better visual is:

[ Changing direct call from my_direct_1 to my_direct_2 ]

<traced_func>:
call my_direct_1

||||||||||||||||||||
vvvvvvvvvvvvvvvvvvvv

<traced_func>:
call ftrace_caller

<ftrace_caller>:
[..]
call ftrace_ops_list_func

ftrace_ops_list_func()
{
ops->func() -> direct_helper -> set rax to my_direct_1 or my_direct_2
}

call rax (to either my_direct_1 or my_direct_2

||||||||||||||||||||
vvvvvvvvvvvvvvvvvvvv

<traced_func>:
call my_direct_2

Link: https://lore.kernel.org/all/20211014162819.5c85618b@gandalf.local.home/

Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# ccf5a89e 08-Oct-2021 Jiri Olsa <jolsa@redhat.com>

ftrace: Add multi direct modify interface

Adding interface to modify registered direct function
for ftrace_ops. Adding following function:

modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)

The function changes the currently registered direct
function for all attached functions.

Link: https://lkml.kernel.org/r/20211008091336.33616-8-jolsa@kernel.org

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# f64dd462 08-Oct-2021 Jiri Olsa <jolsa@redhat.com>

ftrace: Add multi direct register/unregister interface

Adding interface to register multiple direct functions
within single call. Adding following functions:

register_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)

The register_ftrace_direct_multi registers direct function (addr)
with all functions in ops filter. The ops filter can be updated
before with ftrace_set_filter_ip calls.

All requested functions must not have direct function currently
registered, otherwise register_ftrace_direct_multi will fail.

The unregister_ftrace_direct_multi unregisters ops related direct
functions.

Link: https://lkml.kernel.org/r/20211008091336.33616-7-jolsa@kernel.org

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 1904a814 08-Oct-2021 Jiri Olsa <jolsa@redhat.com>

ftrace: Add ftrace_add_rec_direct function

Factor out the code that adds (ip, addr) tuple to direct_functions
hash in new ftrace_add_rec_direct function. It will be used in
following patches.

Link: https://lkml.kernel.org/r/20211008091336.33616-6-jolsa@kernel.org

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 34cdd18b 17-Jun-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Use linker magic instead of recasting ftrace_ops_list_func()

In an effort to enable -Wcast-function-type in the top-level Makefile to
support Control Flow Integrity builds, all function casts need to be
removed.

This means that ftrace_ops_list_func() can no longer be defined as
ftrace_ops_no_ops(). The reason for ftrace_ops_no_ops() is to use that when
an architecture calls ftrace_ops_list_func() with only two parameters
(called from assembly). And to make sure there's no C side-effects, those
archs call ftrace_ops_no_ops() which only has two parameters, as
ftrace_ops_list_func() has four parameters.

Instead of a typecast, use vmlinux.lds.h to define ftrace_ops_list_func() to
arch_ftrace_ops_list_func() that will define the proper set of parameters.

Link: https://lore.kernel.org/r/20200614070154.6039-1-oscar.carter@gmx.com
Link: https://lkml.kernel.org/r/20200617165616.52241bde@oasis.local.home
Link: https://lore.kernel.org/all/20211005053922.GA702049@embeddedor/

Requested-by: Oscar Carter <oscar.carter@gmx.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 6644c654 09-Sep-2021 Weizhao Ouyang <o451686892@gmail.com>

ftrace: Cleanup ftrace_dyn_arch_init()

Most of ARCHs use empty ftrace_dyn_arch_init(), introduce a weak common
ftrace_dyn_arch_init() to cleanup them.

Link: https://lkml.kernel.org/r/20210909090216.1955240-1-o451686892@gmail.com

Acked-by: Heiko Carstens <hca@linux.ibm.com> (s390)
Acked-by: Helge Deller <deller@gmx.de> (parisc)
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 21ccc9cd 18-Aug-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Disable "other" permission bits in the tracefs files

When building the files in the tracefs file system, do not by default set
any permissions for OTH (other). This will make it easier for admins who
want to define a group for accessing tracefs and not having to first
disable all the permission bits for "other" in the file system.

As tracing can leak sensitive information, it should never by default
allowing all users access. An admin can still set the permission bits for
others to have access, which may be useful for creating a honeypot and
seeing who takes advantage of it and roots the machine.

Link: https://lkml.kernel.org/r/20210818153038.864149276@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 6954e415 23-Sep-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Place trace_pid_list logic into abstract functions

Instead of having the logic that does trace_pid_list open coded, wrap it in
abstract functions. This will allow a rewrite of the logic that implements
the trace_pid_list without affecting the users.

Note, this causes a change in behavior. Every time a pid is written into
the set_*_pid file, it creates a new list and uses RCU to update it. If
pid_max is lowered, but there was a pid currently in the list that was
higher than pid_max, those pids will now be removed on updating the list.
The old behavior kept that from happening.

The rewrite of the pid_list logic will no longer depend on pid_max,
and will return the old behavior.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 6130722f 29-Oct-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Fix kernel-doc formatting issues

Some functions had kernel-doc that used a comma instead of a hash to
separate the function name from the one line description.

Also, the "ftrace_is_dead()" had an incomplete description.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# ed65df63 18-Oct-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Have all levels of checks prevent recursion

While writing an email explaining the "bit = 0" logic for a discussion on
making ftrace_test_recursion_trylock() disable preemption, I discovered a
path that makes the "not do the logic if bit is zero" unsafe.

The recursion logic is done in hot paths like the function tracer. Thus,
any code executed causes noticeable overhead. Thus, tricks are done to try
to limit the amount of code executed. This included the recursion testing
logic.

Having recursion testing is important, as there are many paths that can
end up in an infinite recursion cycle when tracing every function in the
kernel. Thus protection is needed to prevent that from happening.

Because it is OK to recurse due to different running context levels (e.g.
an interrupt preempts a trace, and then a trace occurs in the interrupt
handler), a set of bits are used to know which context one is in (normal,
softirq, irq and NMI). If a recursion occurs in the same level, it is
prevented*.

Then there are infrastructure levels of recursion as well. When more than
one callback is attached to the same function to trace, it calls a loop
function to iterate over all the callbacks. Both the callbacks and the
loop function have recursion protection. The callbacks use the
"ftrace_test_recursion_trylock()" which has a "function" set of context
bits to test, and the loop function calls the internal
trace_test_and_set_recursion() directly, with an "internal" set of bits.

If an architecture does not implement all the features supported by ftrace
then the callbacks are never called directly, and the loop function is
called instead, which will implement the features of ftrace.

Since both the loop function and the callbacks do recursion protection, it
was seemed unnecessary to do it in both locations. Thus, a trick was made
to have the internal set of recursion bits at a more significant bit
location than the function bits. Then, if any of the higher bits were set,
the logic of the function bits could be skipped, as any new recursion
would first have to go through the loop function.

This is true for architectures that do not support all the ftrace
features, because all functions being traced must first go through the
loop function before going to the callbacks. But this is not true for
architectures that support all the ftrace features. That's because the
loop function could be called due to two callbacks attached to the same
function, but then a recursion function inside the callback could be
called that does not share any other callback, and it will be called
directly.

i.e.

traced_function_1: [ more than one callback tracing it ]
call loop_func

loop_func:
trace_recursion set internal bit
call callback

callback:
trace_recursion [ skipped because internal bit is set, return 0 ]
call traced_function_2

traced_function_2: [ only traced by above callback ]
call callback

callback:
trace_recursion [ skipped because internal bit is set, return 0 ]
call traced_function_2

[ wash, rinse, repeat, BOOM! out of shampoo! ]

Thus, the "bit == 0 skip" trick is not safe, unless the loop function is
call for all functions.

Since we want to encourage architectures to implement all ftrace features,
having them slow down due to this extra logic may encourage the
maintainers to update to the latest ftrace features. And because this
logic is only safe for them, remove it completely.

[*] There is on layer of recursion that is allowed, and that is to allow
for the transition between interrupt context (normal -> softirq ->
irq -> NMI), because a trace may occur before the context update is
visible to the trace recursion logic.

Link: https://lore.kernel.org/all/609b565a-ed6e-a1da-f025-166691b5d994@linux.alibaba.com/
Link: https://lkml.kernel.org/r/20211018154412.09fcad3c@gandalf.local.home

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jisheng Zhang <jszhang@kernel.org>
Cc: =?utf-8?b?546L6LSH?= <yun.wang@linux.alibaba.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: stable@vger.kernel.org
Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 67ccddf8 28-Jul-2021 Ilya Leoshkevich <iii@linux.ibm.com>

ftrace: Introduce ftrace_need_init_nop()

Implementing live patching on s390 requires each function's prologue to
contain a very special kind of nop, which gcc and clang don't generate.
However, the current code assumes that if CC_USING_NOP_MCOUNT is
defined, then whatever the compiler generates is good enough.

Move the CC_USING_NOP_MCOUNT check into the new ftrace_need_init_nop()
macro, that the architectures can override.

An alternative solution is to disable using -mnop-mcount in the
Makefile, however, this makes the build logic (even) more complicated
and forces the arch-specific code to deal with the useless __fentry__
symbol.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20210728212546.128248-2-iii@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# 3b1a8f45 21-Jul-2021 Colin Ian King <colin.king@canonical.com>

ftrace: Remove redundant initialization of variable ret

The variable ret is being initialized with a value that is never
read, it is being updated later on. The assignment is redundant and
can be removed.

Link: https://lkml.kernel.org/r/20210721120915.122278-1-colin.king@canonical.com

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 68e83498 21-Jul-2021 Nicolas Saenz Julienne <nsaenzju@redhat.com>

ftrace: Avoid synchronize_rcu_tasks_rude() call when not necessary

synchronize_rcu_tasks_rude() triggers IPIs and forces rescheduling on
all CPUs. It is a costly operation and, when targeting nohz_full CPUs,
very disrupting (hence the name). So avoid calling it when 'old_hash'
doesn't need to be freed.

Link: https://lkml.kernel.org/r/20210721114726.1545103-1-nsaenzju@redhat.com

Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 3ecda644 07-Jun-2021 Baokun Li <libaokun1@huawei.com>

ftrace: Use list_move instead of list_del/list_add

Using list_move() instead of list_del() + list_add().

Link: https://lkml.kernel.org/r/20210608031108.2820996-1-libaokun1@huawei.com

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 6c14133d 07-Jun-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Do not blindly read the ip address in ftrace_bug()

It was reported that a bug on arm64 caused a bad ip address to be used for
updating into a nop in ftrace_init(), but the error path (rightfully)
returned -EINVAL and not -EFAULT, as the bug caused more than one error to
occur. But because -EINVAL was returned, the ftrace_bug() tried to report
what was at the location of the ip address, and read it directly. This
caused the machine to panic, as the ip was not pointing to a valid memory
address.

Instead, read the ip address with copy_from_kernel_nofault() to safely
access the memory, and if it faults, report that the address faulted,
otherwise report what was in that location.

Link: https://lore.kernel.org/lkml/20210607032329.28671-1-mark-pk.tsai@mediatek.com/

Cc: stable@vger.kernel.org
Fixes: 05736a427f7e1 ("ftrace: warn on failure to disable mcount callers")
Reported-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 8c9af478 05-May-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Handle commands when closing set_ftrace_filter file

# echo switch_mm:traceoff > /sys/kernel/tracing/set_ftrace_filter

will cause switch_mm to stop tracing by the traceoff command.

# echo -n switch_mm:traceoff > /sys/kernel/tracing/set_ftrace_filter

does nothing.

The reason is that the parsing in the write function only processes
commands if it finished parsing (there is white space written after the
command). That's to handle:

write(fd, "switch_mm:", 10);
write(fd, "traceoff", 8);

cases, where the command is broken over multiple writes.

The problem is if the file descriptor is closed, then the write call is
not processed, and the command needs to be processed in the release code.
The release code can handle matching of functions, but does not handle
commands.

Cc: stable@vger.kernel.org
Fixes: eda1e32855656 ("tracing: handle broken names in ftrace filter")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# ceaaa129 01-Apr-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Simplify the calculation of page number for ftrace_page->records some more

Commit b40c6eabfcd40 ("ftrace: Simplify the calculation of page number for
ftrace_page->records") simplified the calculation of the number of pages
needed for each page group without having any empty pages, but it can be
simplified even further.

Link: https://lore.kernel.org/lkml/CAHk-=wjt9b7kxQ2J=aDNKbR1QBMB3Hiqb_hYcZbKsxGRSEb+gQ@mail.gmail.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# db42523b 01-Apr-2021 Linus Torvalds <torvalds@linux-foundation.org>

ftrace: Store the order of pages allocated in ftrace_page

Instead of saving the size of the records field of the ftrace_page, store
the order it uses to allocate the pages, as that is what is needed to know
in order to free the pages. This simplifies the code.

Link: https://lore.kernel.org/lkml/CAHk-=whyMxheOqXAORt9a7JK9gc9eHTgCJ55Pgs4p=X3RrQubQ@mail.gmail.com/

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ change log written by Steven Rostedt ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# f2cc020d 23-Mar-2021 Ingo Molnar <mingo@kernel.org>

tracing: Fix various typos in comments

Fix ~59 single-word typos in the tracing code comments, and fix
the grammar in a handful of places.

Link: https://lore.kernel.org/r/20210322224546.GA1981273@gmail.com
Link: https://lkml.kernel.org/r/20210323174935.GA4176821@gmail.com

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 59300b36 30-Mar-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Check if pages were allocated before calling free_pages()

It is possible that on error pg->size can be zero when getting its order,
which would return a -1 value. It is dangerous to pass in an order of -1
to free_pages(). Check if order is greater than or equal to zero before
calling free_pages().

Link: https://lore.kernel.org/lkml/20210330093916.432697c7@gandalf.local.home/

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 8a141dd7 16-Mar-2021 Alexei Starovoitov <ast@kernel.org>

ftrace: Fix modify_ftrace_direct.

The following sequence of commands:
register_ftrace_direct(ip, addr1);
modify_ftrace_direct(ip, addr1, addr2);
unregister_ftrace_direct(ip, addr2);
will cause the kernel to warn:
[ 30.179191] WARNING: CPU: 2 PID: 1961 at kernel/trace/ftrace.c:5223 unregister_ftrace_direct+0x130/0x150
[ 30.180556] CPU: 2 PID: 1961 Comm: test_progs W O 5.12.0-rc2-00378-g86bc10a0a711-dirty #3246
[ 30.182453] RIP: 0010:unregister_ftrace_direct+0x130/0x150

When modify_ftrace_direct() changes the addr from old to new it should update
the addr stored in ftrace_direct_funcs. Otherwise the final
unregister_ftrace_direct() won't find the address and will cause the splat.

Fixes: 0567d6809182 ("ftrace: Add modify_ftrace_direct()")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/bpf/20210316195815.34714-1-alexei.starovoitov@gmail.com


# 4c75b0ff 26-Nov-2020 Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

ftrace: Fix updating FTRACE_FL_TRAMP

On powerpc, kprobe-direct.tc triggered FTRACE_WARN_ON() in
ftrace_get_addr_new() followed by the below message:
Bad trampoline accounting at: 000000004222522f (wake_up_process+0xc/0x20) (f0000001)

The set of steps leading to this involved:
- modprobe ftrace-direct-too
- enable_probe
- modprobe ftrace-direct
- rmmod ftrace-direct <-- trigger

The problem turned out to be that we were not updating flags in the
ftrace record properly. From the above message about the trampoline
accounting being bad, it can be seen that the ftrace record still has
FTRACE_FL_TRAMP set though ftrace-direct module is going away. This
happens because we are checking if any ftrace_ops has the
FTRACE_FL_TRAMP flag set _before_ updating the filter hash.

The fix for this is to look for any _other_ ftrace_ops that also needs
FTRACE_FL_TRAMP.

Link: https://lkml.kernel.org/r/56c113aa9c3e10c19144a36d9684c7882bf09af5.1606412433.git.naveen.n.rao@linux.vnet.ibm.com

Cc: stable@vger.kernel.org
Fixes: a124692b698b0 ("ftrace: Enable trampoline when rec count returns back to one")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# d19ad077 28-Oct-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs

In preparation to have arguments of a function passed to callbacks attached
to functions as default, change the default callback prototype to receive a
struct ftrace_regs as the forth parameter instead of a pt_regs.

For callbacks that set the FL_SAVE_REGS flag in their ftrace_ops flags, they
will now need to get the pt_regs via a ftrace_get_regs() helper call. If
this is called by a callback that their ftrace_ops did not have a
FL_SAVE_REGS flag set, it that helper function will return NULL.

This will allow the ftrace_regs to hold enough just to get the parameters
and stack pointer, but without the worry that callbacks may have a pt_regs
that is not completely filled.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 045e269c 06-Nov-2020 Alex Shi <alex.shi@linux.alibaba.com>

ftrace: Remove unused varible 'ret'

'ret' in 2 functions are not used. and one of them is a void function.
So remove them to avoid gcc warning:
kernel/trace/ftrace.c:4166:6: warning: variable ‘ret’ set but not used
[-Wunused-but-set-variable]
kernel/trace/ftrace.c:5571:6: warning: variable ‘ret’ set but not used
[-Wunused-but-set-variable]

Link: https://lkml.kernel.org/r/1604674486-52350-1-git-send-email-alex.shi@linux.alibaba.com

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 773c1670 05-Nov-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Add recording of functions that caused recursion

This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file
"recursed_functions" all the functions that caused recursion while a
callback to the function tracer was running.

Link: https://lkml.kernel.org/r/20201106023548.102375687@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Guo Ren <guoren@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-csky@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: live-patching@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# a25d036d 05-Nov-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Reverse what the RECURSION flag means in the ftrace_ops

Now that all callbacks are recursion safe, reverse the meaning of the
RECURSION flag and rename it from RECURSION_SAFE to simply RECURSION.
Now only callbacks that request to have recursion protecting it will
have the added trampoline to do so.

Also remove the outdated comment about "PER_CPU" when determining to
use the ftrace_ops_assist_func.

Link: https://lkml.kernel.org/r/20201028115613.742454631@goodmis.org
Link: https://lkml.kernel.org/r/20201106023547.904270143@goodmis.org

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 7ba031e8 05-Oct-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Format variable declarations of ftrace_allocate_records

I hate when unrelated variables are declared on the same line.
Split them.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b40c6eab 30-Aug-2020 Wei Yang <richard.weiyang@linux.alibaba.com>

ftrace: Simplify the calculation of page number for ftrace_page->records

Based on the following two reasones, we could simplify the calculation:

- If the number after roundup count is not power of 2, we would
definitely have more than 1 empty page with a higher order.
- get_count_order() just return current order, so one lower order
could meet the requirement.

The calculation could be simplified by lower one order level when pages
are not power of 2.

Link: https://lkml.kernel.org/r/20200831031104.23322-5-richard.weiyang@linux.alibaba.com

Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# be493132 05-Oct-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Simplify the hash calculation

No need to add a check to subtract the number of bits if bits is zero after
fls(). Just divide the size by two before calling it. This does give the
same answer for size of 0 and 1, but that's fine.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 59e65b33 30-Aug-2020 Wei Yang <richard.weiyang@linux.alibaba.com>

ftrace: Use fls() to get the bits for dup_hash()

The effect here is to get the number of bits, lets use fls() to do
this job.

Link: https://lkml.kernel.org/r/20200831031104.23322-3-richard.weiyang@linux.alibaba.com

Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# fdda88d3 02-Oct-2020 Qiujun Huang <hqjagain@gmail.com>

ftrace: Fix some typos in comment

s/coorditate/coordinate/
s/emty/empty/
s/preeptive/preemptive/
s/succes/success/
s/carefule/careful/

Link: https://lkml.kernel.org/r/20201002143126.2890-1-hqjagain@gmail.com

Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b40341fa 28-Sep-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Move RCU is watching check after recursion check

The first thing that the ftrace function callback helper functions should do
is to check for recursion. Peter Zijlstra found that when
"rcu_is_watching()" had its notrace removed, it caused perf function tracing
to crash. This is because the call of rcu_is_watching() is tested before
function recursion is checked and and if it is traced, it will cause an
infinite recursion loop.

rcu_is_watching() should still stay notrace, but to prevent this should
never had crashed in the first place. The recursion prevention must be the
first thing done in callback functions.

Link: https://lore.kernel.org/r/20200929112541.GM2628@hirez.programming.kicks-ass.net

Cc: stable@vger.kernel.org
Cc: Paul McKenney <paulmck@kernel.org>
Fixes: c68c0fa293417 ("ftrace: Have ftrace_ops_get_func() handle RCU and PER_CPU flags too")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 5c8c206e 06-Aug-2020 Randy Dunlap <rdunlap@infradead.org>

tracing: Delete repeated words in comments

Drop repeated words in kernel/trace/.
{and, the, not}

Link: https://lkml.kernel.org/r/20200807033259.13778-1-rdunlap@infradead.org

Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 7bb82ac3 18-Sep-2020 Tobias Klauser <tklauser@distanz.ch>

ftrace: let ftrace_enable_sysctl take a kernel pointer buffer

Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
signature of ftrace_enable_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
kernel/trace/ftrace.c:7544:43: expected void *
kernel/trace/ftrace.c:7544:43: got void [noderef] __user *buffer

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 54fa9ba5 07-Sep-2020 Tobias Klauser <tklauser@distanz.ch>

ftrace: Let ftrace_enable_sysctl take a kernel pointer buffer

Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
signature of ftrace_enable_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
kernel/trace/ftrace.c:7544:43: expected void *
kernel/trace/ftrace.c:7544:43: got void [noderef] __user *buffer

Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 478ece95 31-Aug-2020 Adrian Hunter <adrian.hunter@intel.com>

ftrace: Fix missing synchronize_rcu() removing trampoline from kallsyms

Add synchronize_rcu() after list_del_rcu() in
ftrace_remove_trampoline_from_kallsyms() to protect readers of
ftrace_ops_trampoline_list (in ftrace_get_trampoline_kallsym)
which is used when kallsyms is read.

Link: https://lkml.kernel.org/r/20200901091617.31837-1-adrian.hunter@intel.com

Fixes: fc0ea795f53c8d ("ftrace: Add symbols for ftrace trampolines")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# d5e47505 31-Aug-2020 Miroslav Benes <mbenes@suse.cz>

ftrace: Free the trampoline when ftrace_startup() fails

Commit fc0ea795f53c ("ftrace: Add symbols for ftrace trampolines")
missed to remove ops from new ftrace_ops_trampoline_list in
ftrace_startup() if ftrace_hash_ipmodify_enable() fails there. It may
lead to BUG if such ops come from a module which may be removed.

Moreover, the trampoline itself is not freed in this case.

Fix it by calling ftrace_trampoline_free() during the rollback.

Link: https://lkml.kernel.org/r/20200831122631.28057-1-mbenes@suse.cz

Fixes: fc0ea795f53c ("ftrace: Add symbols for ftrace trampolines")
Fixes: f8b8be8a310a ("ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict")
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# afcab636 04-Aug-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Use trace_sched_process_free() instead of exit() for pid tracing

On exit, if a process is preempted after the trace_sched_process_exit()
tracepoint but before the process is done exiting, then when it gets
scheduled in, the function tracers will not filter it properly against the
function tracing pid filters.

That is because the function tracing pid filters hooks to the
sched_process_exit() tracepoint to remove the exiting task's pid from the
filter list. Because the filtering happens at the sched_switch tracepoint,
when the exiting task schedules back in to finish up the exit, it will no
longer be in the function pid filtering tables.

This was noticeable in the notrace self tests on a preemptable kernel, as
the tests would fail as it exits and preempted after being taken off the
notrace filter table and on scheduling back in it would not be in the
notrace list, and then the ending of the exit function would trace. The test
detected this and would fail.

Cc: stable@vger.kernel.org
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: 1e10486ffee0a ("ftrace: Add 'function-fork' trace option")
Fixes: c37775d57830a ("tracing: Add infrastructure to allow set_event_pid to follow children"
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# c58b6b03 24-Jul-2020 Josef Bacik <josef@toxicpanda.com>

ftrace: Fix ftrace_trace_task return value

I was attempting to use pid filtering with function_graph, but it wasn't
allowing anything to make it through. Turns out ftrace_trace_task
returns false if ftrace_ignore_pid is not-empty, which isn't correct
anymore. We're now setting it to FTRACE_PID_IGNORE if we need to ignore
that pid, otherwise it's set to the pid (which is weird considering the
name) or to FTRACE_PID_TRACE. Fix the check to check for !=
FTRACE_PID_IGNORE. With this we can now use function_graph with pid
filtering.

Link: https://lkml.kernel.org/r/20200725005048.1790-1-josef@toxicpanda.com

Fixes: 717e3f5ebc82 ("ftrace: Make function trace pid filtering a bit more exact")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# c5f51572 28-Jul-2020 Chengming Zhou <zhouchengming@bytedance.com>

ftrace: Do not let direct or IPMODIFY ftrace_ops be added to module and set trampolines

When inserting a module, we find all ftrace_ops referencing it on the
ftrace_ops_list. But FTRACE_OPS_FL_DIRECT and FTRACE_OPS_FL_IPMODIFY
flags are special, and should not be set automatically. So warn and
skip ftrace_ops that have these two flags set and adding new code.
Also check if only one ftrace_ops references the module, in which case
we can use a trampoline as an optimization.

Link: https://lkml.kernel.org/r/20200728180554.65203-2-zhouchengming@bytedance.com

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 8a224ffb 28-Jul-2020 Chengming Zhou <zhouchengming@bytedance.com>

ftrace: Setup correct FTRACE_FL_REGS flags for module

When module loaded and enabled, we will use __ftrace_replace_code
for module if any ftrace_ops referenced it found. But we will get
wrong ftrace_addr for module rec in ftrace_get_addr_new, because
rec->flags has not been setup correctly. It can cause the callback
function of a ftrace_ops has FTRACE_OPS_FL_SAVE_REGS to be called
with pt_regs set to NULL.
So setup correct FTRACE_FL_REGS flags for rec when we call
referenced_filters to find ftrace_ops references it.

Link: https://lkml.kernel.org/r/20200728180554.65203-1-zhouchengming@bytedance.com

Cc: stable@vger.kernel.org
Fixes: 8c4f3c3fa9681 ("ftrace: Check module functions being traced on reload")
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 5da7cd11 21-Apr-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

x86/ftrace: Only have the builtin ftrace_regs_caller call direct hooks

If a direct hook is attached to a function that ftrace also has a function
attached to it, then it is required that the ftrace_ops_list_func() is used
to iterate over the registered ftrace callbacks. This will also include the
direct ftrace_ops helper, that tells ftrace_regs_caller where to return to
(the direct callback and not the function that called it).

As this direct helper is only to handle the case of ftrace callbacks
attached to the same function as the direct callback, the ftrace callback
allocated trampolines (used to only call them), should never be used to
return back to a direct callback.

Only copy the portion of the ftrace_regs_caller that will return back to
what called it, and not the portion that returns back to the direct caller.

The direct ftrace_ops must then pick the ftrace_regs_caller builtin function
as its own trampoline to ensure that it will never have one allocated for
it (which would not include the handling of direct callbacks).

Link: http://lkml.kernel.org/r/20200422162750.495903799@goodmis.org

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 026bb845 29-May-2020 Kaitao Cheng <pilgrimtao@gmail.com>

ftrace: Fix maybe-uninitialized compiler warning

During build compiler reports some 'false positive' warnings about
variables {'seq_ops', 'filtered_pids', 'other_pids'} may be used
uninitialized. This patch silences these warnings.
Also delete some useless spaces

Link: https://lkml.kernel.org/r/20200529141214.37648-1-pilgrimtao@gmail.com

Signed-off-by: Kaitao Cheng <pilgrimtao@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 548e1f6c 12-May-2020 Adrian Hunter <adrian.hunter@intel.com>

ftrace: Add perf text poke events for ftrace trampolines

Add perf text poke events for ftrace trampolines when created and when
freed.

There can be 3 text_poke events for ftrace trampolines:

1. NULL -> trampoline
By ftrace_update_trampoline() when !ops->trampoline
Trampoline created

2. [e.g. on x86] CALL rel32 -> CALL rel32
By arch_ftrace_update_trampoline() when ops->trampoline and
ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP
[e.g. on x86] via text_poke_bp() which generates text poke events
Trampoline-called function target updated

3. trampoline -> NULL
By ftrace_trampoline_free() when ops->trampoline and
ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP
Trampoline freed

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200512121922.8997-9-adrian.hunter@intel.com


# dd9ddf46 12-May-2020 Adrian Hunter <adrian.hunter@intel.com>

ftrace: Add perf ksymbol events for ftrace trampolines

Symbols are needed for tools to describe instruction addresses. Pages
allocated for ftrace's purposes need symbols to be created for them.
Add such symbols to be visible via perf ksymbol events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200512121922.8997-8-adrian.hunter@intel.com


# fc0ea795 12-May-2020 Adrian Hunter <adrian.hunter@intel.com>

ftrace: Add symbols for ftrace trampolines

Symbols are needed for tools to describe instruction addresses. Pages
allocated for ftrace's purposes need symbols to be created for them.
Add such symbols to be visible via /proc/kallsyms.

Example on x86 with CONFIG_DYNAMIC_FTRACE=y

# echo function > /sys/kernel/debug/tracing/current_tracer
# cat /proc/kallsyms | grep '\[__builtin__ftrace\]'
ffffffffc0238000 t ftrace_trampoline [__builtin__ftrace]

Note: This patch adds "__builtin__ftrace" as a module name in /proc/kallsyms for
symbols for pages allocated for ftrace's purposes, even though "__builtin__ftrace"
is not a module.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200512121922.8997-7-adrian.hunter@intel.com


# 2062a4e8 08-Jun-2020 Dmitry Safonov <0x7f454c46@gmail.com>

kallsyms/printk: add loglvl to print_ip_sym()

Patch series "Add log level to show_stack()", v3.

Add log level argument to show_stack().

Done in three stages:
1. Introducing show_stack_loglvl() for every architecture
2. Migrating old users with an explicit log level
3. Renaming show_stack_loglvl() into show_stack()

Justification:

- It's a design mistake to move a business-logic decision into platform
realization detail.

- I have currently two patches sets that would benefit from this work:
Removing console_loglevel jumps in sysrq driver [1] Hung task warning
before panic [2] - suggested by Tetsuo (but he probably didn't realise
what it would involve).

- While doing (1), (2) the backtraces were adjusted to headers and other
messages for each situation - so there won't be a situation when the
backtrace is printed, but the headers are missing because they have
lesser log level (or the reverse).

- As the result in (2) plays with console_loglevel for kdb are removed.

The least important for upstream, but maybe still worth to note that every
company I've worked in so far had an off-list patch to print backtrace
with the needed log level (but only for the architecture they cared
about). If you have other ideas how you will benefit from show_stack()
with a log level - please, reply to this cover letter.

See also discussion on v1:
https://lore.kernel.org/linux-riscv/20191106083538.z5nlpuf64cigxigh@pathway.suse.cz/

This patch (of 50):

print_ip_sym() needs to have a log level parameter to comply with other
parts being printed. Otherwise, half of the expected backtrace would be
printed and other may be missing with some logging level.

The following callee(s) are using now the adjusted log level:
- microblaze/unwind: the same level as headers & userspace unwind.
Note that pr_debug()'s there are for debugging the unwinder itself.
- nds32/traps: symbol addresses are printed with the same log level
as backtrace headers.
- lockdep: ip for locking issues is printed with the same log level
as other part of the warning.
- sched: ip where preemption was disabled is printed as error like
the rest part of the message.
- ftrace: bug reports are now consistent in the log level being used.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Ben Segall <bsegall@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Aurelien Jacquiot <jacquiot.aurelien@gmail.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20200418201944.482088-2-dima@arista.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c143b775 15-May-2020 Cheng Jian <cj.chengjian@huawei.com>

ftrace: show debugging information when panic_on_warn set

When an anomaly is detected in the function call modification
code, ftrace_bug() is called to disable function tracing as well
as give some warn and information that may help debug the problem.

But currently, we call FTRACE_WARN_ON_ONCE() first in ftrace_bug(),
so when panic_on_warn is set, we can't see the debugging information
here. Call FTRACE_WARN_ON_ONCE() at the end of ftrace_bug() to ensure
that the debugging information is displayed first.

after this patch, the dmesg looks like:

------------[ ftrace bug ]------------
ftrace failed to modify
[<ffff800010081004>] bcm2835_handle_irq+0x4/0x58
actual: 1f:20:03:d5
Setting ftrace call site to call ftrace function
ftrace record flags: 80000001
(1)
expected tramp: ffff80001009d6f0
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1635 at kernel/trace/ftrace.c:2078 ftrace_bug+0x204/0x238
Kernel panic - not syncing: panic_on_warn set ...
CPU: 2 PID: 1635 Comm: sh Not tainted 5.7.0-rc5-00033-gb922183867f5 #14
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x1b0
show_stack+0x20/0x30
dump_stack+0xc0/0x10c
panic+0x16c/0x368
__warn+0x120/0x160
report_bug+0xc8/0x160
bug_handler+0x28/0x98
brk_handler+0x70/0xd0
do_debug_exception+0xcc/0x1ac
el1_sync_handler+0xe4/0x120
el1_sync+0x7c/0x100
ftrace_bug+0x204/0x238

Link: https://lkml.kernel.org/r/20200515100828.7091-1-cj.chengjian@huawei.com

Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# e5a971d7 03-Apr-2020 Paul E. McKenney <paulmck@kernel.org>

ftrace: Use synchronize_rcu_tasks_rude() instead of ftrace_sync()

This commit replaces the schedule_on_each_cpu(ftrace_sync) instances
with synchronize_rcu_tasks_rude().

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
[ paulmck: Make Kconfig adjustments noted by kbuild test robot. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>


# 353da879 22-Apr-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Fix memory leak caused by not freeing entry in unregister_ftrace_direct()

kmemleak reported the following:

unreferenced object 0xffff90d47127a920 (size 32):
comm "modprobe", pid 1766, jiffies 4294792031 (age 162.568s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 22 01 00 00 00 00 ad de ........".......
00 78 12 a7 ff ff ff ff 00 00 b6 c0 ff ff ff ff .x..............
backtrace:
[<00000000bb79e72e>] register_ftrace_direct+0xcb/0x3a0
[<00000000295e4f79>] do_one_initcall+0x72/0x340
[<00000000873ead18>] do_init_module+0x5a/0x220
[<00000000974d9de5>] load_module+0x2235/0x2550
[<0000000059c3d6ce>] __do_sys_finit_module+0xc0/0x120
[<000000005a8611b4>] do_syscall_64+0x60/0x230
[<00000000a0cdc49e>] entry_SYSCALL_64_after_hwframe+0x49/0xb3

The entry used to save the direct descriptor needs to be freed
when unregistering.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 27683626 25-Mar-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Create set_event_notrace_pid to not trace tasks

There's currently a way to select a task that should only have its events
traced, but there's no way to select a task not to have itsevents traced.
Add a set_event_notrace_pid file that acts the same as set_event_pid (and is
also affected by event-fork), but the task pids in this file will not be
traced even if they are listed in the set_event_pid file. This makes it easy
for tools like trace-cmd to "hide" itself from beint traced by events when
it is recording other tasks.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b3b1e6ed 19-Mar-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Create set_ftrace_notrace_pid to not trace tasks

There's currently a way to select a task that should only be traced by
functions, but there's no way to select a task not to be traced by the
function tracer. Add a set_ftrace_notrace_pid file that acts the same as
set_ftrace_pid (and is also affected by function-fork), but the task pids in
this file will not be traced even if they are listed in the set_ftrace_pid
file. This makes it easy for tools like trace-cmd to "hide" itself from the
function tracer when it is recording other tasks.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 717e3f5e 19-Mar-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Make function trace pid filtering a bit more exact

The set_ftrace_pid file is used to filter function tracing to only trace
tasks that are listed in that file. Instead of testing the pids listed in
that file (it's a bitmask) at each function trace event, the logic is done
via a sched_switch hook. A flag is set when the next task to run is in the
list of pids in the set_ftrace_pid file. But the sched_switch hook is not at
the exact location of when the task switches, and the flag gets set before
the task to be traced actually runs. This leaves a residue of traced
functions that do not belong to the pid that should be filtered on.

By changing the logic slightly, where instead of having a boolean flag to
test, record the pid that should be traced, with special values for not to
trace and always trace. Then at each function call, a check will be made to
see if the function should be ignored, or if the current pid matches the
function that should be traced, and only trace if it matches (or if it has
the special value to always trace).

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# d9815bff 06-Mar-2020 Artem Savkov <asavkov@redhat.com>

ftrace: Return the first found result in lookup_rec()

It appears that ip ranges can overlap so. In that case lookup_rec()
returns whatever results it got last even if it found nothing in last
searched page.

This breaks an obscure livepatch late module patching usecase:
- load livepatch
- load the patched module
- unload livepatch
- try to load livepatch again

To fix this return from lookup_rec() as soon as it found the record
containing searched-for ip. This used to be this way prior lookup_rec()
introduction.

Link: http://lkml.kernel.org/r/20200306174317.21699-1-asavkov@redhat.com

Cc: stable@vger.kernel.org
Fixes: 7e16f581a817 ("ftrace: Separate out functionality from ftrace_location_range()")
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 54a16ff6 05-Feb-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Protect ftrace_graph_hash with ftrace_sync

As function_graph tracer can run when RCU is not "watching", it can not be
protected by synchronize_rcu() it requires running a task on each CPU before
it can be freed. Calling schedule_on_each_cpu(ftrace_sync) needs to be used.

Link: https://lore.kernel.org/r/20200205131110.GT2935@paulmck-ThinkPad-P72

Cc: stable@vger.kernel.org
Fixes: b9b0c831bed26 ("ftrace: Convert graph filter to use hash tables")
Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# fd0e6852 04-Feb-2020 Amol Grover <frextrite@gmail.com>

tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu

Fix following instances of sparse error
kernel/trace/ftrace.c:5667:29: error: incompatible types in comparison
kernel/trace/ftrace.c:5813:21: error: incompatible types in comparison
kernel/trace/ftrace.c:5868:36: error: incompatible types in comparison
kernel/trace/ftrace.c:5870:25: error: incompatible types in comparison

Use rcu_dereference_protected to dereference the newly annotated pointer.

Link: http://lkml.kernel.org/r/20200205055701.30195-1-frextrite@gmail.com

Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 24a9729f 31-Jan-2020 Amol Grover <frextrite@gmail.com>

tracing: Annotate ftrace_graph_hash pointer with __rcu

Fix following instances of sparse error
kernel/trace/ftrace.c:5664:29: error: incompatible types in comparison
kernel/trace/ftrace.c:5785:21: error: incompatible types in comparison
kernel/trace/ftrace.c:5864:36: error: incompatible types in comparison
kernel/trace/ftrace.c:5866:25: error: incompatible types in comparison

Use rcu_dereference_protected to access the __rcu annotated pointer.

Link: http://lkml.kernel.org/r/20200201072703.17330-1-frextrite@gmail.com

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# e4075e8b 24-Jan-2020 Vasily Averin <vvs@virtuozzo.com>

ftrace: fpid_next() should increase position index

if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

Without patch:
# dd bs=4 skip=1 if=/sys/kernel/tracing/set_ftrace_pid
dd: /sys/kernel/tracing/set_ftrace_pid: cannot skip to specified offset
id
no pid
2+1 records in
2+1 records out
10 bytes copied, 0.000213285 s, 46.9 kB/s

Notice the "id" followed by "no pid".

With the patch:
# dd bs=4 skip=1 if=/sys/kernel/tracing/set_ftrace_pid
dd: /sys/kernel/tracing/set_ftrace_pid: cannot skip to specified offset
id
0+1 records in
0+1 records out
3 bytes copied, 0.000202112 s, 14.8 kB/s

Notice that it only prints "id" and not the "no pid" afterward.

Link: http://lkml.kernel.org/r/4f87c6ad-f114-30bb-8506-c32274ce2992@virtuozzo.com

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 24589e3a 25-Jan-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Use pr_err() instead of WARN() for memory failures

As warnings can trigger panics, especially when "panic_on_warn" is set,
memory failure warnings can cause panics and fail fuzz testers that are
stressing memory.

Create a MEM_FAIL() macro to use instead of WARN() in the tracing code
(perhaps this should be a kernel wide macro?), and use that for memory
failure issues. This should stop failing fuzz tests due to warnings.

Link: https://lore.kernel.org/r/CACT4Y+ZP-7np20GVRu3p+eZys9GPtbu+JpfV+HtsufAzvTgJrg@mail.gmail.com

Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# aff4866d 20-Jan-2020 Alex Shi <alex.shi@linux.alibaba.com>

ftrace: Remove NR_TO_INIT macro

This macro isn't used from commit cb7be3b2fc2c ("ftrace: remove
daemon"). So no needs to keep it.

Link: http://lkml.kernel.org/r/1579586063-44984-1-git-send-email-alex.shi@linux.alibaba.com

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 9a09cd74 20-Jan-2020 Alex Shi <alex.shi@linux.alibaba.com>

ftrace: Remove abandoned macros

These 2 macros aren't used from commit eee8ded131f1 ("ftrace: Have the
function probes call their own function"), so remove them.

Link: http://lkml.kernel.org/r/1579585807-43316-1-git-send-email-alex.shi@linux.alibaba.com

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 1c5eb448 09-Jan-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Rename trace_buffer to array_buffer

As we are working to remove the generic "ring_buffer" name that is used by
both tracing and perf, the ring_buffer name for tracing will be renamed to
trace_buffer, and perf's ring buffer will be renamed to perf_buffer.

As there already exists a trace_buffer that is used by the trace_arrays, it
needs to be first renamed to array_buffer.

Link: https://lore.kernel.org/r/20191213153553.GE20583@krava

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# e31f7939 02-Jan-2020 Wen Yang <wenyang@linux.alibaba.com>

ftrace: Avoid potential division by zero in function profiler

The ftrace_profile->counter is unsigned long and
do_div truncates it to 32 bits, which means it can test
non-zero and be truncated to zero for division.
Fix this issue by using div64_ul() instead.

Link: http://lkml.kernel.org/r/20200103030248.14516-1-wenyang@linux.alibaba.com

Cc: stable@vger.kernel.org
Fixes: e330b3bcd8319 ("tracing: Show sample std dev in function profiling")
Fixes: 34886c8bc590f ("tracing: add average time in function to function profiler")
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# ff205766 08-Dec-2019 Alexei Starovoitov <ast@kernel.org>

ftrace: Fix function_graph tracer interaction with BPF trampoline

Depending on type of BPF programs served by BPF trampoline it can call original
function. In such case the trampoline will skip one stack frame while
returning. That will confuse function_graph tracer and will cause crashes with
bad RIP. Teach graph tracer to skip functions that have BPF trampoline attached.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# ea806eb3 17-Nov-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Add a helper function to modify_ftrace_direct() to allow arch optimization

If a direct ftrace callback is at a location that does not have any other
ftrace helpers attached to it, it is possible to simply just change the
text to call the new caller (if the architecture supports it). But this
requires special architecture code. Currently, modify_ftrace_direct() uses a
trick to add a stub ftrace callback to the location forcing it to call the
ftrace iterator. Then it can change the direct helper to call the new
function in C, and then remove the stub. Removing the stub will have the
location now call the new location that the direct helper is using.

The new helper function does the registering the stub trick, but is a weak
function, allowing an architecture to override it to do something a bit more
direct.

Link: https://lore.kernel.org/r/20191115215125.mbqv7taqnx376yed@ast-mbp.dhcp.thefacebook.com

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 128161f4 15-Nov-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Add helper find_direct_entry() to consolidate code

Both unregister_ftrace_direct() and modify_ftrace_direct() needs to
normalize the ip passed in to match the rec->ip, as it is acceptable to have
the ip on the ftrace call site but not the start. There are also common
validity checks with the record found by the ip, these should be done for
both unregister_ftrace_direct() and modify_ftrace_direct().

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 406acdd3 15-Nov-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Add another check for match in register_ftrace_direct()

As an instruction pointer passed into register_ftrace_direct() may just
exist on the ftrace call site, but may not be the start of the call site
itself, register_ftrace_direct() still needs to update test if a direct call
exists on the normalized site, as only one direct call is allowed at any one
time.

Fixes: 763e34e74bb7d ("ftrace: Add register_ftrace_direct()")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 1c7f9b67 15-Nov-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Fix accounting bug with direct->count in register_ftrace_direct()

The direct->count wasn't being updated properly, where it only was updated
when the first entry was added, but should be updated every time.

Fixes: 013bf0da04748 ("ftrace: Add ftrace_find_direct_func()")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 0567d680 14-Nov-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Add modify_ftrace_direct()

Add a new function modify_ftrace_direct() that will allow a user to update
an existing direct caller to a new trampoline, without missing hits due to
unregistering one and then adding another.

Link: https://lore.kernel.org/r/20191109022907.6zzo6orhxpt5n2sv@ast-mbp.dhcp.thefacebook.com

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 80042c8f 07-Oct-2019 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

tracing: Use generic type for comparator function

Comparator function type, cmp_func_t, is defined in the types.h,
use it in the code.

Link: http://lkml.kernel.org/r/20191007135656.37734-3-andriy.shevchenko@linux.intel.com

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# da537f0a 01-Oct-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Add information on number of page groups allocated

Looking for ways to shrink the size of the dyn_ftrace structure, knowing the
information about how many pages and the number of groups of those pages, is
useful in working out the best ways to save on memory.

This adds one info print on how many groups of pages were used to allocate
the ftrace dyn_ftrace structures, and also shows the number of pages and
groups in the dyn_ftrace_total_info (which is used for debugging).

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# a3ad1a7e 08-Nov-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace/x86: Add a counter to test function_graph with direct

As testing for direct calls from the function graph tracer adds a little
overhead (which is a lot when tracing every function), add a counter that
can be used to test if function_graph tracer needs to test for a direct
caller or not.

It would have been nicer if we could use a static branch, but the static
branch logic fails when used within the function graph tracer trampoline.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 013bf0da 08-Nov-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Add ftrace_find_direct_func()

As function_graph tracer modifies the return address to insert a trampoline
to trace the return of a function, it must be aware of a direct caller, as
when it gets called, the function's return address may not be at on the
stack where it expects. It may have to see if that return address points to
the a direct caller and adjust if it is.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 763e34e7 08-Nov-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Add register_ftrace_direct()

Add the start of the functionality to allow other trampolines to use the
ftrace mcount/fentry/nop location. This adds two new functions:

register_ftrace_direct() and unregister_ftrace_direct()

Both take two parameters: the first is the instruction address of where the
mcount/fentry/nop exists, and the second is the trampoline to have that
location called.

This will handle cases where ftrace is already used on that same location,
and will make it still work, where the registered direct called trampoline
will get called after all the registered ftrace callers are handled.

Currently, it will not allow for IP_MODIFY functions to be called at the
same locations, which include some kprobes and live kernel patching.

At this point, no architecture supports this. This is only the start of
implementing the framework.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 7e16f581 07-Nov-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Separate out functionality from ftrace_location_range()

Create a new function called lookup_rec() from the functionality of
ftrace_location_range(). The difference between lookup_rec() is that it
returns the record that it finds, where as ftrace_location_range() returns
only if it found a match or not.

The lookup_rec() is static, and can be used for new functionality where
ftrace needs to find a record of a specific address.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 714641c3 07-Nov-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Separate out the copying of a ftrace_hash from __ftrace_hash_move()

Most of the functionality of __ftrace_hash_move() can be reused, but not all
of it. That is, __ftrace_hash_move() is used to simply make a new hash from
an existing one, using the same size as the original. Creating a dup_hash(),
where we can specify a new size will be useful when we want to create a hash
with a default size, or simply copy the old one.

Signed-off-by: Steven Rostedt (VMWare) <rostedt@goodmis.org>


# fbf6c73c 16-Oct-2019 Mark Rutland <mark.rutland@arm.com>

ftrace: add ftrace_init_nop()

Architectures may need to perform special initialization of ftrace
callsites, and today they do so by special-casing ftrace_make_nop() when
the expected branch address is MCOUNT_ADDR. In some cases (e.g. for
patchable-function-entry), we don't have an mcount-like symbol and don't
want a synthetic MCOUNT_ADDR, but we may need to perform some
initialization of callsites.

To make it possible to separate initialization from runtime
modification, and to handle cases without an mcount-like symbol, this
patch adds an optional ftrace_init_nop() function that architectures can
implement, which does not pass a branch address.

Where an architecture does not provide ftrace_init_nop(), we will fall
back to the existing behaviour of calling ftrace_make_nop() with
MCOUNT_ADDR.

At the same time, ftrace_code_disable() is renamed to
ftrace_nop_initialize() to make it clearer that it is intended to
intialize a callsite into a disabled state, and is not for disabling a
callsite that has been runtime enabled. The kerneldoc description of rec
arguments is updated to cover non-mcount callsites.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Sven Schnelle <svens@stackframe.org>
Tested-by: Torsten Duwe <duwe@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>


# 7162431d 16-Oct-2019 Miroslav Benes <mbenes@suse.cz>

ftrace: Introduce PERMANENT ftrace_ops flag

Livepatch uses ftrace for redirection to new patched functions. It means
that if ftrace is disabled, all live patched functions are disabled as
well. Toggling global 'ftrace_enabled' sysctl thus affect it directly.
It is not a problem per se, because only administrator can set sysctl
values, but it still may be surprising.

Introduce PERMANENT ftrace_ops flag to amend this. If the
FTRACE_OPS_FL_PERMANENT is set on any ftrace ops, the tracing cannot be
disabled by disabling ftrace_enabled. Equally, a callback with the flag
set cannot be registered if ftrace_enabled is disabled.

Link: http://lkml.kernel.org/r/20191016113316.13415-2-mbenes@suse.cz

Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 17911ff3 11-Oct-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Add locked_down checks to the open calls of files created for tracefs

Added various checks on open tracefs calls to see if tracefs is in lockdown
mode, and if so, to return -EPERM.

Note, the event format files (which are basically standard on all machines)
as well as the enabled_functions file (which shows what is currently being
traced) are not lockde down. Perhaps they should be, but it seems counter
intuitive to lockdown information to help you know if the system has been
modified.

Link: http://lkml.kernel.org/r/CAHk-=wj7fGPKUspr579Cii-w_y60PtRaiDgKuxVtBAMK0VNNkA@mail.gmail.com

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 8530dec6 11-Oct-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Add tracing_check_open_get_tr()

Currently, most files in the tracefs directory test if tracing_disabled is
set. If so, it should return -ENODEV. The tracing_disabled is called when
tracing is found to be broken. Originally it was done in case the ring
buffer was found to be corrupted, and we wanted to prevent reading it from
crashing the kernel. But it's also called if a tracing selftest fails on
boot. It's a one way switch. That is, once it is triggered, tracing is
disabled until reboot.

As most tracefs files can also be used by instances in the tracefs
directory, they need to be carefully done. Each instance has a trace_array
associated to it, and when the instance is removed, the trace_array is
freed. But if an instance is opened with a reference to the trace_array,
then it requires looking up the trace_array to get its ref counter (as there
could be a race with it being deleted and the open itself). Once it is
found, a reference is added to prevent the instance from being removed (and
the trace_array associated with it freed).

Combine the two checks (tracing_disabled and trace_array_get()) into a
single helper function. This will also make it easier to add lockdown to
tracefs later.

Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 9ef16693 11-Oct-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Get a reference counter for the trace_array on filter files

The ftrace set_ftrace_filter and set_ftrace_notrace files are specific for
an instance now. They need to take a reference to the instance otherwise
there could be a race between accessing the files and deleting the instance.

It wasn't until the :mod: caching where these file operations started
referencing the trace_array directly.

Cc: stable@vger.kernel.org
Fixes: 673feb9d76ab3 ("ftrace: Add :mod: caching infrastructure to trace_array")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 08468754 10-Sep-2019 Changbin Du <changbin.du@intel.com>

ftrace: Simplify ftrace hash lookup code in clear_func_from_hash()

Function ftrace_lookup_ip() will check empty hash table. So we don't
need extra check outside.

Link: http://lkml.kernel.org/r/20190910143336.13472-1-changbin.du@gmail.com

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 5b0022dd 04-Jul-2019 Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

ftrace: Check for successful allocation of hash

In register_ftrace_function_probe(), we are not checking the return
value of alloc_and_copy_ftrace_hash(). The subsequent call to
ftrace_match_records() may end up dereferencing the same. Add a check to
ensure this doesn't happen.

Link: http://lkml.kernel.org/r/26e92574f25ad23e7cafa3cf5f7a819de1832cbe.1562249521.git.naveen.n.rao@linux.vnet.ibm.com

Cc: stable@vger.kernel.org
Fixes: 1ec3a81a0cf42 ("ftrace: Have each function probe use its own ftrace_ops")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 372e0d01 30-Aug-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Check for empty hash and comment the race with registering probes

The race between adding a function probe and reading the probes that exist
is very subtle. It needs a comment. Also, the issue can also happen if the
probe has has the EMPTY_HASH as its func_hash.

Cc: stable@vger.kernel.org
Fixes: 7b60f3d876156 ("ftrace: Dynamically create the probe ftrace_ops for the trace_array")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 7bd46644 04-Jul-2019 Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

ftrace: Fix NULL pointer dereference in t_probe_next()

LTP testsuite on powerpc results in the below crash:

Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc00000000029d800
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
...
CPU: 68 PID: 96584 Comm: cat Kdump: loaded Tainted: G W
NIP: c00000000029d800 LR: c00000000029dac4 CTR: c0000000001e6ad0
REGS: c0002017fae8ba10 TRAP: 0300 Tainted: G W
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28022422 XER: 20040000
CFAR: c00000000029d90c DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
...
NIP [c00000000029d800] t_probe_next+0x60/0x180
LR [c00000000029dac4] t_mod_start+0x1a4/0x1f0
Call Trace:
[c0002017fae8bc90] [c000000000cdbc40] _cond_resched+0x10/0xb0 (unreliable)
[c0002017fae8bce0] [c0000000002a15b0] t_start+0xf0/0x1c0
[c0002017fae8bd30] [c0000000004ec2b4] seq_read+0x184/0x640
[c0002017fae8bdd0] [c0000000004a57bc] sys_read+0x10c/0x300
[c0002017fae8be30] [c00000000000b388] system_call+0x5c/0x70

The test (ftrace_set_ftrace_filter.sh) is part of ftrace stress tests
and the crash happens when the test does 'cat
$TRACING_PATH/set_ftrace_filter'.

The address points to the second line below, in t_probe_next(), where
filter_hash is dereferenced:
hash = iter->probe->ops.func_hash->filter_hash;
size = 1 << hash->size_bits;

This happens due to a race with register_ftrace_function_probe(). A new
ftrace_func_probe is created and added into the func_probes list in
trace_array under ftrace_lock. However, before initializing the filter,
we drop ftrace_lock, and re-acquire it after acquiring regex_lock. If
another process is trying to read set_ftrace_filter, it will be able to
acquire ftrace_lock during this window and it will end up seeing a NULL
filter_hash.

Fix this by just checking for a NULL filter_hash in t_probe_next(). If
the filter_hash is NULL, then this probe is just being added and we can
simply return from here.

Link: http://lkml.kernel.org/r/05e021f757625cbbb006fad41380323dbe4e3b43.1562249521.git.naveen.n.rao@linux.vnet.ibm.com

Cc: stable@vger.kernel.org
Fixes: 7b60f3d876156 ("ftrace: Dynamically create the probe ftrace_ops for the trace_array")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 30c93704 26-Jul-2019 Thomas Gleixner <tglx@linutronix.de>

tracing: Use CONFIG_PREEMPTION

CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by
CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same
functionality which today depends on CONFIG_PREEMPT.

Switch the conditionals in the tracer over to CONFIG_PREEMPTION.

This is the first step to make the tracer work on RT. The other small
tweaks are submitted separately.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20190726212124.409766323@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d5b844a2 27-Jun-2019 Petr Mladek <pmladek@suse.com>

ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()

The commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text
permissions race") causes a possible deadlock between register_kprobe()
and ftrace_run_update_code() when ftrace is using stop_machine().

The existing dependency chain (in reverse order) is:

-> #1 (text_mutex){+.+.}:
validate_chain.isra.21+0xb32/0xd70
__lock_acquire+0x4b8/0x928
lock_acquire+0x102/0x230
__mutex_lock+0x88/0x908
mutex_lock_nested+0x32/0x40
register_kprobe+0x254/0x658
init_kprobes+0x11a/0x168
do_one_initcall+0x70/0x318
kernel_init_freeable+0x456/0x508
kernel_init+0x22/0x150
ret_from_fork+0x30/0x34
kernel_thread_starter+0x0/0xc

-> #0 (cpu_hotplug_lock.rw_sem){++++}:
check_prev_add+0x90c/0xde0
validate_chain.isra.21+0xb32/0xd70
__lock_acquire+0x4b8/0x928
lock_acquire+0x102/0x230
cpus_read_lock+0x62/0xd0
stop_machine+0x2e/0x60
arch_ftrace_update_code+0x2e/0x40
ftrace_run_update_code+0x40/0xa0
ftrace_startup+0xb2/0x168
register_ftrace_function+0x64/0x88
klp_patch_object+0x1a2/0x290
klp_enable_patch+0x554/0x980
do_one_initcall+0x70/0x318
do_init_module+0x6e/0x250
load_module+0x1782/0x1990
__s390x_sys_finit_module+0xaa/0xf0
system_call+0xd8/0x2d0

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(text_mutex);
lock(cpu_hotplug_lock.rw_sem);
lock(text_mutex);
lock(cpu_hotplug_lock.rw_sem);

It is similar problem that has been solved by the commit 2d1e38f56622b9b
("kprobes: Cure hotplug lock ordering issues"). Many locks are involved.
To be on the safe side, text_mutex must become a low level lock taken
after cpu_hotplug_lock.rw_sem.

This can't be achieved easily with the current ftrace design.
For example, arm calls set_all_modules_text_rw() already in
ftrace_arch_code_modify_prepare(), see arch/arm/kernel/ftrace.c.
This functions is called:

+ outside stop_machine() from ftrace_run_update_code()
+ without stop_machine() from ftrace_module_enable()

Fortunately, the problematic fix is needed only on x86_64. It is
the only architecture that calls set_all_modules_text_rw()
in ftrace path and supports livepatching at the same time.

Therefore it is enough to move text_mutex handling from the generic
kernel/trace/ftrace.c into arch/x86/kernel/ftrace.c:

ftrace_arch_code_modify_prepare()
ftrace_arch_code_modify_post_process()

This patch basically reverts the ftrace part of the problematic
commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module
text permissions race"). And provides x86_64 specific-fix.

Some refactoring of the ftrace code will be needed when livepatching
is implemented for arm or nds32. These architectures call
set_all_modules_text_rw() and use stop_machine() at the same time.

Link: http://lkml.kernel.org/r/20190627081334.12793-1-pmladek@suse.com

Fixes: 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race")
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
[
As reviewed by Miroslav Benes <mbenes@suse.cz>, removed return value of
ftrace_run_update_code() as it is a void function.
]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 04e03d9a 05-Jun-2019 Wei Li <liwei391@huawei.com>

ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper()

The mapper may be NULL when called from register_ftrace_function_probe()
with probe->data == NULL.

This issue can be reproduced as follow (it may be covered by compiler
optimization sometime):

/ # cat /sys/kernel/debug/tracing/set_ftrace_filter
#### all functions enabled ####
/ # echo foo_bar:dump > /sys/kernel/debug/tracing/set_ftrace_filter
[ 206.949100] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 206.952402] Mem abort info:
[ 206.952819] ESR = 0x96000006
[ 206.955326] Exception class = DABT (current EL), IL = 32 bits
[ 206.955844] SET = 0, FnV = 0
[ 206.956272] EA = 0, S1PTW = 0
[ 206.956652] Data abort info:
[ 206.957320] ISV = 0, ISS = 0x00000006
[ 206.959271] CM = 0, WnR = 0
[ 206.959938] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000419f3a000
[ 206.960483] [0000000000000000] pgd=0000000411a87003, pud=0000000411a83003, pmd=0000000000000000
[ 206.964953] Internal error: Oops: 96000006 [#1] SMP
[ 206.971122] Dumping ftrace buffer:
[ 206.973677] (ftrace buffer empty)
[ 206.975258] Modules linked in:
[ 206.976631] Process sh (pid: 281, stack limit = 0x(____ptrval____))
[ 206.978449] CPU: 10 PID: 281 Comm: sh Not tainted 5.2.0-rc1+ #17
[ 206.978955] Hardware name: linux,dummy-virt (DT)
[ 206.979883] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 206.980499] pc : free_ftrace_func_mapper+0x2c/0x118
[ 206.980874] lr : ftrace_count_free+0x68/0x80
[ 206.982539] sp : ffff0000182f3ab0
[ 206.983102] x29: ffff0000182f3ab0 x28: ffff8003d0ec1700
[ 206.983632] x27: ffff000013054b40 x26: 0000000000000001
[ 206.984000] x25: ffff00001385f000 x24: 0000000000000000
[ 206.984394] x23: ffff000013453000 x22: ffff000013054000
[ 206.984775] x21: 0000000000000000 x20: ffff00001385fe28
[ 206.986575] x19: ffff000013872c30 x18: 0000000000000000
[ 206.987111] x17: 0000000000000000 x16: 0000000000000000
[ 206.987491] x15: ffffffffffffffb0 x14: 0000000000000000
[ 206.987850] x13: 000000000017430e x12: 0000000000000580
[ 206.988251] x11: 0000000000000000 x10: cccccccccccccccc
[ 206.988740] x9 : 0000000000000000 x8 : ffff000013917550
[ 206.990198] x7 : ffff000012fac2e8 x6 : ffff000012fac000
[ 206.991008] x5 : ffff0000103da588 x4 : 0000000000000001
[ 206.991395] x3 : 0000000000000001 x2 : ffff000013872a28
[ 206.991771] x1 : 0000000000000000 x0 : 0000000000000000
[ 206.992557] Call trace:
[ 206.993101] free_ftrace_func_mapper+0x2c/0x118
[ 206.994827] ftrace_count_free+0x68/0x80
[ 206.995238] release_probe+0xfc/0x1d0
[ 206.995555] register_ftrace_function_probe+0x4a8/0x868
[ 206.995923] ftrace_trace_probe_callback.isra.4+0xb8/0x180
[ 206.996330] ftrace_dump_callback+0x50/0x70
[ 206.996663] ftrace_regex_write.isra.29+0x290/0x3a8
[ 206.997157] ftrace_filter_write+0x44/0x60
[ 206.998971] __vfs_write+0x64/0xf0
[ 206.999285] vfs_write+0x14c/0x2f0
[ 206.999591] ksys_write+0xbc/0x1b0
[ 206.999888] __arm64_sys_write+0x3c/0x58
[ 207.000246] el0_svc_common.constprop.0+0x408/0x5f0
[ 207.000607] el0_svc_handler+0x144/0x1c8
[ 207.000916] el0_svc+0x8/0xc
[ 207.003699] Code: aa0003f8 a9025bf5 aa0103f5 f946ea80 (f9400303)
[ 207.008388] ---[ end trace 7b6d11b5f542bdf1 ]---
[ 207.010126] Kernel panic - not syncing: Fatal exception
[ 207.011322] SMP: stopping secondary CPUs
[ 207.013956] Dumping ftrace buffer:
[ 207.014595] (ftrace buffer empty)
[ 207.015632] Kernel Offset: disabled
[ 207.017187] CPU features: 0x002,20006008
[ 207.017985] Memory Limit: none
[ 207.019825] ---[ end Kernel panic - not syncing: Fatal exception ]---

Link: http://lkml.kernel.org/r/20190606031754.10798-1-liwei391@huawei.com

Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 9f255b63 13-Jun-2019 Josh Poimboeuf <jpoimboe@redhat.com>

module: Fix livepatch/ftrace module text permissions race

It's possible for livepatch and ftrace to be toggling a module's text
permissions at the same time, resulting in the following panic:

BUG: unable to handle page fault for address: ffffffffc005b1d9
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 3ea0c067 P4D 3ea0c067 PUD 3ea0e067 PMD 3cc13067 PTE 3b8a1061
Oops: 0003 [#1] PREEMPT SMP PTI
CPU: 1 PID: 453 Comm: insmod Tainted: G O K 5.2.0-rc1-a188339ca5 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
RIP: 0010:apply_relocate_add+0xbe/0x14c
Code: fa 0b 74 21 48 83 fa 18 74 38 48 83 fa 0a 75 40 eb 08 48 83 38 00 74 33 eb 53 83 38 00 75 4e 89 08 89 c8 eb 0a 83 38 00 75 43 <89> 08 48 63 c1 48 39 c8 74 2e eb 48 83 38 00 75 32 48 29 c1 89 08
RSP: 0018:ffffb223c00dbb10 EFLAGS: 00010246
RAX: ffffffffc005b1d9 RBX: 0000000000000000 RCX: ffffffff8b200060
RDX: 000000000000000b RSI: 0000004b0000000b RDI: ffff96bdfcd33000
RBP: ffffb223c00dbb38 R08: ffffffffc005d040 R09: ffffffffc005c1f0
R10: ffff96bdfcd33c40 R11: ffff96bdfcd33b80 R12: 0000000000000018
R13: ffffffffc005c1f0 R14: ffffffffc005e708 R15: ffffffff8b2fbc74
FS: 00007f5f447beba8(0000) GS:ffff96bdff900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc005b1d9 CR3: 000000003cedc002 CR4: 0000000000360ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
klp_init_object_loaded+0x10f/0x219
? preempt_latency_start+0x21/0x57
klp_enable_patch+0x662/0x809
? virt_to_head_page+0x3a/0x3c
? kfree+0x8c/0x126
patch_init+0x2ed/0x1000 [livepatch_test02]
? 0xffffffffc0060000
do_one_initcall+0x9f/0x1c5
? kmem_cache_alloc_trace+0xc4/0xd4
? do_init_module+0x27/0x210
do_init_module+0x5f/0x210
load_module+0x1c41/0x2290
? fsnotify_path+0x3b/0x42
? strstarts+0x2b/0x2b
? kernel_read+0x58/0x65
__do_sys_finit_module+0x9f/0xc3
? __do_sys_finit_module+0x9f/0xc3
__x64_sys_finit_module+0x1a/0x1c
do_syscall_64+0x52/0x61
entry_SYSCALL_64_after_hwframe+0x44/0xa9

The above panic occurs when loading two modules at the same time with
ftrace enabled, where at least one of the modules is a livepatch module:

CPU0 CPU1
klp_enable_patch()
klp_init_object_loaded()
module_disable_ro()
ftrace_module_enable()
ftrace_arch_code_modify_post_process()
set_all_modules_text_ro()
klp_write_object_relocations()
apply_relocate_add()
*patches read-only code* - BOOM

A similar race exists when toggling ftrace while loading a livepatch
module.

Fix it by ensuring that the livepatch and ftrace code patching
operations -- and their respective permissions changes -- are protected
by the text_mutex.

Link: http://lkml.kernel.org/r/ab43d56ab909469ac5d2520c5d944ad6d4abd476.1560474114.git.jpoimboe@redhat.com

Reported-by: Johannes Erdfelt <johannes@erdfelt.com>
Fixes: 444d13ff10fb ("modules: add ro_after_init support")
Acked-by: Jessica Yu <jeyu@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# cbdaeaf0 05-Jun-2019 Vasily Gorbik <gor@linux.ibm.com>

tracing: avoid build warning with HAVE_NOP_MCOUNT

Selecting HAVE_NOP_MCOUNT enables -mnop-mcount (if gcc supports it)
and sets CC_USING_NOP_MCOUNT. Reuse __is_defined (which is suitable for
testing CC_USING_* defines) to avoid conditional compilation and fix
the following gcc 9 warning on s390:

kernel/trace/ftrace.c:2514:1: warning: ‘ftrace_code_disable’ defined
but not used [-Wunused-function]

Link: http://lkml.kernel.org/r/patch.git-1a82d13f33ac.your-ad-here.call-01559732716-ext-6629@work.hours

Fixes: 2f4df0017baed ("tracing: Add -mcount-nop option support")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# a124692b 04-May-2019 Cheng Jian <cj.chengjian@huawei.com>

ftrace: Enable trampoline when rec count returns back to one

Custom trampolines can only be enabled if there is only a single ops
attached to it. If there's only a single callback registered to a function,
and the ops has a trampoline registered for it, then we can call the
trampoline directly. This is very useful for improving the performance of
ftrace and livepatch.

If more than one callback is registered to a function, the general
trampoline is used, and the custom trampoline is not restored back to the
direct call even if all the other callbacks were unregistered and we are
back to one callback for the function.

To fix this, set FTRACE_FL_TRAMP flag if rec count is decremented
to one, and the ops that left has a trampoline.

Testing After this patch :

insmod livepatch_unshare_files.ko
cat /sys/kernel/debug/tracing/enabled_functions

unshare_files (1) R I tramp: 0xffffffffc0000000(klp_ftrace_handler+0x0/0xa0) ->ftrace_ops_assist_func+0x0/0xf0

echo unshare_files > /sys/kernel/debug/tracing/set_ftrace_filter
echo function > /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/enabled_functions

unshare_files (2) R I ->ftrace_ops_list_func+0x0/0x150

echo nop > /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/enabled_functions

unshare_files (1) R I tramp: 0xffffffffc0000000(klp_ftrace_handler+0x0/0xa0) ->ftrace_ops_assist_func+0x0/0xf0

Link: http://lkml.kernel.org/r/1556969979-111047-1-git-send-email-cj.chengjian@huawei.com

Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 7375dca1 20-May-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Make enable and update parameters bool when applicable

The code modification functions have "enable" and "update" variables that
are sometimes "int" but used as "bool". Remove the ambiguity and make them
"bool" when they are only used for true or false values.

Link: http://lkml.kernel.org/r/e1429923d9eda92a3cf5ee9e33c7eacce539781d.1558115654.git.naveen.n.rao@linux.vnet.ibm.com

Reported-by: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 0f5e5a3a 20-Mar-2019 Rasmus Villemoes <linux@rasmusvillemoes.dk>

tracing: Eliminate const char[] auto variables

Automatic const char[] variables cause unnecessary code
generation. For example, the this_mod variable leads to

3f04: 48 b8 5f 5f 74 68 69 73 5f 6d movabs $0x6d5f736968745f5f,%rax # __this_m
3f0e: 4c 8d 44 24 02 lea 0x2(%rsp),%r8
3f13: 48 8d 7c 24 10 lea 0x10(%rsp),%rdi
3f18: 48 89 44 24 02 mov %rax,0x2(%rsp)
3f1d: 4c 89 e9 mov %r13,%rcx
3f20: b8 65 00 00 00 mov $0x65,%eax # e
3f25: 48 c7 c2 00 00 00 00 mov $0x0,%rdx
3f28: R_X86_64_32S .rodata.str1.1+0x18d
3f2c: be 48 00 00 00 mov $0x48,%esi
3f31: c7 44 24 0a 6f 64 75 6c movl $0x6c75646f,0xa(%rsp) # odul
3f39: 66 89 44 24 0e mov %ax,0xe(%rsp)

i.e., the string gets built on the stack at runtime. Similar code can be
found for the other instances I'm replacing here. Putting the string
in .rodata reduces the combined .text+.rodata size and saves time and
stack space at runtime.

The simplest fix, and what I've done for the this_mod case, is to just
make the variable static.

However, for the "<faulted>" case where the same string is used twice,
that prevents the linker from merging those two literals, so instead use
a macro - that also keeps the two instances automatically in
sync (instead of only the compile-time strlen expression).

Finally, for the two runs of spaces, it turns out that the "build
these strings on the stack" is not the worst part of what gcc does -
it turns print_func_help_header_irq() into "if (tgid) { /*
print_event_info + five seq_printf calls */ } else { /* print
event_info + another five seq_printf */}". Taking inspiration from a
suggestion from Al Viro, use %.*s to make snprintf either stop after
the first two spaces or print the whole string. As a bonus, the
seq_printfs now fit on single lines (at least, they are not longer
than the existing ones in the function just above), making it easier
to see that the ascii art lines up.

x86-64 defconfig + CONFIG_FUNCTION_TRACER:

$ scripts/stackdelta /tmp/stackusage.{0,1}
./kernel/trace/ftrace.c ftrace_mod_callback 152 136 -16
./kernel/trace/trace.c trace_default_header 56 32 -24
./kernel/trace/trace.c tracing_mark_raw_write 96 72 -24
./kernel/trace/trace.c tracing_mark_write 104 80 -24

bloat-o-meter

add/remove: 1/0 grow/shrink: 0/4 up/down: 14/-375 (-361)
Function old new delta
this_mod - 14 +14
ftrace_mod_callback 577 542 -35
tracing_mark_raw_write 444 374 -70
tracing_mark_write 616 540 -76
trace_default_header 600 406 -194

Link: http://lkml.kernel.org/r/20190320081757.6037-1-linux@rasmusvillemoes.dk

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# fabe38ab 23-Feb-2019 Masami Hiramatsu <mhiramat@kernel.org>

kprobes: Mark ftrace mcount handler functions nokprobe

Mark ftrace mcount handler functions nokprobe since
probing on these functions with kretprobe pushes
return address incorrectly on kretprobe shadow stack.

Reported-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Tested-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/155094062044.6137.6419622920568680640.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 2fa717a0 11-Apr-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Do not process STUB functions in ftrace_ops_list_func()

The function_graph tracer has a stub function and its ops flag has the
FTRACE_OPS_FL_STUB set. As the function graph does not use the
ftrace_ops->func pointer but instead is called by a separate part of the
ftrace trampoline. The function_graph tracer still requires to pass in a
ftrace_ops that may also hold the hash of the functions to call. But there's
no reason to test that hash in the function tracing portion. Instead of
testing to see if we should call the stub function, just test if the ops has
FTRACE_OPS_FL_STUB set, and just skip it.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# ee6a6500 10-Apr-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Remove ASSIGN_OPS_HASH() macro from ftrace.c

The ASSIGN_OPS_HASH() macro was moved to fgraph.c where it was used, but for
some reason it wasn't removed from ftrace.c, as it is no longer referenced
there.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 9efb85c5 23-Mar-2019 Hariprasad Kelam <hariprasad.kelam@gmail.com>

ftrace: Fix warning using plain integer as NULL & spelling corrections

Changed 0 --> NULL to avoid sparse warning
Corrected spelling mistakes reported by checkpatch.pl
Sparse warning below:

sudo make C=2 CF=-D__CHECK_ENDIAN__ M=kernel/trace

CHECK kernel/trace/ftrace.c
kernel/trace/ftrace.c:3007:24: warning: Using plain integer as NULL pointer
kernel/trace/ftrace.c:4758:37: warning: Using plain integer as NULL pointer

Link: http://lkml.kernel.org/r/20190323183523.GA2244@hari-Inspiron-1545

Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# f79b3f33 11-Feb-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Allow enabling of filters via index of available_filter_functions

Enabling of large number of functions by echoing in a large subset of the
functions in available_filter_functions can take a very long time. The
process requires testing all functions registered by the function tracer
(which is in the 10s of thousands), and doing a kallsyms lookup to convert
the ip address into a name, then comparing that name with the string passed
in.

When a function causes the function tracer to crash the system, a binary
bisect of the available_filter_functions can be done to find the culprit.
But this requires passing in half of the functions in
available_filter_functions over and over again, which makes it basically a
O(n^2) operation. With 40,000 functions, that ends up bing 1,600,000,000
opertions! And enabling this can take over 20 minutes.

As a quick speed up, if a number is passed into one of the filter files,
instead of doing a search, it just enables the function at the corresponding
line of the available_filter_functions file. That is:

# echo 50 > set_ftrace_filter
# cat set_ftrace_filter
x86_pmu_commit_txn

# head -50 available_filter_functions | tail -1
x86_pmu_commit_txn

This allows setting of half the available_filter_functions to take place in
less than a second!

# time seq 20000 > set_ftrace_filter
real 0m0.042s
user 0m0.005s
sys 0m0.015s

# wc -l set_ftrace_filter
20000 set_ftrace_filter

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 2840f84f 10-Dec-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Fix memory leak of instance function hash filters

The following commands will cause a memory leak:

# cd /sys/kernel/tracing
# mkdir instances/foo
# echo schedule > instance/foo/set_ftrace_filter
# rmdir instances/foo

The reason is that the hashes that hold the filters to set_ftrace_filter and
set_ftrace_notrace are not freed if they contain any data on the instance
and the instance is removed.

Found by kmemleak detector.

Cc: stable@vger.kernel.org
Fixes: 591dffdade9f ("ftrace: Allow for function tracing instance to filter functions")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# a0572f68 04-Dec-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Allow ftrace_replace_code() to be schedulable

The function ftrace_replace_code() is the ftrace engine that does the
work to modify all the nops into the calls to the function callback in
all the functions being traced.

The generic version which is normally called from stop machine, but an
architecture can implement a non stop machine version and still use the
generic ftrace_replace_code(). When an architecture does this,
ftrace_replace_code() may be called from a schedulable context, where
it can allow the code to be preemptible, and schedule out.

In order to allow an architecture to make ftrace_replace_code()
schedulable, a new command flag is added called:

FTRACE_MAY_SLEEP

Which can be or'd to the command that is passed to
ftrace_modify_all_code() that calls ftrace_replace_code() and will have
it call cond_resched() in the loop that modifies the nops into the
calls to the ftrace trampolines.

Link: http://lkml.kernel.org/r/20181204192903.8193-1-anders.roxell@linaro.org
Link: http://lkml.kernel.org/r/20181205183303.828422192@goodmis.org

Reported-by: Anders Roxell <anders.roxell@linaro.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b0e21a61 19-Nov-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

function_graph: Have profiler use new helper ftrace_graph_get_ret_stack()

The ret_stack processing is going to change, and that is going
to break anything that is accessing the ret_stack directly. One user is the
function graph profiler. By using the ftrace_graph_get_ret_stack() helper
function, the profiler can access the ret_stack entry without relying on the
implementation details of the stack itself.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 688f7089 15-Nov-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

fgraph: Add new fgraph_ops structure to enable function graph hooks

Currently the registering of function graph is to pass in a entry and return
function. We need to have a way to associate those functions together where
the entry can determine to run the return hook. Having a structure that
contains both functions will facilitate the process of converting the code
to be able to do such.

This is similar to the way function hooks are enabled (it passes in
ftrace_ops). Instead of passing in the functions to use, a single structure
is passed in to the registering function.

The unregister function is now passed in the fgraph_ops handle. When we
allow more than one callback to the function graph hooks, this will let the
system know which one to remove.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# e73e679f 14-Nov-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

fgraph: Move function graph specific code into fgraph.c

To make the function graph infrastructure more managable, the code needs to
be in its own file (fgraph.c). Move the code that is specific for managing
the function graph infrastructure out of ftrace.c and into fgraph.c

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 3306fc4a 14-Nov-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Create new ftrace_internal.h header

In order to move function graph infrastructure into its own file (fgraph.h)
it needs to access various functions and variables in ftrace.c that are
currently static. Create a new file called ftrace-internal.h that holds the
function prototypes and the extern declarations of the variables needed by
fgraph.c as well, and make them global in ftrace.c such that they can be
used outside that file.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b1b35f2e 19-Nov-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

function_graph: Have profiler use curr_ret_stack and not depth

The profiler uses trace->depth to find its entry on the ret_stack, but the
depth may not match the actual location of where its entry is (if an
interrupt were to preempt the processing of the profiler for another
function, the depth and the curr_ret_stack will be different).

Have it use the curr_ret_stack as the index to find its ret_stack entry
instead of using the depth variable, as that is no longer guaranteed to be
the same.

Cc: stable@kernel.org
Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 39eb456d 19-Nov-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack

Currently, the depth of the ret_stack is determined by curr_ret_stack index.
The issue is that there's a race between setting of the curr_ret_stack and
calling of the callback attached to the return of the function.

Commit 03274a3ffb44 ("tracing/fgraph: Adjust fgraph depth before calling
trace return callback") moved the calling of the callback to after the
setting of the curr_ret_stack, even stating that it was safe to do so, when
in fact, it was the reason there was a barrier() there (yes, I should have
commented that barrier()).

Not only does the curr_ret_stack keep track of the current call graph depth,
it also keeps the ret_stack content from being overwritten by new data.

The function profiler, uses the "subtime" variable of ret_stack structure
and by moving the curr_ret_stack, it allows for interrupts to use the same
structure it was using, corrupting the data, and breaking the profiler.

To fix this, there needs to be two variables to handle the call stack depth
and the pointer to where the ret_stack is being used, as they need to change
at two different locations.

Cc: stable@kernel.org
Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 74401729 06-Nov-2018 Paul E. McKenney <paulmck@kernel.org>

tracing: Replace synchronize_sched() and call_rcu_sched()

Now that synchronize_rcu() waits for preempt-disable regions of code
as well as RCU read-side critical sections, synchronize_sched() can
be replaced by synchronize_rcu(). Similarly, call_rcu_sched() can be
replaced by call_rcu(). This commit therefore makes these changes.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <linux-kernel@vger.kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# bcea3f96 16-Aug-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Add SPDX License format tags to tracing files

Add the SPDX License header to ease license compliance management.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 2f4df001 06-Aug-2018 Vasily Gorbik <gor@linux.ibm.com>

tracing: Add -mcount-nop option support

-mcount-nop gcc option generates the calls to the profiling functions
as nops which allows to avoid patching mcount jump with NOP instructions
initially.

-mcount-nop gcc option will be activated if platform selects
HAVE_NOP_MCOUNT and gcc actually supports it.
In addition to that CC_USING_NOP_MCOUNT is defined and could be used by
architectures to adapt ftrace patching behavior.

Link: http://lkml.kernel.org/r/patch-3.thread-aa7b8d.git-e02ed2dc082b.your-ad-here.call-01533557518-ext-9465@work.hours

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b207de3e 09-Aug-2018 Colin Ian King <colin.king@canonical.com>

ftrace: Remove unused pointer ftrace_swapper_pid

Pointer ftrace_swapper_pid is defined but is never used hence it is
redundant and can be removed. The use of this variable was removed
in commit 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap
like events do").

Cleans up clang warning:
warning: 'ftrace_swapper_pid' defined but not used [-Wunused-const-variable=]

Link: http://lkml.kernel.org/r/20180809125609.13142-1-colin.king@canonical.com

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 44ec3ec0 01-Aug-2018 Gustavo A. R. Silva <gustavo@embeddedor.com>

ftrace: Use true and false for boolean values in ops_references_rec()

Return statements in functions returning bool should use true or false
instead of an integer value.

This code was detected with the help of Coccinelle.

Link: http://lkml.kernel.org/r/20180802010056.GA31012@embeddedor.com

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 72809cbf 26-Jul-2018 Masami Hiramatsu <mhiramat@kernel.org>

tracing: Remove orphaned function ftrace_nr_registered_ops()

Remove ftrace_nr_registered_ops() because it is no longer used.

ftrace_nr_registered_ops() has been introduced by commit ea701f11da44
("ftrace: Add selftest to test function trace recursion protection"), but
its caller has been removed by commit 05cbbf643b8e ("tracing: Fix selftest
function recursion accounting"). So it is not called anymore.

Link: http://lkml.kernel.org/r/153260907227.12474.5234899025934963683.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 7b144b6c 26-Jul-2018 Masami Hiramatsu <mhiramat@kernel.org>

tracing: Remove orphaned function using_ftrace_ops_list_func().

Remove using_ftrace_ops_list_func() since it is no longer used.

Using ftrace_ops_list_func() has been introduced by commit 7eea4fce0246
("tracing/stack_trace: Skip 4 instead of 3 when using ftrace_ops_list_func")
as a helper function, but its caller has been removed by commit 72ac426a5bb0
("tracing: Clean up stack tracing and fix fentry updates"). So it is not
called anymore.

Link: http://lkml.kernel.org/r/153260904427.12474.9952096317439329851.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 5ccba64a 01-Feb-2018 Yisheng Xie <xieyisheng1@huawei.com>

ftrace: Nuke clear_ftrace_function

clear_ftrace_function is not used outside of ftrace.c and is not help to
use a function, so nuke it per Steve's suggestion.

Link: http://lkml.kernel.org/r/1517537689-34947-1-git-send-email-xieyisheng1@huawei.com

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 6396bb22 12-Jun-2018 Kees Cook <keescook@chromium.org>

treewide: kzalloc() -> kcalloc()

The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:

kzalloc(a * b, gfp)

with:
kcalloc(a * b, gfp)

as well as handling cases of:

kzalloc(a * b * c, gfp)

with:

kzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

kzalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

kzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
kzalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kzalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
kzalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc
+ kcalloc
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
kzalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
kzalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
kzalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
kzalloc(sizeof(THING) * C2, ...)
|
kzalloc(sizeof(TYPE) * C2, ...)
|
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * E2
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- E1 * E2
+ E1, E2
, ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>


# 6da2ec56 12-Jun-2018 Kees Cook <keescook@chromium.org>

treewide: kmalloc() -> kmalloc_array()

The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
patch replaces cases of:

kmalloc(a * b, gfp)

with:
kmalloc_array(a * b, gfp)

as well as handling cases of:

kmalloc(a * b * c, gfp)

with:

kmalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

kmalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

kmalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The tools/ directory was manually excluded, since it has its own
implementation of kmalloc().

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
kmalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kmalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
kmalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kmalloc
+ kmalloc_array
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
kmalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
kmalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
kmalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
kmalloc(sizeof(THING) * C2, ...)
|
kmalloc(sizeof(TYPE) * C2, ...)
|
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * E2
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- E1 * E2
+ E1, E2
, ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>


# 1ce0500d 03-Sep-2014 Chen LinX <linx.z.chen@intel.com>

ftrace: Have set_graph_* files have normal file modes

The set_graph_function and set_graph_notrace file mode should be 0644
instead of 0444 as they are writeable. Note, the mode appears to be ignored
regardless, but they should at least look sane.

Link: http://lkml.kernel.org/r/1409725869-4501-1-git-send-email-linx.z.chen@intel.com

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Chen LinX <linx.z.chen@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 419e9fe5 30-Mar-2018 Salvatore Mesoraca <s.mesoraca16@gmail.com>

ftrace: Drop a VLA in module_exists()

Avoid a VLA by using a real constant expression instead of a variable.
The compiler should be able to optimize the original code and avoid using
an actual VLA. Anyway this change is useful because it will avoid a false
positive with -Wvla, it might also help the compiler generating better
code.

Link: http://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Link: http://lkml.kernel.org/r/1522399988-8815-1-git-send-email-s.mesoraca16@gmail.com

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 7b658656 05-Feb-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Remove incorrect setting of glob search field

__unregister_ftrace_function_probe() will incorrectly parse the glob filter
because it resets the search variable that was setup by filter_parse_regex().

Al Viro reported this:

After that call of filter_parse_regex() we could have func_g.search not
equal to glob only if glob started with '!' or '*'. In the former case
we would've buggered off with -EINVAL (not = 1). In the latter we
would've set func_g.search equal to glob + 1, calculated the length of
that thing in func_g.len and proceeded to reset func_g.search back to
glob.

Suppose the glob is e.g. *foo*. We end up with
func_g.type = MATCH_MIDDLE_ONLY;
func_g.len = 3;
func_g.search = "*foo";
Feeding that to ftrace_match_record() will not do anything sane - we
will be looking for names containing "*foo" (->len is ignored for that
one).

Link: http://lkml.kernel.org/r/20180127031706.GE13338@ZenIV.linux.org.uk

Cc: stable@vger.kernel.org
Fixes: 3ba009297149f ("ftrace: Introduce ftrace_glob structure")
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# f4d0706c 16-Jan-2018 Changbin Du <changbin.du@intel.com>

tracing: Make sure the parsed string always terminates with '\0'

Always mark the parsed string with a terminated nul '\0' character. This removes
the need for the users to have to append the '\0' before using the parsed string.

Link: http://lkml.kernel.org/r/1516093350-12045-4-git-send-email-changbin.du@intel.com

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 6be7fa3c 22-Jan-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace, orc, x86: Handle ftrace dynamically allocated trampolines

The function tracer can create a dynamically allocated trampoline that is
called by the function mcount or fentry hook that is used to call the
function callback that is registered. The problem is that the orc undwinder
will bail if it encounters one of these trampolines. This breaks the stack
trace of function callbacks, which include the stack tracer and setting the
stack trace for individual functions.

Since these dynamic trampolines are basically copies of the static ftrace
trampolines defined in ftrace_*.S, we do not need to create new orc entries
for the dynamic trampolines. Finding the return address on the stack will be
identical as the functions that were copied to create the dynamic
trampolines. When encountering a ftrace dynamic trampoline, we can just use
the orc entry of the ftrace static function that was copied for that
trampoline.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b3a88803 11-Oct-2017 Peter Zijlstra <peterz@infradead.org>

ftrace: Kill FTRACE_OPS_FL_PER_CPU

The one and only user of FTRACE_OPS_FL_PER_CPU is gone, remove the
lot.

Link: http://lkml.kernel.org/r/20171011080224.372422809@infradead.org

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 8715b108 09-Oct-2017 Joel Fernandes <joelaf@google.com>

ftrace: Clear hashes of stale ips of init memory

Filters should be cleared of init functions during freeing of init
memory when the ftrace dyn records are released. However in current
code, the filters are left as is. This patch clears the hashes of the
saved init functions when the init memory is freed. This fixes the
following issue reproducible with the following sequence of commands for
a test module:
================================================

void bar(void)
{
printk(KERN_INFO "bar!\n");
}

void foo(void)
{
printk(KERN_INFO "foo!\n");
bar();
}

static int __init hello_init(void)
{
printk(KERN_INFO "Hello world!\n");
foo();
return 0;
}

static void __exit hello_cleanup(void)
{
printk(KERN_INFO "Cleaning up module.\n");
}

module_init(hello_init);
module_exit(hello_cleanup);
================================================

Commands:
echo '*:mod:test' > /d/tracing/set_ftrace_filter
echo function > /d/tracing/current_tracer
modprobe test
rmmod test
sleep 1
modprobe test
cat /d/tracing/set_ftrace_filter

Behavior without patch: Init function is still in the filter
Expected behavior: Shouldn't have any of the filters set

Link: http://lkml.kernel.org/r/20171009192931.56401-1-joelaf@google.com

Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 6171a031 06-Sep-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace/kallsyms: Have /proc/kallsyms show saved mod init functions

If a module is loaded while tracing is enabled, then there's a possibility
that the module init functions were traced. These functions have their name
and address stored by ftrace such that it can translate the function address
that is written into the buffer into a human readable function name.

As userspace tools may be doing the same, they need a way to map function
names to their address as well. This is done through reading /proc/kallsyms.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 6aa69784 05-Sep-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Add freeing algorithm to free ftrace_mod_maps

The ftrace_mod_map is a descriptor to save module init function names in
case they were traced, and the trace output needs to reference the function
name from the function address. But after the function is unloaded, it
the maps should be freed, as the rest of the function names are as well.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# aba4b5c2 01-Sep-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Save module init functions kallsyms symbols for tracing

If function tracing is active when the module init functions are freed, then
store them to be referenced by kallsyms. As module init functions can now be
traced on module load, they were useless:

># echo ':mod:snd_seq' > set_ftrace_filter
># echo function > current_tracer
># modprobe snd_seq
># cat trace
# tracer: function
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
modprobe-2786 [000] .... 3189.037874: 0xffffffffa0860000 <-do_one_initcall
modprobe-2786 [000] .... 3189.037876: 0xffffffffa086004d <-0xffffffffa086000f
modprobe-2786 [000] .... 3189.037876: 0xffffffffa086010d <-0xffffffffa0860018
modprobe-2786 [000] .... 3189.037877: 0xffffffffa086011a <-0xffffffffa0860021
modprobe-2786 [000] .... 3189.037877: 0xffffffffa0860080 <-0xffffffffa086002a
modprobe-2786 [000] .... 3189.039523: 0xffffffffa0860400 <-0xffffffffa0860033
modprobe-2786 [000] .... 3189.039523: 0xffffffffa086038a <-0xffffffffa086041c
modprobe-2786 [000] .... 3189.039591: 0xffffffffa086038a <-0xffffffffa0860436
modprobe-2786 [000] .... 3189.039657: 0xffffffffa086038a <-0xffffffffa0860450
modprobe-2786 [000] .... 3189.039719: 0xffffffffa0860127 <-0xffffffffa086003c
modprobe-2786 [000] .... 3189.039742: snd_seq_create_kernel_client <-0xffffffffa08601f6

When the output is shown, the kallsyms for the module init functions have
already been freed, and the output of the trace can not convert them to
their function names.

Now this looks like this:

# tracer: function
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
modprobe-2463 [002] .... 174.243237: alsa_seq_init <-do_one_initcall
modprobe-2463 [002] .... 174.243239: client_init_data <-alsa_seq_init
modprobe-2463 [002] .... 174.243240: snd_sequencer_memory_init <-alsa_seq_init
modprobe-2463 [002] .... 174.243240: snd_seq_queues_init <-alsa_seq_init
modprobe-2463 [002] .... 174.243240: snd_sequencer_device_init <-alsa_seq_init
modprobe-2463 [002] .... 174.244860: snd_seq_info_init <-alsa_seq_init
modprobe-2463 [002] .... 174.244861: create_info_entry <-snd_seq_info_init
modprobe-2463 [002] .... 174.244936: create_info_entry <-snd_seq_info_init
modprobe-2463 [002] .... 174.245003: create_info_entry <-snd_seq_info_init
modprobe-2463 [002] .... 174.245072: snd_seq_system_client_init <-alsa_seq_init
modprobe-2463 [002] .... 174.245094: snd_seq_create_kernel_client <-snd_seq_system_client_init

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 3e234289 03-Mar-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Allow module init functions to be traced

Allow for module init sections to be traced as well as core kernel init
sections. Now that filtering modules functions can be stored, for when they
are loaded, it makes sense to be able to trace them.

Cc: Jessica Yu <jeyu@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 6cafbe15 20-Jun-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Add a ftrace_free_mem() function for modules to use

In order to be able to trace module init functions, the module code needs to
tell ftrace what is being freed when the init sections are freed. Use the
code that the main init calls to tell ftrace to free the main init sections.
This requires passing in a start and end address to free.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 2b0b8499 11-Sep-2017 Shu Wang <shuwang@redhat.com>

ftrace: Fix kmemleak in unregister_ftrace_graph

The trampoline allocated by function tracer was overwriten by function_graph
tracer, and caused a memory leak. The save_global_trampoline should have
saved the previous trampoline in register_ftrace_graph() and restored it in
unregister_ftrace_graph(). But as it is implemented, save_global_trampoline was
only used in unregister_ftrace_graph as default value 0, and it overwrote the
previous trampoline's value. Causing the previous allocated trampoline to be
lost.

kmmeleak backtrace:
kmemleak_vmalloc+0x77/0xc0
__vmalloc_node_range+0x1b5/0x2c0
module_alloc+0x7c/0xd0
arch_ftrace_update_trampoline+0xb5/0x290
ftrace_startup+0x78/0x210
register_ftrace_function+0x8b/0xd0
function_trace_init+0x4f/0x80
tracing_set_tracer+0xe6/0x170
tracing_set_trace_write+0x90/0xd0
__vfs_write+0x37/0x170
vfs_write+0xb2/0x1b0
SyS_write+0x55/0xc0
do_syscall_64+0x67/0x180
return_from_SYSCALL_64+0x0/0x6a

[
Looking further into this, I found that this was left over from when the
function and function graph tracers shared the same ftrace_ops. But in
commit 5f151b2401 ("ftrace: Fix function_profiler and function tracer
together"), the two were separated, and the save_global_trampoline no
longer was necessary (and it may have been broken back then too).
-- Steven Rostedt
]

Link: http://lkml.kernel.org/r/20170912021454.5976-1-shuwang@redhat.com

Cc: stable@vger.kernel.org
Fixes: 5f151b2401 ("ftrace: Fix function_profiler and function tracer together")
Signed-off-by: Shu Wang <shuwang@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# edb096e0 31-Aug-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Fix memleak when unregistering dynamic ops when tracing disabled

If function tracing is disabled by the user via the function-trace option or
the proc sysctl file, and a ftrace_ops that was allocated on the heap is
unregistered, then the shutdown code exits out without doing the proper
clean up. This was found via kmemleak and running the ftrace selftests, as
one of the tests unregisters with function tracing disabled.

# cat kmemleak
unreferenced object 0xffffffffa0020000 (size 4096):
comm "swapper/0", pid 1, jiffies 4294668889 (age 569.209s)
hex dump (first 32 bytes):
55 ff 74 24 10 55 48 89 e5 ff 74 24 18 55 48 89 U.t$.UH...t$.UH.
e5 48 81 ec a8 00 00 00 48 89 44 24 50 48 89 4c .H......H.D$PH.L
backtrace:
[<ffffffff81d64665>] kmemleak_vmalloc+0x85/0xf0
[<ffffffff81355631>] __vmalloc_node_range+0x281/0x3e0
[<ffffffff8109697f>] module_alloc+0x4f/0x90
[<ffffffff81091170>] arch_ftrace_update_trampoline+0x160/0x420
[<ffffffff81249947>] ftrace_startup+0xe7/0x300
[<ffffffff81249bd2>] register_ftrace_function+0x72/0x90
[<ffffffff81263786>] trace_selftest_ops+0x204/0x397
[<ffffffff82bb8971>] trace_selftest_startup_function+0x394/0x624
[<ffffffff81263a75>] run_tracer_selftest+0x15c/0x1d7
[<ffffffff82bb83f1>] init_trace_selftests+0x75/0x192
[<ffffffff81002230>] do_one_initcall+0x90/0x1e2
[<ffffffff82b7d620>] kernel_init_freeable+0x350/0x3fe
[<ffffffff81d61ec3>] kernel_init+0x13/0x122
[<ffffffff81d72c6a>] ret_from_fork+0x2a/0x40
[<ffffffffffffffff>] 0xffffffffffffffff

Cc: stable@vger.kernel.org
Fixes: 12cce594fa ("ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 2a5bfe47 31-Aug-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Zero out ftrace hashes when a module is removed

When a ftrace filter has a module function, and that module is removed, the
filter still has its address as being enabled. This can cause interesting
side effects. Nothing dangerous, but unwanted functions can be traced
because of it.

# cd /sys/kernel/tracing
# echo ':mod:snd_seq' > set_ftrace_filter
# cat set_ftrace_filter
snd_use_lock_sync_helper [snd_seq]
check_event_type_and_length [snd_seq]
snd_seq_ioctl_pversion [snd_seq]
snd_seq_ioctl_client_id [snd_seq]
snd_seq_ioctl_get_queue_tempo [snd_seq]
update_timestamp_of_queue [snd_seq]
snd_seq_ioctl_get_queue_status [snd_seq]
snd_seq_set_queue_tempo [snd_seq]
snd_seq_ioctl_set_queue_tempo [snd_seq]
snd_seq_ioctl_get_queue_timer [snd_seq]
seq_free_client1 [snd_seq]
[..]
# rmmod snd_seq
# cat set_ftrace_filter

# modprobe kvm
# cat set_ftrace_filter
kvm_set_cr4 [kvm]
kvm_emulate_hypercall [kvm]
kvm_set_dr [kvm]

This is because removing the snd_seq module after it was being filtered,
left the address of the snd_seq functions in the hash. When the kvm module
was loaded, some of its functions were loaded at the same address as the
snd_seq module. This would enable them to be filtered and traced.

Now we don't want to clear the hash completely. That would cause removing a
module where only its functions are filtered, to cause the tracing to enable
all functions, as an empty filter means to trace all functions. Instead,
just set the hash ip address to zero. Then it will never match any function.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# a8f0f9e4 17-Aug-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Check for null ret_stack on profile function graph entry function

There's a small race when function graph shutsdown and the calling of the
registered function graph entry callback. The callback must not reference
the task's ret_stack without first checking that it is not NULL. Note, when
a ret_stack is allocated for a task, it stays allocated until the task exits.
The problem here, is that function_graph is shutdown, and a new task was
created, which doesn't have its ret_stack allocated. But since some of the
functions are still being traced, the callbacks can still be called.

The normal function_graph code handles this, but starting with commit
8861dd303c ("ftrace: Access ret_stack->subtime only in the function
profiler") the profiler code references the ret_stack on function entry, but
doesn't check if it is NULL first.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=196611

Cc: stable@vger.kernel.org
Fixes: 8861dd303c ("ftrace: Access ret_stack->subtime only in the function profiler")
Reported-by: lilydjwg@gmail.com
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# f86f4180 07-Jun-2017 Chunyan Zhang <zhang.chunyan@linaro.org>

trace: fix the errors caused by incompatible type of RCU variables

The variables which are processed by RCU functions should be annotated
as RCU, otherwise sparse will report the errors like below:

"error: incompatible types in comparison expression (different
address spaces)"

Link: http://lkml.kernel.org/r/1496823171-7758-1-git-send-email-zhang.chunyan@linaro.org

Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
[ Updated to not be 100% 80 column strict ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 2e028c4f 12-Jul-2017 Dan Carpenter <dan.carpenter@oracle.com>

ftrace: Fix uninitialized variable in match_records()

My static checker complains that if "func" is NULL then "clear_filter"
is uninitialized. This seems like it could be true, although it's
possible something subtle is happening that I haven't seen.

kernel/trace/ftrace.c:3844 match_records()
error: uninitialized symbol 'clear_filter'.

Link: http://lkml.kernel.org/r/20170712073556.h6tkpjcdzjaozozs@mwanda

Cc: stable@vger.kernel.org
Fixes: f0a3b154bd7 ("ftrace: Clarify code for mod command")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 44925dff 12-Jul-2017 Dan Carpenter <dan.carpenter@oracle.com>

ftrace: Remove an unneeded NULL check

"func" can't be NULL and it doesn't make sense to check because we've
already derefenced it.

Link: http://lkml.kernel.org/r/20170712073340.4enzeojeoupuds5a@mwanda

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 69449bbd 10-Jul-2017 Arnd Bergmann <arnd@arndb.de>

ftrace: Hide cached module code for !CONFIG_MODULES

When modules are disabled, we get a harmless build warning:

kernel/trace/ftrace.c:4051:13: error: 'process_cached_mods' defined but not used [-Werror=unused-function]

This adds the same #ifdef around the new code that exists around
its caller.

Link: http://lkml.kernel.org/r/20170710084413.1820568-1-arnd@arndb.de

Fixes: d7fbf8df7ca0 ("ftrace: Implement cached modules tracing on module load")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 69d71879 05-Jul-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Test for NULL iter->tr in regex for stack_trace_filter changes

As writing into stack_trace_filter, the iter-tr is not set and is NULL.
Check if it is NULL before dereferencing it in ftrace_regex_release().

Fixes: 8c08f0d5c6fb ("ftrace: Have cached module filters be an active filter")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 0f179765 29-Jun-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Fix regression with module command in stack_trace_filter

When doing the following command:

# echo ":mod:kvm_intel" > /sys/kernel/tracing/stack_trace_filter

it triggered a crash.

This happened with the clean up of probes. It required all callers to the
regex function (doing ftrace filtering) to have ops->private be a pointer to
a trace_array. But for the stack tracer, that is not the case.

Allow for the ops->private to be NULL, and change the function command
callbacks to handle the trace_array pointer being NULL as well.

Fixes: d2afd57a4b96 ("tracing/ftrace: Allow instances to have their own function probes")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 4ec78467 28-Jun-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Decrement count for dyn_ftrace_total_info for init functions

Init boot up functions may be traced, but they are also freed when the
kernel finishes booting. These are removed from the ftrace tables, and the
debug variable for dyn_ftrace_total_info needs to reflect that as well.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 3b58a3c7 28-Jun-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Unlock hash mutex on failed allocation in process_mod_list()

If the new_hash fails to allocate, then unlock the hash mutex on error.

Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 83dd1493 27-Jun-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Decrement count for dyn_ftrace_total_info file

The dyn_ftrace_total_info file is used to show how many functions have been
converted into nops and can be used by ftrace. The problem is that it does
not get decremented when functions are removed (init boot code being freed,
and modules being freed). That means the number is very inaccurate everytime
functions are removed from the ftrace tables. Decrement it when functions
are removed.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 8c08f0d5 26-Jun-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Have cached module filters be an active filter

When a module filter is added to set_ftrace_filter, if the module is not
loaded, it is cached. This should be considered an active filter, and
function tracing should be filtered by this. That is, if a cached module
filter is the only filter set, then no function tracing should be happening,
as all the functions available will be filtered out.

This makes sense, as the reason to add a cached module filter, is to trace
the module when you load it. There shouldn't be any other tracing happening
until then.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# d7fbf8df 26-Jun-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Implement cached modules tracing on module load

If a module is cached in the set_ftrace_filter, and that module is loaded,
then enable tracing on that module as if the cached module text was written
into set_ftrace_filter just as the module is loaded.

# echo ":mod:kvm_intel" >
# cat /sys/kernel/tracing/set_ftrace_filter
#### all functions enabled ####
:mod:kvm_intel
# modprobe kvm_intel
# cat /sys/kernel/tracing/set_ftrace_filter
vmx_get_rflags [kvm_intel]
vmx_get_pkru [kvm_intel]
vmx_get_interrupt_shadow [kvm_intel]
vmx_rdtscp_supported [kvm_intel]
vmx_invpcid_supported [kvm_intel]
[..]

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 5985ea8b 23-Jun-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Have the cached module list show in set_ftrace_filter

When writing in a module filter into set_ftrace_filter for a module that is
not yet loaded, it it cached, and will be executed when the module is loaded
(although that is not implemented yet at this commit). Display the list of
cached modules to be traced.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 673feb9d 23-Jun-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Add :mod: caching infrastructure to trace_array

This is the start of the infrastructure work to allow for tracing module
functions before it is loaded.

Currently the following command:

# echo :mod:some-mod > set_ftrace_filter

will enable tracing of all functions within the module "some-mod" if it is
loaded. What we want, is if the module is not loaded, that line will be
saved. When the module is loaded, then the "some-mod" will have that line
executed on it, so that the functions within it starts being traced.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# f9797c2f 25-May-2017 Luis Henriques <lhenriques@suse.com>

ftrace: Fix memory leak in ftrace_graph_release()

ftrace_hash is being kfree'ed in ftrace_graph_release(), however the
->buckets field is not. This results in a memory leak that is easily
captured by kmemleak:

unreferenced object 0xffff880038afe000 (size 8192):
comm "trace-cmd", pid 238, jiffies 4294916898 (age 9.736s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff815f561e>] kmemleak_alloc+0x4e/0xb0
[<ffffffff8113964d>] __kmalloc+0x12d/0x1a0
[<ffffffff810bf6d1>] alloc_ftrace_hash+0x51/0x80
[<ffffffff810c0523>] __ftrace_graph_open.isra.39.constprop.46+0xa3/0x100
[<ffffffff810c05e8>] ftrace_graph_open+0x68/0xa0
[<ffffffff8114003d>] do_dentry_open.isra.1+0x1bd/0x2d0
[<ffffffff81140df7>] vfs_open+0x47/0x60
[<ffffffff81150f95>] path_openat+0x2a5/0x1020
[<ffffffff81152d6a>] do_filp_open+0x8a/0xf0
[<ffffffff811411df>] do_sys_open+0x12f/0x200
[<ffffffff811412ce>] SyS_open+0x1e/0x20
[<ffffffff815fa6e0>] entry_SYSCALL_64_fastpath+0x13/0x94
[<ffffffffffffffff>] 0xffffffffffffffff

Link: http://lkml.kernel.org/r/20170525152038.7661-1-lhenriques@suse.com

Cc: stable@vger.kernel.org
Fixes: b9b0c831bed2 ("ftrace: Convert graph filter to use hash tables")
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# a0e6369e 16-May-2017 Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

ftrace/instances: Clear function triggers when removing instances

If instance directories are deleted while there are registered function
triggers:

# cd /sys/kernel/debug/tracing/instances
# mkdir test
# echo "schedule:enable_event:sched:sched_switch" > test/set_ftrace_filter
# rmdir test
Unable to handle kernel paging request for data at address 0x00000008
Unable to handle kernel paging request for data at address 0x00000008
Faulting instruction address: 0xc0000000021edde8
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2048
NUMA
pSeries
Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc kvm iptable_filter fuse binfmt_misc pseries_rng rng_core vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c multipath virtio_net virtio_blk virtio_pci crc32c_vpmsum virtio_ring virtio
CPU: 8 PID: 8694 Comm: rmdir Not tainted 4.11.0-nnr+ #113
task: c0000000bab52800 task.stack: c0000000baba0000
NIP: c0000000021edde8 LR: c0000000021f0590 CTR: c000000002119620
REGS: c0000000baba3870 TRAP: 0300 Not tainted (4.11.0-nnr+)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>
CR: 22002422 XER: 20000000
CFAR: 00007fffabb725a8 DAR: 0000000000000008 DSISR: 40000000 SOFTE: 0
GPR00: c00000000220f750 c0000000baba3af0 c000000003157e00 0000000000000000
GPR04: 0000000000000040 00000000000000eb 0000000000000040 0000000000000000
GPR08: 0000000000000000 0000000000000113 0000000000000000 c00000000305db98
GPR12: c000000002119620 c00000000fd42c00 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 c0000000bab52e90 0000000000000000
GPR24: 0000000000000000 00000000000000eb 0000000000000040 c0000000baba3bb0
GPR28: c00000009cb06eb0 c0000000bab52800 c00000009cb06eb0 c0000000baba3bb0
NIP [c0000000021edde8] ring_buffer_lock_reserve+0x8/0x4e0
LR [c0000000021f0590] trace_event_buffer_lock_reserve+0xe0/0x1a0
Call Trace:
[c0000000baba3af0] [c0000000021f96c8] trace_event_buffer_commit+0x1b8/0x280 (unreliable)
[c0000000baba3b60] [c00000000220f750] trace_event_buffer_reserve+0x80/0xd0
[c0000000baba3b90] [c0000000021196b8] trace_event_raw_event_sched_switch+0x98/0x180
[c0000000baba3c10] [c0000000029d9980] __schedule+0x6e0/0xab0
[c0000000baba3ce0] [c000000002122230] do_task_dead+0x70/0xc0
[c0000000baba3d10] [c0000000020ea9c8] do_exit+0x828/0xd00
[c0000000baba3dd0] [c0000000020eaf70] do_group_exit+0x60/0x100
[c0000000baba3e10] [c0000000020eb034] SyS_exit_group+0x24/0x30
[c0000000baba3e30] [c00000000200bcec] system_call+0x38/0x54
Instruction dump:
60000000 60420000 7d244b78 7f63db78 4bffaa09 393efff8 793e0020 39200000
4bfffecc 60420000 3c4c00f7 3842a020 <81230008> 2f890000 409e02f0 a14d0008
---[ end trace b917b8985d0e650b ]---
Unable to handle kernel paging request for data at address 0x00000008
Faulting instruction address: 0xc0000000021edde8
Unable to handle kernel paging request for data at address 0x00000008
Faulting instruction address: 0xc0000000021edde8
Faulting instruction address: 0xc0000000021edde8

To address this, let's clear all registered function probes before
deleting the ftrace instance.

Link: http://lkml.kernel.org/r/c5f1ca624043690bd94642bb6bffd3f2fc504035.1494956770.git.naveen.n.rao@linux.vnet.ibm.com

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# cbab567c 16-May-2017 Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

ftrace: Simplify glob handling in unregister_ftrace_function_probe_func()

Handle a NULL glob properly and simplify the check.

Link: http://lkml.kernel.org/r/5df74d4ffb4721db6d5a22fa08ca031d62ead493.1494956770.git.naveen.n.rao@linux.vnet.ibm.com

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 77c0edde 03-May-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Simplify ftrace_match_record() even more

Dan Carpenter sent a patch to remove a check in ftrace_match_record()
because the logic of the code made the check redundant. I looked deeper into
the code, and made the following logic table, with the three variables and
the result of the original code.

modname mod_matches exclude_mod result
------- ----------- ----------- ------
0 0 0 return 0
0 0 1 func_match
0 1 * < cannot exist >
1 0 0 return 0
1 0 1 func_match
1 1 0 func_match
1 1 1 return 0

Notice that when mod_matches == exclude mod, the result is always to
return 0, and when mod_matches != exclude_mod, then the result is to test
the function. This means we only need test if mod_matches is equal to
exclude_mod.

Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 31805c90 31-Mar-2017 Dan Carpenter <dan.carpenter@oracle.com>

ftrace: Remove an unneeded condition

We know that "mod_matches" is true here so there is no need to check
again.

Link: http://lkml.kernel.org/r/20170331152130.GA4947@mwanda

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# d2afd57a 20-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing/ftrace: Allow instances to have their own function probes

Pass around the local trace_array that is the descriptor for tracing
instances, when enabling and disabling probes. This by default sets the
enable/disable of event probe triggers to work with instances.

The other probes will need some more work to get them working with
instances.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 6e444319 19-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing/ftrace: Add a better way to pass data via the probe functions

With the redesign of the registration and execution of the function probes
(triggers), data can now be passed from the setup of the probe to the probe
callers that are specific to the trace_array it is on. Although, all probes
still only affect the toplevel trace array, this change will allow for
instances to have their own probes separated from other instances and the
top array.

That is, something like the stacktrace probe can be set to trace only in an
instance and not the toplevel trace array. This isn't implement yet, but
this change sets the ground work for the change.

When a probe callback is triggered (someone writes the probe format into
set_ftrace_filter), it calls register_ftrace_function_probe() passing in
init_data that will be used to initialize the probe. Then for every matching
function, register_ftrace_function_probe() will call the probe_ops->init()
function with the init data that was passed to it, as well as an address to
a place holder that is associated with the probe and the instance. The first
occurrence will have a NULL in the pointer. The init() function will then
initialize it. If other probes are added, or more functions are part of the
probe, the place holder will be passed to the init() function with the place
holder data that it was initialized to the last time.

Then this place_holder is passed to each of the other probe_ops functions,
where it can be used in the function callback. When the probe_ops free()
function is called, it can be called either with the rip of the function
that is being removed from the probe, or zero, indicating that there are no
more functions attached to the probe, and the place holder is about to be
freed. This gives the probe_ops a way to free the data it assigned to the
place holder if it was allocade during the first init call.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 7b60f3d8 18-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Dynamically create the probe ftrace_ops for the trace_array

In order to eventually have each trace_array instance have its own unique
set of function probes (triggers), the trace array needs to hold the ops and
the filters for the probes.

This is the first step to accomplish this. Instead of having the private
data of the probe ops point to the trace_array, create a separate list that
the trace_array holds. There's only one private_data for a probe, we need
one per trace_array. The probe ftrace_ops will be dynamically created for
each instance, instead of being static.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b5f081b5 10-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Pass the trace_array into ftrace_probe_ops functions

Pass the trace_array associated to a ftrace_probe_ops into the probe_ops
func(), init() and free() functions. The trace_array is the descriptor that
describes a tracing instance. This will help create the infrastructure that
will allow having function probes unique to tracing instances.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 04ec7bb6 05-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Have the trace_array hold the list of registered func probes

Add a link list to the trace_array to hold func probes that are registered.
Currently, all function probes are the same for all instances as it was
before, that is, only the top level trace_array holds the function probes.
But this lays the ground work to have function probes be attached to
individual instances, and having the event trigger only affect events in the
given instance. But that work is still to be done.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 8d70725e 05-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: If the hash for a probe fails to update then free what was initialized

If the ftrace_hash_move_and_update_ops() fails, and an ops->free() function
exists, then it needs to be called on all the ops that were added by this
registration.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# eee8ded1 04-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Have the function probes call their own function

Now that the function probes have their own ftrace_ops, there's no reason to
continue using the ftrace_func_hash to find which probe to call in the
function callback. The ops that is passed in to the function callback is
part of the probe_ops to call.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 1ec3a81a 04-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Have each function probe use its own ftrace_ops

Have the function probes have their own ftrace_ops, and remove the
trace_probe_ops. This simplifies some of the ftrace infrastructure code.

Individual entries for each function is still allocated for the use of the
output for set_ftrace_filter, but they will be removed soon too.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# d3d532d7 04-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Have unregister_ftrace_function_probe_func() return a value

Currently unregister_ftrace_function_probe_func() is a void function. It
does not give any feedback if an error occurred or no item was found to
remove and nothing was done.

Change it to return status and success if it removed something. Also update
the callers to return that feedback to the user.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# e16b35dd 04-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Add helper function ftrace_hash_move_and_update_ops()

The processes of updating a ops filter_hash is a bit complex, and requires
setting up an old hash to perform the update. This is done exactly the same
in two locations for the same reasons. Create a helper function that does it
in one place.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 1a48df00 04-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Remove data field from ftrace_func_probe structure

No users of the function probes uses the data field anymore. Remove it, and
change the init function to take a void *data parameter instead of a
void **data, because the init will just get the data that the registering
function was received, and there's no state after it is called.

The other functions for ftrace_probe_ops still take the data parameter, but
it will currently only be passed NULL. It will stay as a parameter for
future data to be passed to these functions.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 02b77e2a 04-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Remove printing of data in showing of a function probe

None of the probe users uses the data field anymore of the entry. They all
have their own print() function. Remove showing the data field in the
generic function as the data field will be going away.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 78f78e07 04-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Remove unused unregister_ftrace_function_probe_all() function

There are no users of unregister_ftrace_function_probe_all(). The only probe
function that is used is unregister_ftrace_function_probe_func(). Rename the
internal static function __unregister_ftrace_function_probe() to
unregister_ftrace_function_probe_func() and make it global.

Also remove the PROBE_TEST_FUNC as it would be always set.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 0fe7e7e3 04-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Remove unused unregister_ftrace_function_probe() function

Nothing calls unregister_ftrace_function_probe(). Remove it as well as the
flag PROBE_TEST_DATA, as this function was the only one to set it.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 41794f19 03-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Added ftrace_func_mapper for function probe triggers

In order to move the ops to the function probes directly, they need a way to
map function ips to their own data without depending on the infrastructure
of the function probes, as the data field will be going away.

New helper functions are added that are based on the ftrace_hash code.
ftrace_func_mapper functions are there to let the probes map ips to their
data. These can be allocated by the probe ops, and referenced in the
function callbacks.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# bca6c8d0 03-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Pass probe ops to probe function

In preparation to cleaning up the probe function registration code, the
"data" parameter will eventually be removed from the probe->func() call.
Instead it will receive its own "ops" function, in which it can set up its
own data that it needs to map.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# e51a9896 03-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Remove unused "flags" field from struct ftrace_func_probe

Nothing uses "flags" in the ftrace_func_probe descriptor. Remove it.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 1e10486f 16-Apr-2017 Namhyung Kim <namhyung@kernel.org>

ftrace: Add 'function-fork' trace option

The function-fork option is same as event-fork that it tracks task
fork/exit and set the pid filter properly. This can be useful if user
wants to trace selected tasks including their children only.

Link: http://lkml.kernel.org/r/20170417024430.21194-3-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# d879d0b8 16-Apr-2017 Namhyung Kim <namhyung@kernel.org>

ftrace: Fix function pid filter on instances

When function tracer has a pid filter, it adds a probe to sched_switch
to track if current task can be ignored. The probe checks the
ftrace_ignore_pid from current tr to filter tasks. But it misses to
delete the probe when removing an instance so that it can cause a crash
due to the invalid tr pointer (use-after-free).

This is easily reproducible with the following:

# cd /sys/kernel/debug/tracing
# mkdir instances/buggy
# echo $$ > instances/buggy/set_ftrace_pid
# rmdir instances/buggy

============================================================================
BUG: KASAN: use-after-free in ftrace_filter_pid_sched_switch_probe+0x3d/0x90
Read of size 8 by task kworker/0:1/17
CPU: 0 PID: 17 Comm: kworker/0:1 Tainted: G B 4.11.0-rc3 #198
Call Trace:
dump_stack+0x68/0x9f
kasan_object_err+0x21/0x70
kasan_report.part.1+0x22b/0x500
? ftrace_filter_pid_sched_switch_probe+0x3d/0x90
kasan_report+0x25/0x30
__asan_load8+0x5e/0x70
ftrace_filter_pid_sched_switch_probe+0x3d/0x90
? fpid_start+0x130/0x130
__schedule+0x571/0xce0
...

To fix it, use ftrace_clear_pids() to unregister the probe. As
instance_rmdir() already updated ftrace codes, it can just free the
filter safely.

Link: http://lkml.kernel.org/r/20170417024430.21194-2-namhyung@kernel.org

Fixes: 0c8916c34203 ("tracing: Add rmdir to remove multibuffer instances")
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# fcdc7125 17-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Fix indexing of t_hash_start() from t_next()

t_hash_start() does not increment *pos, where as t_next() must. But when
t_next() does increment *pos, it must still pass in the original *pos to
t_hash_start() otherwise it will skip the first instance:

# cd /sys/kernel/debug/tracing
# echo schedule:traceoff > set_ftrace_filter
# echo do_IRQ:traceoff > set_ftrace_filter
# echo call_rcu > set_ftrace_filter
# cat set_ftrace_filter
call_rcu
schedule:traceoff:unlimited
do_IRQ:traceoff:unlimited

The above called t_hash_start() from t_start() as there was only one
function (call_rcu), but if we add another function:

# echo xfrm_policy_destroy_rcu >> set_ftrace_filter
# cat set_ftrace_filter
call_rcu
xfrm_policy_destroy_rcu
do_IRQ:traceoff:unlimited

The "schedule:traceoff" disappears.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# acceb72e 14-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Fix removing of second function probe

When two function probes are added to set_ftrace_filter, and then one of
them is removed, the update to the function locations is not performed, and
the record keeping of the function states are corrupted, and causes an
ftrace_bug() to occur.

This is easily reproducable by adding two probes, removing one, and then
adding it back again.

# cd /sys/kernel/debug/tracing
# echo schedule:traceoff > set_ftrace_filter
# echo do_IRQ:traceoff > set_ftrace_filter
# echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter
# echo do_IRQ:traceoff > set_ftrace_filter

Causes:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1098 at kernel/trace/ftrace.c:2369 ftrace_get_addr_curr+0x143/0x220
Modules linked in: [...]
CPU: 2 PID: 1098 Comm: bash Not tainted 4.10.0-test+ #405
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
Call Trace:
dump_stack+0x68/0x9f
__warn+0x111/0x130
? trace_irq_work_interrupt+0xa0/0xa0
warn_slowpath_null+0x1d/0x20
ftrace_get_addr_curr+0x143/0x220
? __fentry__+0x10/0x10
ftrace_replace_code+0xe3/0x4f0
? ftrace_int3_handler+0x90/0x90
? printk+0x99/0xb5
? 0xffffffff81000000
ftrace_modify_all_code+0x97/0x110
arch_ftrace_update_code+0x10/0x20
ftrace_run_update_code+0x1c/0x60
ftrace_run_modify_code.isra.48.constprop.62+0x8e/0xd0
register_ftrace_function_probe+0x4b6/0x590
? ftrace_startup+0x310/0x310
? debug_lockdep_rcu_enabled.part.4+0x1a/0x30
? update_stack_state+0x88/0x110
? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
? preempt_count_sub+0x18/0xd0
? mutex_lock_nested+0x104/0x800
? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
? __unwind_start+0x1c0/0x1c0
? _mutex_lock_nest_lock+0x800/0x800
ftrace_trace_probe_callback.isra.3+0xc0/0x130
? func_set_flag+0xe0/0xe0
? __lock_acquire+0x642/0x1790
? __might_fault+0x1e/0x20
? trace_get_user+0x398/0x470
? strcmp+0x35/0x60
ftrace_trace_onoff_callback+0x48/0x70
ftrace_regex_write.isra.43.part.44+0x251/0x320
? match_records+0x420/0x420
ftrace_filter_write+0x2b/0x30
__vfs_write+0xd7/0x330
? do_loop_readv_writev+0x120/0x120
? locks_remove_posix+0x90/0x2f0
? do_lock_file_wait+0x160/0x160
? __lock_is_held+0x93/0x100
? rcu_read_lock_sched_held+0x5c/0xb0
? preempt_count_sub+0x18/0xd0
? __sb_start_write+0x10a/0x230
? vfs_write+0x222/0x240
vfs_write+0xef/0x240
SyS_write+0xab/0x130
? SyS_read+0x130/0x130
? trace_hardirqs_on_caller+0x182/0x280
? trace_hardirqs_on_thunk+0x1a/0x1c
entry_SYSCALL_64_fastpath+0x18/0xad
RIP: 0033:0x7fe61c157c30
RSP: 002b:00007ffe87890258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: ffffffff8114a410 RCX: 00007fe61c157c30
RDX: 0000000000000010 RSI: 000055814798f5e0 RDI: 0000000000000001
RBP: ffff8800c9027f98 R08: 00007fe61c422740 R09: 00007fe61ca53700
R10: 0000000000000073 R11: 0000000000000246 R12: 0000558147a36400
R13: 00007ffe8788f160 R14: 0000000000000024 R15: 00007ffe8788f15c
? trace_hardirqs_off_caller+0xc0/0x110
---[ end trace 99fa09b3d9869c2c ]---
Bad trampoline accounting at: ffffffff81cc3b00 (do_IRQ+0x0/0x150)

Cc: stable@vger.kernel.org
Fixes: 59df055f1991 ("ftrace: trace different functions with a different tracer")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 82cc4fc2 14-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Fix removing of second function probe

When two function probes are added to set_ftrace_filter, and then one of
them is removed, the update to the function locations is not performed, and
the record keeping of the function states are corrupted, and causes an
ftrace_bug() to occur.

This is easily reproducable by adding two probes, removing one, and then
adding it back again.

# cd /sys/kernel/debug/tracing
# echo schedule:traceoff > set_ftrace_filter
# echo do_IRQ:traceoff > set_ftrace_filter
# echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter
# echo do_IRQ:traceoff > set_ftrace_filter

Causes:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1098 at kernel/trace/ftrace.c:2369 ftrace_get_addr_curr+0x143/0x220
Modules linked in: [...]
CPU: 2 PID: 1098 Comm: bash Not tainted 4.10.0-test+ #405
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
Call Trace:
dump_stack+0x68/0x9f
__warn+0x111/0x130
? trace_irq_work_interrupt+0xa0/0xa0
warn_slowpath_null+0x1d/0x20
ftrace_get_addr_curr+0x143/0x220
? __fentry__+0x10/0x10
ftrace_replace_code+0xe3/0x4f0
? ftrace_int3_handler+0x90/0x90
? printk+0x99/0xb5
? 0xffffffff81000000
ftrace_modify_all_code+0x97/0x110
arch_ftrace_update_code+0x10/0x20
ftrace_run_update_code+0x1c/0x60
ftrace_run_modify_code.isra.48.constprop.62+0x8e/0xd0
register_ftrace_function_probe+0x4b6/0x590
? ftrace_startup+0x310/0x310
? debug_lockdep_rcu_enabled.part.4+0x1a/0x30
? update_stack_state+0x88/0x110
? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
? preempt_count_sub+0x18/0xd0
? mutex_lock_nested+0x104/0x800
? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
? __unwind_start+0x1c0/0x1c0
? _mutex_lock_nest_lock+0x800/0x800
ftrace_trace_probe_callback.isra.3+0xc0/0x130
? func_set_flag+0xe0/0xe0
? __lock_acquire+0x642/0x1790
? __might_fault+0x1e/0x20
? trace_get_user+0x398/0x470
? strcmp+0x35/0x60
ftrace_trace_onoff_callback+0x48/0x70
ftrace_regex_write.isra.43.part.44+0x251/0x320
? match_records+0x420/0x420
ftrace_filter_write+0x2b/0x30
__vfs_write+0xd7/0x330
? do_loop_readv_writev+0x120/0x120
? locks_remove_posix+0x90/0x2f0
? do_lock_file_wait+0x160/0x160
? __lock_is_held+0x93/0x100
? rcu_read_lock_sched_held+0x5c/0xb0
? preempt_count_sub+0x18/0xd0
? __sb_start_write+0x10a/0x230
? vfs_write+0x222/0x240
vfs_write+0xef/0x240
SyS_write+0xab/0x130
? SyS_read+0x130/0x130
? trace_hardirqs_on_caller+0x182/0x280
? trace_hardirqs_on_thunk+0x1a/0x1c
entry_SYSCALL_64_fastpath+0x18/0xad
RIP: 0033:0x7fe61c157c30
RSP: 002b:00007ffe87890258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: ffffffff8114a410 RCX: 00007fe61c157c30
RDX: 0000000000000010 RSI: 000055814798f5e0 RDI: 0000000000000001
RBP: ffff8800c9027f98 R08: 00007fe61c422740 R09: 00007fe61ca53700
R10: 0000000000000073 R11: 0000000000000246 R12: 0000558147a36400
R13: 00007ffe8788f160 R14: 0000000000000024 R15: 00007ffe8788f15c
? trace_hardirqs_off_caller+0xc0/0x110
---[ end trace 99fa09b3d9869c2c ]---
Bad trampoline accounting at: ffffffff81cc3b00 (do_IRQ+0x0/0x150)

Cc: stable@vger.kernel.org
Fixes: 59df055f1991 ("ftrace: trace different functions with a different tracer")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 0598e4f0 06-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Add use of synchronize_rcu_tasks() with dynamic trampolines

The function tracer needs to be more careful than other subsystems when it
comes to freeing data. Especially if that data is actually executable code.
When a single function is traced, a trampoline can be dynamically allocated
which is called to jump to the function trace callback. When the callback is
no longer needed, the dynamic allocated trampoline needs to be freed. This
is where the issues arise. The dynamically allocated trampoline must not be
used again. As function tracing can trace all subsystems, including
subsystems that are used to serialize aspects of freeing (namely RCU), it
must take extra care when doing the freeing.

Before synchronize_rcu_tasks() was around, there was no way for the function
tracer to know that nothing was using the dynamically allocated trampoline
when CONFIG_PREEMPT was enabled. That's because a task could be indefinitely
preempted while sitting on the trampoline. Now with synchronize_rcu_tasks(),
it will wait till all tasks have either voluntarily scheduled (not on the
trampoline) or goes into userspace (not on the trampoline). Then it is safe
to free the trampoline even with CONFIG_PREEMPT set.

Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b80f0f6c 02-Apr-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Have init/main.c call ftrace directly to free init memory

Relying on free_reserved_area() to call ftrace to free init memory proved to
not be sufficient. The issue is that on x86, when debug_pagealloc is
enabled, the init memory is not freed, but simply set as not present. Since
ftrace was uninformed of this, starting function tracing still tries to
update pages that are not present according to the page tables, causing
ftrace to bug, as well as killing the kernel itself.

Instead of relying on free_reserved_area(), have init/main.c call ftrace
directly just before it frees the init memory. Then it needs to use
__init_begin and __init_end to know where the init memory location is.
Looking at all archs (and testing what I can), it appears that this should
work for each of them.

Reported-by: kernel test robot <xiaolong.ye@intel.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 5bd84629 29-Mar-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Create separate t_func_next() to simplify the function / hash logic

I noticed that if I use dd to read the set_ftrace_filter file that the first
hash command is repeated.

# cd /sys/kernel/debug/tracing
# echo schedule > set_ftrace_filter
# echo do_IRQ >> set_ftrace_filter
# echo schedule:traceoff >> set_ftrace_filter
# echo do_IRQ:traceoff >> set_ftrace_filter

# cat set_ftrace_filter
schedule
do_IRQ
schedule:traceoff:unlimited
do_IRQ:traceoff:unlimited

# dd if=set_ftrace_filter bs=1
schedule
do_IRQ
schedule:traceoff:unlimited
schedule:traceoff:unlimited
do_IRQ:traceoff:unlimited
98+0 records in
98+0 records out
98 bytes copied, 0.00265011 s, 37.0 kB/s

This is due to the way t_start() calls t_next() as well as the seq_file
calls t_next() and the state is slightly different between the two. Namely,
t_start() will call t_next() with a local "pos" variable.

By separating out the function listing from t_next() into its own function,
we can have better control of outputting the functions and the hash of
triggers. This simplifies the code.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 43ff926a 30-Mar-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Update func_pos in t_start() when all functions are enabled

If all functions are enabled, there's a comment displayed in the file to
denote that:

# cd /sys/kernel/debug/tracing
# cat set_ftrace_filter
#### all functions enabled ####

If a function trigger is set, those are displayed as well:

# echo schedule:traceoff >> /debug/tracing/set_ftrace_filter
# cat set_ftrace_filter
#### all functions enabled ####
schedule:traceoff:unlimited

But if you read that file with dd, the output can change:

# dd if=/debug/tracing/set_ftrace_filter bs=1
#### all functions enabled ####
32+0 records in
32+0 records out
32 bytes copied, 7.0237e-05 s, 456 kB/s

This is because the "pos" variable is updated for the comment, but func_pos
is not. "func_pos" is used by the triggers (or hashes) to know how many
functions were printed and it bases its index from the pos - func_pos.
func_pos should be 1 to count for the comment printed. But since it is not,
t_hash_start() thinks that one trigger was already printed.

The cat gets to t_hash_start() via t_next() and not t_start() which updates
both pos and func_pos.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 2d71d989 29-Mar-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Return NULL at end of t_start() instead of calling t_hash_start()

The loop in t_start() of calling t_next() will call t_hash_start() if the
pos is beyond the functions and enters the hash items. There's no reason to
check if p is NULL and call t_hash_start(), as that would be redundant.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# c20489da 29-Mar-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Assign iter->hash to filter or notrace hashes on seq read

Instead of testing if the hash to use is the filter_hash or the notrace_hash
at each iteration, do the test at open, and set the iter->hash to point to
the corresponding filter or notrace hash. Then use that directly instead of
testing which hash needs to be used each iteration.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# c1bc5919 29-Mar-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Clean up __seq_open_private() return check

The return status check of __seq_open_private() is rather strange:

iter = __seq_open_private();
if (iter) {
/* do stuff */
}

return iter ? 0 : -ENOMEM;

It makes much more sense to do the return of failure right away:

iter = __seq_open_private();
if (!iter)
return -ENOMEM;

/* do stuff */

return 0;

This clean up will make updates to this code a bit nicer.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 42c269c8 03-Mar-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Allow for function tracing to record init functions on boot up

Adding a hook into free_reserve_area() that informs ftrace that boot up init
text is being free, lets ftrace safely remove those init functions from its
records, which keeps ftrace from trying to modify text that no longer
exists.

Note, this still does not allow for tracing .init text of modules, as
modules require different work for freeing its init code.

Link: http://lkml.kernel.org/r/1488502497.7212.24.camel@linux.intel.com

Cc: linux-mm@kvack.org
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Requested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 65a50c65 02-Mar-2017 Todd Brandt <todd.e.brandt@linux.intel.com>

ftrace/graph: Add ftrace_graph_max_depth kernel parameter

Early trace callgraphs can be extremely large on systems with
several seconds of boot time. The max_depth parameter limits how
deep the graph trace goes and reduces the output size. This
parameter is the same as the max_graph_depth file in tracefs.

Link: http://lkml.kernel.org/r/1488499935-23216-1-git-send-email-todd.e.brandt@linux.intel.com

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
[ changed comments about debugfs to tracefs ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 92ad18ec 01-Mar-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace/graph: Do not modify the EMPTY_HASH for the function_graph filter

On boot up, if the kernel command line sets a graph funtion with the kernel
command line options "ftrace_graph_filter" or "ftrace_graph_notrace" then it
updates the corresponding function graph hash, ftrace_graph_hash or
ftrace_graph_notrace_hash respectively. Unfortunately, at boot up, these
variables are pointers to the "EMPTY_HASH" which is a constant used as a
placeholder when a hash has no entities. The problem was that the comand
line version to set the hashes updated the actual EMPTY_HASH instead of
creating a new hash for the function graph. This broke the EMPTY_HASH
because not only did it modify a constant (not sure how that was allowed to
happen, except maybe because it was done at early boot, const variables were
still mutable), but it made the filters have functions listed in them when
they were actually empty.

The kernel command line function needs to allocate a new hash for the
function graph filters and assign the necessary variables to that new hash
instead.

Link: http://lkml.kernel.org/r/1488420091.7212.17.camel@linux.intel.com

Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: b9b0c831bed2 ("ftrace: Convert graph filter to use hash tables")
Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 29930025 08-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h>

We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 3a150df9 21-Feb-2017 Chunyu Hu <chuhu@redhat.com>

tracing: Fix code comment for ftrace_ops_get_func()

There is no function 'ftrace_ops_recurs_func' existing in the current code,
it was renamed to ftrace_ops_assist_func() in commit c68c0fa29341
("ftrace: Have ftrace_ops_get_func() handle RCU and PER_CPU flags too").
Update the comment to the correct function name.

Link: http://lkml.kernel.org/r/1487723366-14463-1-git-send-email-chuhu@redhat.com

Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# e704eff3 02-Feb-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Have set_graph_function handle multiple functions in one write

Currently, only one function can be written to set_graph_function and
set_graph_notrace. The last function in the list will have saved, even
though other functions will be added then removed.

Change the behavior to be the same as set_ftrace_function as to allow
multiple functions to be written. If any one fails, none of them will be
added. The addition of the functions are done at the end when the file is
closed.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 649b988b 02-Feb-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Do not hold references of ftrace_graph_{notrace_}hash out of graph_lock

The hashs ftrace_graph_hash and ftrace_graph_notrace_hash are modified
within the graph_lock being held. Holding a pointer to them and passing them
along can lead to a use of a stale pointer (fgd->hash). Move assigning the
pointer and its use to within the holding of the lock. Note, it's an
rcu_sched protected data, and other instances of referencing them are done
with preemption disabled. But the file manipuation code must be protected by
the lock.

The fgd->hash pointer is set to NULL when the lock is being released.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# ae98d27a 02-Feb-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Have set_graph_functions handle write with RDWR

Since reading the set_graph_functions uses seq functions, which sets the
file->private_data pointer to a seq_file descriptor. On writes the
ftrace_graph_data descriptor is set to file->private_data. But if the file
is opened for RDWR, the ftrace_graph_write() will incorrectly use the
file->private_data descriptor instead of
((struct seq_file *)file->private_data)->private pointer, and this can crash
the kernel.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# d4ad9a1c 02-Feb-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Reset fgd->hash in ftrace_graph_write()

fgd->hash is saved and then freed, but is never reset to either
ftrace_graph_hash nor ftrace_graph_notrace_hash. But if multiple writes are
performed, then the freed hash could be accessed again.

# cd /sys/kernel/debug/tracing
# head -1000 available_filter_functions > /tmp/funcs
# cat /tmp/funcs > set_graph_function

Causes:

general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: [...]
CPU: 2 PID: 1337 Comm: cat Not tainted 4.10.0-rc2-test-00010-g6b052e9 #32
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
task: ffff880113a12200 task.stack: ffffc90001940000
RIP: 0010:free_ftrace_hash+0x7c/0x160
RSP: 0018:ffffc90001943db0 EFLAGS: 00010246
RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: 6b6b6b6b6b6b6b6b
RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff8800ce1e1d40
RBP: ffff8800ce1e1d50 R08: 0000000000000000 R09: 0000000000006400
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8800ce1e1d40 R14: 0000000000004000 R15: 0000000000000001
FS: 00007f9408a07740(0000) GS:ffff88011e500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000aee1f0 CR3: 0000000116bb4000 CR4: 00000000001406e0
Call Trace:
? ftrace_graph_write+0x150/0x190
? __vfs_write+0x1f6/0x210
? __audit_syscall_entry+0x17f/0x200
? rw_verify_area+0xdb/0x210
? _cond_resched+0x2b/0x50
? __sb_start_write+0xb4/0x130
? vfs_write+0x1c8/0x330
? SyS_write+0x62/0xf0
? do_syscall_64+0xa3/0x1b0
? entry_SYSCALL64_slow_path+0x25/0x25
Code: 01 48 85 db 0f 84 92 00 00 00 b8 01 00 00 00 d3 e0 85 c0 7e 3f 83 e8 01 48 8d 6f 10 45 31 e4 4c 8d 34 c5 08 00 00 00 49 8b 45 08 <4a> 8b 34 20 48 85 f6 74 13 48 8b 1e 48 89 ef e8 20 fa ff ff 48
RIP: free_ftrace_hash+0x7c/0x160 RSP: ffffc90001943db0
---[ end trace 999b48216bf4b393 ]---

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 555fc781 02-Feb-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Replace (void *)1 with a meaningful macro name FTRACE_GRAPH_EMPTY

When the set_graph_function or set_graph_notrace contains no records, a
banner is displayed of either "#### all functions enabled ####" or
"#### all functions disabled ####" respectively. To tell the seq operations
to do this, (void *)1 is passed as a return value. Instead of using a
hardcoded meaningless variable, define it as a macro.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 2b2c279c 01-Feb-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Create a slight optimization on searching the ftrace_hash

This is a micro-optimization, but as it has to deal with a fast path of the
function tracer, these optimizations can be noticed.

The ftrace_lookup_ip() returns true if the given ip is found in the hash. If
it's not found or the hash is NULL, it returns false. But there's some cases
that a NULL hash is a true, and the ftrace_hash_empty() is tested before
calling ftrace_lookup_ip() in those cases. But as ftrace_lookup_ip() tests
that first, that adds a few extra unneeded instructions in those cases.

A new static "always_inlined" function is created that does not perform the
hash empty test. This most only be used by callers that do the check first
anyway, as an empty or NULL hash could cause a crash if a lookup is
performed on it.

Also add kernel doc for the ftrace_lookup_ip() main function.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 2b0cce0e 31-Jan-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Add ftrace_hash_key() helper function

Replace the couple of use cases that has small logic to produce the ftrace
function key id with a helper function. No need for duplicate code.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b9b0c831 19-Jan-2017 Namhyung Kim <namhyung@kernel.org>

ftrace: Convert graph filter to use hash tables

Use ftrace_hash instead of a static array of a fixed size. This is
useful when a graph filter pattern matches to a large number of
functions. Now hash lookup is done with preemption disabled to protect
from the hash being changed/freed.

Link: http://lkml.kernel.org/r/20170120024447.26097-3-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 4046bf02 19-Jan-2017 Namhyung Kim <namhyung@kernel.org>

ftrace: Expose ftrace_hash_empty and ftrace_lookup_ip

It will be used when checking graph filter hashes later.

Link: http://lkml.kernel.org/r/20170120024447.26097-2-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
[ Moved ftrace_hash dec and functions outside of FUNCTION_GRAPH define ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 3e278c0d 19-Jan-2017 Namhyung Kim <namhyung@kernel.org>

ftrace: Factor out __ftrace_hash_move()

The __ftrace_hash_move() is to allocates properly-sized hash and move
entries in the src ftrace_hash. It will be used to set function graph
filters which has nothing to do with the dyn_ftrace records.

Link: http://lkml.kernel.org/r/20170120024447.26097-1-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# a5a1d1c2 21-Dec-2016 Thomas Gleixner <tglx@linutronix.de>

clocksource: Use a plain u64 instead of cycle_t

There is no point in having an extra type for extra confusion. u64 is
unambiguous.

Conversion was done with the following coccinelle script:

@rem@
@@
-typedef u64 cycle_t;

@fix@
typedef cycle_t;
@@
-cycle_t
+u64

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>


# d032ae89 15-Nov-2016 Joel Fernandes <joelaf@google.com>

ftrace: Provide API to use global filtering for ftrace ops

Currently the global_ops filtering hash is not available to outside users
registering for function tracing. Provide an API for those users to be
able to choose global filtering.

This is in preparation for pstore's ftrace feature to be able to
use the global filters.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Kees Cook <keescook@chromium.org>


# 60f1d5e3 05-Oct-2016 Masami Hiramatsu <mhiramat@kernel.org>

ftrace: Support full glob matching

Use glob_match() to support flexible glob wildcards (*,?)
and character classes ([) for ftrace.
Since the full glob matching is slower than the current
partial matching routines(*pat, pat*, *pat*), this leaves
those routines and just add MATCH_GLOB for complex glob
expression.

e.g.
----
[root@localhost tracing]# echo 'sched*group' > set_ftrace_filter
[root@localhost tracing]# cat set_ftrace_filter
sched_free_group
sched_change_group
sched_create_group
sched_online_group
sched_destroy_group
sched_offline_group
[root@localhost tracing]# echo '[Ss]y[Ss]_*' > set_ftrace_filter
[root@localhost tracing]# head set_ftrace_filter
sys_arch_prctl
sys_rt_sigreturn
sys_ioperm
SyS_iopl
sys_modify_ldt
SyS_mmap
SyS_set_thread_area
SyS_get_thread_area
SyS_set_tid_address
sys_fork
----

Link: http://lkml.kernel.org/r/147566869501.29136.6462645009894738056.stgit@devbox

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 546fece4 14-Nov-2016 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Add more checks for FTRACE_FL_DISABLED in processing ip records

When a module is first loaded and its function ip records are added to the
ftrace list of functions to modify, they are set to DISABLED, as their text
is still in a read only state. When the module is fully loaded, and can be
updated, the flag is cleared, and if their's any functions that should be
tracing them, it is updated at that moment.

But there's several locations that do record accounting and should ignore
records that are marked as disabled, or they can cause issues.

Alexei already fixed one location, but others need to be addressed.

Cc: stable@vger.kernel.org
Fixes: b7ffffbb46f2 "ftrace: Add infrastructure for delayed enabling of module functions"
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 977c1f9c 07-Nov-2016 Alexei Starovoitov <ast@kernel.org>

ftrace: Ignore FTRACE_FL_DISABLED while walking dyn_ftrace records

ftrace_shutdown() checks for sanity of ftrace records
and if dyn_ftrace->flags is not zero, it will warn.
It can happen that 'flags' are set to FTRACE_FL_DISABLED at this point,
since some module was loaded, but before ftrace_module_enable()
cleared the flags for this module.

In other words the module.c is doing:
ftrace_module_init(mod); // calls ftrace_update_code() that sets flags=FTRACE_FL_DISABLED
... // here ftrace_shutdown() is called that warns, since
err = prepare_coming_module(mod); // didn't have a chance to clear FTRACE_FL_DISABLED

Fix it by ignoring disabled records.
It's similar to what __ftrace_hash_rec_update() is already doing.

Link: http://lkml.kernel.org/r/1478560460-3818619-1-git-send-email-ast@fb.com

Cc: stable@vger.kernel.org
Fixes: b7ffffbb46f2 "ftrace: Add infrastructure for delayed enabling of module functions"
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 8861dd30 30-Aug-2016 Namhyung Kim <namhyung@kernel.org>

ftrace: Access ret_stack->subtime only in the function profiler

The subtime is used only for function profiler with function graph
tracer enabled. Move the definition of subtime under
CONFIG_FUNCTION_PROFILER to reduce the memory usage. Also move the
initialization of subtime into the graph entry callback.

Link: http://lkml.kernel.org/r/20160831025529.24018-1-namhyung@kernel.org

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 501c2375 05-Jul-2016 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Move toplevel init out of ftrace_init_tracefs()

Commit 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap like events
do") placed ftrace_init_tracefs into the instance creation, and encapsulated
the top level updating with an if conditional, as the top level only gets
updated at boot up. Unfortunately, this triggers section mismatch errors as
the init functions are called from a function that can be called later, and
the section mismatch logic is unaware of the if conditional that would
prevent it from happening at run time.

To make everyone happy, create a separate ftrace_init_tracefs_toplevel()
routine that only gets called by init functions, and this will be what calls
other init functions for the toplevel directory.

Link: http://lkml.kernel.org/r/20160704102139.19cbc0d9@gandalf.local.home

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap like events do")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 345ddcc8 22-Apr-2016 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Have set_ftrace_pid use the bitmap like events do

Convert set_ftrace_pid to use the bitmap like set_event_pid does. This
allows for instances to use the pid filtering as well, and will allow for
function-fork option to set if the children of a traced function should be
traced or not.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 6112a300 17-May-2016 Soumya PN <soumya.p.n@hpe.com>

ftrace: Don't disable irqs when taking the tasklist_lock read_lock

In ftrace.c inside the function alloc_retstack_tasklist() (which will be
invoked when function_graph tracing is on) the tasklist_lock is being
held as reader while iterating through a list of threads. Here the lock
is being held as reader with irqs disabled. The tasklist_lock is never
write_locked in interrupt context so it is safe to not disable interrupts
for the duration of read_lock in this block which, can be significant,
given the block of code iterates through all threads. Hence changing the
code to call read_lock() and read_unlock() instead of read_lock_irqsave()
and read_unlock_irqrestore().

A similar change was made in commits: 8063e41d2ffc ("tracing: Change
syscall_*regfunc() to check PF_KTHREAD and use for_each_process_thread()")'
and 3472eaa1f12e ("sched: normalize_rt_tasks(): Don't use _irqsave for
tasklist_lock, use task_rq_lock()")'

Link: http://lkml.kernel.org/r/1463500874-77480-1-git-send-email-soumya.p.n@hpe.com

Signed-off-by: Soumya PN <soumya.p.n@hpe.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 7132e2d6 25-Apr-2016 Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>

ftrace: Match dot symbols when searching functions on ppc64

In the ppc64 big endian ABI, function symbols point to function
descriptors. The symbols which point to the function entry points
have a dot in front of the function name. Consequently, when the
ftrace filter mechanism searches for the symbol corresponding to
an entry point address, it gets the dot symbol.

As a result, ftrace filter users have to be aware of this ABI detail on
ppc64 and prepend a dot to the function name when setting the filter.

The perf probe command insulates the user from this by ignoring the dot
in front of the symbol name when matching function names to symbols,
but the sysfs interface does not. This patch makes the ftrace filter
mechanism do the same when searching symbols.

Fixes the following failure in ftracetest's kprobe_ftrace.tc:

.../kprobe_ftrace.tc: line 9: echo: write error: Invalid argument

That failure is on this line of kprobe_ftrace.tc:

echo _do_fork > set_ftrace_filter

This is because there's no _do_fork entry in the functions list:

# cat available_filter_functions | grep _do_fork
._do_fork

This change introduces no regressions on the perf and ftracetest
testsuite results.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 04cf31a7 24-Mar-2016 Michael Ellerman <mpe@ellerman.id.au>

ftrace: Make ftrace_location_range() global

In order to support live patching on powerpc we would like to call
ftrace_location_range(), so make it global.

Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# a395d6a7 22-Mar-2016 Joe Perches <joe@perches.com>

kernel/...: convert pr_warning to pr_warn

Use the more common logging method with the eventual goal of removing
pr_warning altogether.

Miscellanea:

- Realign arguments
- Coalesce formats
- Add missing space between a few coalesced formats

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [kernel/power/suspend.c]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6363c6b5 15-Mar-2016 Geliang Tang <geliangtang@163.com>

ftrace: Use kasprintf() in ftrace_profile_tracefs()

Use kasprintf() instead of kmalloc() and snprintf().

Link: http://lkml.kernel.org/r/135a7bc36e51fd9eaa57124dd2140285b771f738.1458050835.git.geliangtang@163.com

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 7f50d06b 16-Mar-2016 Jiri Olsa <jolsa@kernel.org>

ftrace: Update dynamic ftrace calls only if necessary

Currently dynamic ftrace calls are updated any time
the ftrace_ops is un/registered. If we do this update
only when it's needed, we save lot of time for perf
system wide ftrace function sampling/counting.

The reason is that for system wide sampling/counting,
perf creates event for each cpu in the system.

Each event then registers separate copy of ftrace_ops,
which ends up in FTRACE_UPDATE_CALLS updates. On servers
with many cpus that means serious stall (240 cpus server):

Counting:
# time ./perf stat -e ftrace:function -a sleep 1

Performance counter stats for 'system wide':

370,663 ftrace:function

1.401427505 seconds time elapsed

real 3m51.743s
user 0m0.023s
sys 3m48.569s

Sampling:
# time ./perf record -e ftrace:function -a sleep 1
[ perf record: Woken up 0 times to write data ]
Warning:
Processed 141200 events and lost 5 chunks!

[ perf record: Captured and wrote 10.703 MB perf.data (135950 samples) ]

real 2m31.429s
user 0m0.213s
sys 2m29.494s

There's no reason to do the FTRACE_UPDATE_CALLS update
for each event in perf case, because all the ftrace_ops
always share the same filter, so the updated calls are
always the same.

It's required that only first ftrace_ops registration
does the FTRACE_UPDATE_CALLS update (also sometimes
the second if the first one used the trampoline), but
the rest can be only cheaply linked into the ftrace_ops
list.

Counting:
# time ./perf stat -e ftrace:function -a sleep 1

Performance counter stats for 'system wide':

398,571 ftrace:function

1.377503733 seconds time elapsed

real 0m2.787s
user 0m0.005s
sys 0m1.883s

Sampling:
# time ./perf record -e ftrace:function -a sleep 1
[ perf record: Woken up 0 times to write data ]
Warning:
Processed 261730 events and lost 9 chunks!

[ perf record: Captured and wrote 19.907 MB perf.data (256293 samples) ]

real 1m31.948s
user 0m0.309s
sys 1m32.051s

Link: http://lkml.kernel.org/r/1458138873-1553-6-git-send-email-jolsa@kernel.org

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 84b6d3e6 16-Mar-2016 Jiri Olsa <jolsa@kernel.org>

ftrace: Make ftrace_hash_rec_enable return update bool

Change __ftrace_hash_rec_update to return true in case
we need to update dynamic ftrace call records. It return
false in case no update is needed.

Link: http://lkml.kernel.org/r/1458138873-1553-5-git-send-email-jolsa@kernel.org

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 7dcd182b 16-Feb-2016 Jessica Yu <jeyu@redhat.com>

ftrace/module: remove ftrace module notifier

Remove the ftrace module notifier in favor of directly calling
ftrace_module_enable() and ftrace_release_mod() in the module loader.
Hard-coding the function calls directly in the module loader removes
dependence on the module notifier call chain and provides better
visibility and control over what gets called when, which is important
to kernel utilities such as livepatch.

This fixes a notifier ordering issue in which the ftrace module notifier
(and hence ftrace_module_enable()) for coming modules was being called
after klp_module_notify(), which caused livepatch modules to initialize
incorrectly. This patch removes dependence on the module notifier call
chain in favor of hard coding the corresponding function calls in the
module loader. This ensures that ftrace and livepatch code get called in
the correct order on patch module load and unload.

Fixes: 5156dca34a3e ("ftrace: Fix the race between ftrace and insmod")
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 5156dca3 24-Dec-2015 Qiu Peiyang <peiyangx.qiu@intel.com>

ftrace: Fix the race between ftrace and insmod

We hit ftrace_bug report when booting Android on a 64bit ATOM SOC chip.
Basically, there is a race between insmod and ftrace_run_update_code.

After load_module=>ftrace_module_init, another thread jumps in to call
ftrace_run_update_code=>ftrace_arch_code_modify_prepare
=>set_all_modules_text_rw, to change all modules
as RW. Since the new module is at MODULE_STATE_UNFORMED, the text attribute
is not changed. Then, the 2nd thread goes ahead to change codes.
However, load_module continues to call complete_formation=>set_section_ro_nx,
then 2nd thread would fail when probing the module's TEXT.

The patch fixes it by using notifier to delay the enabling of ftrace
records to the time when module is at state MODULE_STATE_COMING.

Link: http://lkml.kernel.org/r/567CE628.3000609@intel.com

Signed-off-by: Qiu Peiyang <peiyangx.qiu@intel.com>
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# b7ffffbb 07-Jan-2016 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Add infrastructure for delayed enabling of module functions

Qiu Peiyang pointed out that there's a race when enabling function tracing
and loading a module. In order to make the modifications of converting nops
in the prologue of functions into callbacks, the text needs to be converted
from read-only to read-write. When enabling function tracing, the text
permission is updated, the functions are modified, and then they are put
back.

When loading a module, the updates to convert function calls to mcount is
done before the module text is set to read-only. But after it is done, the
module text is visible by the function tracer. Thus we have the following
race:

CPU 0 CPU 1
----- -----
start function tracing
set text to read-write
load_module
add functions to ftrace
set module text read-only

update all functions to callbacks
modify module functions too
< Can't it's read-only >

When this happens, ftrace detects the issue and disables itself till the
next reboot.

To fix this, a new DISABLED flag is added for ftrace records, which all
module functions get when they are added. Then later, after the module code
is all set, the records will have the DISABLED flag cleared, and they will
be enabled if any callback wants all functions to be traced.

Note, this doesn't add the delay to later. It simply changes the
ftrace_module_init() to do both the setting of DISABLED records, and then
immediately calls the enable code. This helps with testing this new code as
it has the same behavior as previously. Another change will come after this
to have the ftrace_module_enable() called after the text is set to
read-only.

Cc: Qiu Peiyang <peiyangx.qiu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 97e9b4fc 22-Dec-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Clean up ftrace_module_init() code

The start and end variables were only used when ftrace_module_init() was
split up into multiple functions. No need to keep them around after the
merger.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# b6b71f66 02-Dec-2015 Abel Vesa <abelvesa@linux.com>

ftrace: Join functions ftrace_module_init() and ftrace_init_module()

Simple cleanup. No need for two functions here.
The whole work can simply be done inside 'ftrace_module_init'.

Link: http://lkml.kernel.org/r/1449067197-5718-1-git-send-email-abelvesa@linux.com

Signed-off-by: Abel Vesa <abelvesa@linux.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# c68c0fa2 01-Dec-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Have ftrace_ops_get_func() handle RCU and PER_CPU flags too

Jiri Olsa noted that the change to replace the control_ops did not update
the trampoline for when running perf on a single CPU and with CONFIG_PREEMPT
disabled (where dynamic ops, like perf, can use trampolines directly). The
result was that perf function could be called when RCU is not watching as
well as not handle the ftrace_local_disable().

Modify the ftrace_ops_get_func() to also check the RCU and PER_CPU ops flags
and use the recursive function if they are set. The recursive function is
modified to check those flags and execute the appropriate checks if they are
set.

Link: http://lkml.kernel.org/r/20151201134213.GA14155@krava.brq.redhat.com

Reported-by: Jiri Olsa <jolsa@redhat.com>
Patch-fixed-up-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# ba27f2bc 30-Nov-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Remove use of control list and ops

Currently perf has its own list function within the ftrace infrastructure
that seems to be used only to allow for it to have per-cpu disabling as well
as a check to make sure that it's not called while RCU is not watching. It
uses something called the "control_ops" which is used to iterate over ops
under it with the control_list_func().

The problem is that this control_ops and control_list_func unnecessarily
complicates the code. By replacing FTRACE_OPS_FL_CONTROL with two new flags
(FTRACE_OPS_FL_RCU and FTRACE_OPS_FL_PER_CPU) we can remove all the code
that is special with the control ops and add the needed checks within the
generic ftrace_list_func().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 030f4e1c 30-Nov-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Fix output of enabled_functions for showing tramp

When showing all tramps registered to a ftrace record in the file
enabled_functions, it exits the loop with ops == NULL. But then it is
suppose to show the function on the ops->trampoline and
add_trampoline_func() is called with the given ops. But because ops is now
NULL (to exit the loop), it always shows the static trampoline instead of
the one that is really registered to the record.

The call to add_trampoline_func() that shows the trampoline for the given
ops needs to be called at every iteration.

Fixes: 39daa7b9e895 "ftrace: Show all tramps registered to a record on ftrace_bug()"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# b8ec330a 30-Nov-2015 Li Bin <huawei.libin@huawei.com>

ftrace: Fix a typo in comment

s/ARCH_SUPPORT_FTARCE_OPS/ARCH_SUPPORTS_FTRACE_OPS/

Link: http://lkml.kernel.org/r/1448879016-8659-1-git-send-email-huawei.libin@huawei.com

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 39daa7b9 25-Nov-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Show all tramps registered to a record on ftrace_bug()

When an anomaly is detected in the function call modification code,
ftrace_bug() is called to disable function tracing as well as give any
information that may help debug the problem. Currently, only the first found
trampoline that is attached to the failed record is reported. Instead, show
all trampolines that are hooked to it.

Also, not only show the ops pointer but also report the function it calls.

While at it, add this info to the enabled_functions debug file too.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# b05086c7 25-Nov-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Add variable ftrace_expected for archs to show expected code

When an anomaly is found while modifying function code, ftrace_bug() is
called which disables the function tracing infrastructure and reports
information about what failed. If the code that is to be replaced does not
match what is expected, then actual code is shown. Currently there is no
arch generic way to show what was expected.

Add a new variable pointer calld ftrace_expected that the arch code can set
to point to what it expected so that ftrace_bug() can report the actual text
as well as the text that was expected to be there.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 02a392a0 24-Nov-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Add new type to distinguish what kind of ftrace_bug()

The ftrace function hook utility has several internal checks to make sure
that whatever it modifies is exactly what it expects to be modifying. This
is essential as modifying running code can be extremely dangerous to the
system.

When an anomaly is detected, ftrace_bug() is called which sends a splat to
the console and disables function tracing. There's some extra information
that is printed to help diagnose the issue.

One thing that is missing though is output of what ftrace was doing at the
time of the crash. Was it updating a call site or perhaps converting a call
site to a nop? A new global enum variable is created to state what ftrace
was doing at the time of the anomaly, and this is reported in ftrace_bug().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 0b507e1e 29-Sep-2015 Dmitry Safonov <0x7f454c46@gmail.com>

ftrace: add module globbing

Extend module command for function filter selection with globbing.
It uses the same globbing as function filter.

sh# echo '*alloc*:mod:*' > set_ftrace_filter

Will trace any function with the letters 'alloc' in the name in any
module but not in kernel.

sh# echo '!*alloc*:mod:ipv6' >> set_ftrace_filter

Will prevent from tracing functions with 'alloc' in the name from module
ipv6 (do not forget to append to set_ftrace_filter file).

sh# echo '*alloc*:mod:!ipv6' > set_ftrace_filter

Will trace functions with 'alloc' in the name from kernel and any
module except ipv6.

sh# echo '*alloc*:mod:!*' > set_ftrace_filter

Will trace any function with the letters 'alloc' in the name only from
kernel, but not from any module.

sh# echo '*:mod:!*' > set_ftrace_filter
or
sh# echo ':mod:!' > set_ftrace_filter

Will trace every function in the kernel, but will not trace functions
from any module.

sh# echo '*:mod:*' > set_ftrace_filter
or
sh# echo ':mod:' > set_ftrace_filter

As the opposite will trace all functions from all modules, but not from
kernel.

sh# echo '*:mod:*snd*' > set_ftrace_filter

Will trace your sound drivers only (if any).

Link: http://lkml.kernel.org/r/1443545176-3215-4-git-send-email-0x7f454c46@gmail.com

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
[ Made format changes ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 3ba00929 29-Sep-2015 Dmitry Safonov <0x7f454c46@gmail.com>

ftrace: Introduce ftrace_glob structure

ftrace_match parameters are very related and I reduce the number of local
variables & parameters with it.
This is also preparation for module globbing as it would introduce more
realated variables & parameters.

Link: http://lkml.kernel.org/r/1443545176-3215-3-git-send-email-0x7f454c46@gmail.com

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
[ Made some formatting changes ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# f0a3b154 29-Sep-2015 Dmitry Safonov <0x7f454c46@gmail.com>

ftrace: Clarify code for mod command

"Not" is too abstract variable name - changed to clear_filter.
Removed ftrace_match_module_records function: comparison with !* or *
not does the general code in filter_parse_regex() as it works without
mod command for
sh# echo '!*' > /sys/kernel/debug/tracing/set_ftrace_filter

Link: http://lkml.kernel.org/r/1443545176-3215-2-git-send-email-0x7f454c46@gmail.com

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 5e3949f0 29-Sep-2015 Dmitry Safonov <0x7f454c46@gmail.com>

ftrace: Remove redundant strsep in mod_callback

By now there isn't any subcommand for mod.

Before:
sh$ echo '*:mod:ipv6:a' > set_ftrace_filter
sh$ echo '*:mod:ipv6' > set_ftrace_filter
had the same results, but now first will result in:
sh$ echo '*:mod:ipv6:a' > set_ftrace_filter
-bash: echo: write error: Invalid argument

Also, I clarified ftrace_mod_callback code a little.

Link: http://lkml.kernel.org/r/1443545176-3215-1-git-send-email-0x7f454c46@gmail.com

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
[ converted 'if (ret == 0)' to 'if (!ret)' ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# c73464b1 28-Sep-2015 Peter Zijlstra <peterz@infradead.org>

sched/core: Fix trace_sched_switch()

__trace_sched_switch_state() is the last remaining PREEMPT_ACTIVE
user, move trace_sched_switch() from prepare_task_switch() to
__schedule() and propagate the @preempt argument.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6db02903 09-Sep-2015 Rasmus Villemoes <linux@rasmusvillemoes.dk>

ftrace: Remove redundant swap function

To cover the common case of sorting an array of pointers, Daniel
Wagner recently modified the library sort() to use a specific swap
function for size==8, in addition to the size==4 case which was
already handled. Since sizeof(long) is either 4 or 8,
ftrace_swap_ips() is redundant and we can just let sort() pick an
appropriate and fast swap callback.

Link: http://lkml.kernel.org/r/1441834023-13130-1-git-send-email-linux@rasmusvillemoes.dk

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 55577204 29-Sep-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Move sleep-time and graph-time options out of the core trace_flags

The sleep-time and graph-time options are only for the function graph tracer
and are not used by anything else. As tracer options are now visible when
the tracer is not activated, its better to move the function graph specific
tracer options into the function graph tracer.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e3eea140 24-Jul-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Fix breakage of set_ftrace_pid

Commit 4104d326b670 ("ftrace: Remove global function list and call function
directly") simplified the ftrace code by removing the global_ops list with a
new design. But this cleanup also broke the filtering of PIDs that are added
to the set_ftrace_pid file.

Add back the proper hooks to have pid filtering working once again.

Cc: stable@vger.kernel.org # 3.16+
Reported-by: Matt Fleming <matt@console-pimps.org>
Reported-by: Richard Weinberger <richard.weinberger@gmail.com>
Tested-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 8e436ca0 22-Jun-2015 Umesh Tiwari <umesh.t@samsung.com>

ftrace: add tracing_thresh to function profile

This patch extends tracing_thresh functionality to function profile tracer.
If tracing_thresh is set, print those entries only,
whose average is > tracing thresh.

Link: http://lkml.kernel.org/r/1434972488-8571-1-git-send-email-umesh.t@samsung.com

Signed-off-by: Umesh Tiwari <umesh.t@samsung.com>
[ Removed unnecessary 'moved' comment ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 00ccbf2f 19-Feb-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace/x86: Let dynamic trampolines call ops->func even for dynamic fops

Dynamically allocated trampolines call ftrace_ops_get_func to get the
function which they should call. For dynamic fops (FTRACE_OPS_FL_DYNAMIC
flag is set) ftrace_ops_list_func is always returned. This is reasonable
for static trampolines but goes against the main advantage of dynamic
ones, that is avoidance of going through the list of all registered
callbacks for functions that are only being traced by a single callback.

We can fix it by returning ops->func (or recursion safe version) from
ftrace_ops_get_func whenever it is possible for dynamic trampolines.

Note that dynamic trampolines are not allowed for dynamic fops if
CONFIG_PREEMPT=y.

Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1501291023000.25445@pobox.suse.cz
Link: http://lkml.kernel.org/r/1424357773-13536-1-git-send-email-mbenes@suse.cz

Reported-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 524a3868 06-Mar-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Fix ftrace enable ordering of sysctl ftrace_enabled

Some archs (specifically PowerPC), are sensitive with the ordering of
the enabling of the calls to function tracing and setting of the
function to use to be traced.

That is, update_ftrace_function() sets what function the ftrace_caller
trampoline should call. Some archs require this to be set before
calling ftrace_run_update_code().

Another bug was discovered, that ftrace_startup_sysctl() called
ftrace_run_update_code() directly. If the function the ftrace_caller
trampoline changes, then it will not be updated. Instead a call
to ftrace_startup_enable() should be called because it tests to see
if the callback changed since the code was disabled, and will
tell the arch to update appropriately. Most archs do not need this
notification, but PowerPC does.

The problem could be seen by the following commands:

# echo 0 > /proc/sys/kernel/ftrace_enabled
# echo function > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /proc/sys/kernel/ftrace_enabled
# cat /sys/kernel/debug/tracing/trace

The trace will show that function tracing was not active.

Cc: stable@vger.kernel.org # 2.6.27+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 1619dc3f 06-Mar-2015 Pratyush Anand <panand@redhat.com>

ftrace: Fix en(dis)able graph caller when en(dis)abling record via sysctl

When ftrace is enabled globally through the proc interface, we must check if
ftrace_graph_active is set. If it is set, then we should also pass the
FTRACE_START_FUNC_RET command to ftrace_run_update_code(). Similarly, when
ftrace is disabled globally through the proc interface, we must check if
ftrace_graph_active is set. If it is set, then we should also pass the
FTRACE_STOP_FUNC_RET command to ftrace_run_update_code().

Consider the following situation.

# echo 0 > /proc/sys/kernel/ftrace_enabled

After this ftrace_enabled = 0.

# echo function_graph > /sys/kernel/debug/tracing/current_tracer

Since ftrace_enabled = 0, ftrace_enable_ftrace_graph_caller() is never
called.

# echo 1 > /proc/sys/kernel/ftrace_enabled

Now ftrace_enabled will be set to true, but still
ftrace_enable_ftrace_graph_caller() will not be called, which is not
desired.

Further if we execute the following after this:
# echo nop > /sys/kernel/debug/tracing/current_tracer

Now since ftrace_enabled is set it will call
ftrace_disable_ftrace_graph_caller(), which causes a kernel warning on
the ARM platform.

On the ARM platform, when ftrace_enable_ftrace_graph_caller() is called,
it checks whether the old instruction is a nop or not. If it's not a nop,
then it returns an error. If it is a nop then it replaces instruction at
that address with a branch to ftrace_graph_caller.
ftrace_disable_ftrace_graph_caller() behaves just the opposite. Therefore,
if generic ftrace code ever calls either ftrace_enable_ftrace_graph_caller()
or ftrace_disable_ftrace_graph_caller() consecutively two times in a row,
then it will return an error, which will cause the generic ftrace code to
raise a warning.

Note, x86 does not have an issue with this because the architecture
specific code for ftrace_enable_ftrace_graph_caller() and
ftrace_disable_ftrace_graph_caller() does not check the previous state,
and calling either of these functions twice in a row has no ill effect.

Link: http://lkml.kernel.org/r/e4fbe64cdac0dd0e86a3bf914b0f83c0b419f146.1425666454.git.panand@redhat.com

Cc: stable@vger.kernel.org # 2.6.31+
Signed-off-by: Pratyush Anand <panand@redhat.com>
[
removed extra if (ftrace_start_up) and defined ftrace_graph_active as 0
if CONFIG_FUNCTION_GRAPH_TRACER is not set.
]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# b24d443b 04-Mar-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Clear REGS_EN and TRAMP_EN flags on disabling record via sysctl

When /proc/sys/kernel/ftrace_enabled is set to zero, all function
tracing is disabled. But the records that represent the functions
still hold information about the ftrace_ops that are hooked to them.

ftrace_ops may request "REGS" (have a full set of pt_regs passed to
the callback), or "TRAMP" (the ops has its own trampoline to use).
When the record is updated to represent the state of the ops hooked
to it, it sets "REGS_EN" and/or "TRAMP_EN" to state that the callback
points to the correct trampoline (REGS has its own trampoline).

When ftrace_enabled is set to zero, all ftrace locations are a nop,
so they do not point to any trampoline. But the _EN flags are still
set. This can cause the accounting to go wrong when ftrace_enabled
is cleared and an ops that has a trampoline is registered or unregistered.

For example, the following will cause ftrace to crash:

# echo function_graph > /sys/kernel/debug/tracing/current_tracer
# echo 0 > /proc/sys/kernel/ftrace_enabled
# echo nop > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /proc/sys/kernel/ftrace_enabled
# echo function_graph > /sys/kernel/debug/tracing/current_tracer

As function_graph uses a trampoline, when ftrace_enabled is set to zero
the updates to the record are not done. When enabling function_graph
again, the record will still have the TRAMP_EN flag set, and it will
look for an op that has a trampoline other than the function_graph
ops, and fail to find one.

Cc: stable@vger.kernel.org # 3.17+
Reported-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 8434dc93 19-Jan-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Convert the tracing facility over to use tracefs

debugfs was fine for the tracing facility as a quick way to get
an interface. Now that tracing has matured, it should separate itself
from debugfs such that it can be mounted separately without needing
to mount all of debugfs with it. That is, users resist using tracing
because it requires mounting debugfs. Having tracing have its own file
system lets users get the features of tracing without needing to bring
in the rest of the kernel's debug infrastructure.

Another reason for tracefs is that debubfs does not support mkdir.
Currently, to create instances, one does a mkdir in the tracing/instance
directory. This is implemented via a hack that forces debugfs to do
something it is not intended on doing. By converting over to tracefs, this
hack can be removed and mkdir can be properly implemented. This patch does
not address this yet, but it lays the ground work for that to be done.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 14a5ae40 20-Jan-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Use IS_ERR() check for return value of tracing_init_dentry()

tracing_init_dentry() will soon return NULL as a valid pointer for the
top level tracing directroy. NULL can not be used as an error value.
Instead, switch to ERR_PTR() and check the return status with
IS_ERR().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 7485058e 13-Jan-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Check both notrace and filter for old hash

Using just the filter for checking for trampolines or regs is not enough
when updating the code against the records that represent all functions.
Both the filter hash and the notrace hash need to be checked.

To trigger this bug (using trace-cmd and perf):

# perf probe -a do_fork
# trace-cmd start -B foo -e probe
# trace-cmd record -p function_graph -n do_fork sleep 1

The trace-cmd record at the end clears the filter before it disables
function_graph tracing and then that causes the accounting of the
ftrace function records to become incorrect and causes ftrace to bug.

Link: http://lkml.kernel.org/r/20150114154329.358378039@goodmis.org

Cc: stable@vger.kernel.org
[ still need to switch old_hash_ops to old_ops_hash ]
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 8f86f837 13-Jan-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Fix updating of filters for shared global_ops filters

As the set_ftrace_filter affects both the function tracer as well as the
function graph tracer, the ops that represent each have a shared
ftrace_ops_hash structure. This allows both to be updated when the filter
files are updated.

But if function graph is enabled and the global_ops (function tracing) ops
is not, then it is possible that the filter could be changed without the
update happening for the function graph ops. This will cause the changes
to not take place and may even cause a ftrace_bug to occur as it could mess
with the trampoline accounting.

The solution is to check if the ops uses the shared global_ops filter and
if the ops itself is not enabled, to check if there's another ops that is
enabled and also shares the global_ops filter. In that case, the
modification still needs to be executed.

Link: http://lkml.kernel.org/r/20150114154329.055980438@goodmis.org

Cc: stable@vger.kernel.org # 3.17+
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# f8b8be8a 21-Nov-2014 Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict

Introduce FTRACE_OPS_FL_IPMODIFY to avoid conflict among
ftrace users who may modify regs->ip to change the execution
path. If two or more users modify the regs->ip on the same
function entry, one of them will be broken. So they must add
IPMODIFY flag and make sure that ftrace_set_filter_ip() succeeds.

Note that ftrace doesn't allow ftrace_ops which has IPMODIFY
flag to have notrace hash, and the ftrace_ops must have a
filter hash (so that the ftrace_ops can hook only specific
entries), because it strongly depends on the address and
must be allowed for only few selected functions.

Link: http://lkml.kernel.org/r/20141121102516.11844.27829.stgit@localhost.localdomain

Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Vojtech Pavlik <vojtech@suse.cz>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
[ fixed up some of the comments ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 0af26492 20-Nov-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing/trivial: Fix typos and make an int into a bool

Fix up a few typos in comments and convert an int into a bool in
update_traceon_count().

Link: http://lkml.kernel.org/r/546DD445.5080108@hitachi.com

Suggested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# aec0be2d 18-Nov-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace/x86/extable: Add is_ftrace_trampoline() function

Stack traces that happen from function tracing check if the address
on the stack is a __kernel_text_address(). That is, is the address
kernel code. This calls core_kernel_text() which returns true
if the address is part of the builtin kernel code. It also calls
is_module_text_address() which returns true if the address belongs
to module code.

But what is missing is ftrace dynamically allocated trampolines.
These trampolines are allocated for individual ftrace_ops that
call the ftrace_ops callback functions directly. But if they do a
stack trace, the code checking the stack wont detect them as they
are neither core kernel code nor module address space.

Adding another field to ftrace_ops that also stores the size of
the trampoline assigned to it we can create a new function called
is_ftrace_trampoline() that returns true if the address is a
dynamically allocate ftrace trampoline. Note, it ignores trampolines
that are not dynamically allocated as they will return true with
the core_kernel_text() function.

Link: http://lkml.kernel.org/r/20141119034829.497125839@goodmis.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# fa6f0cc7 08-Nov-2014 Rasmus Villemoes <linux@rasmusvillemoes.dk>

tracing: Replace seq_printf by simpler equivalents

Using seq_printf to print a simple string or a single character is a
lot more expensive than it needs to be, since seq_puts and seq_putc
exist.

These patches do

seq_printf(m, s) -> seq_puts(m, s)
seq_printf(m, "%s", s) -> seq_puts(m, s)
seq_printf(m, "%c", c) -> seq_putc(m, c)

Subsequent patches will simplify further.

Link: http://lkml.kernel.org/r/1415479332-25944-2-git-send-email-linux@rasmusvillemoes.dk

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# fe578ba3 13-Nov-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Have the control_ops get a trampoline

With the new logic, if only a single user of ftrace function hooks is
used, it will get its own trampoline assigned to it.

The problem is that the control_ops is an indirect ops that perf ops
uses. What that means is that when perf registers its ops with
register_ftrace_function(), it has the CONTROL flag set and gets added
to the control list instead of the global ftrace list. The control_ops
gets added to that instead and the mcount trampoline calls the control_ops
function. The control_ops function will iterate the control list and
call the ops functions that are attached to it.

But currently the trampoline is added to the perf ops and not the
control ops, and when ftrace tries to find a trampoline hook for it,
it fails to find one and gives the following splat:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 10133 at kernel/trace/ftrace.c:2033 ftrace_get_addr_new+0x6f/0xc0()
Modules linked in: [...]
CPU: 0 PID: 10133 Comm: perf Tainted: P 3.18.0-rc1-test+ #388
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
00000000000007f1 ffff8800c2643bc8 ffffffff814fca6e ffff88011ea0ed01
0000000000000000 ffff8800c2643c08 ffffffff81041ffd 0000000000000000
ffffffff810c388c ffffffff81a5a350 ffff880119b00000 ffffffff810001c8
Call Trace:
[<ffffffff814fca6e>] dump_stack+0x46/0x58
[<ffffffff81041ffd>] warn_slowpath_common+0x81/0x9b
[<ffffffff810c388c>] ? ftrace_get_addr_new+0x6f/0xc0
[<ffffffff810001c8>] ? 0xffffffff810001c8
[<ffffffff81042031>] warn_slowpath_null+0x1a/0x1c
[<ffffffff810c388c>] ftrace_get_addr_new+0x6f/0xc0
[<ffffffff8102e938>] ftrace_replace_code+0xd6/0x334
[<ffffffff810c4116>] ftrace_modify_all_code+0x41/0xc5
[<ffffffff8102eba6>] arch_ftrace_update_code+0x10/0x19
[<ffffffff810c293c>] ftrace_run_update_code+0x21/0x42
[<ffffffff810c298f>] ftrace_startup_enable+0x32/0x34
[<ffffffff810c3049>] ftrace_startup+0x14e/0x15a
[<ffffffff810c307c>] register_ftrace_function+0x27/0x40
[<ffffffff810dc118>] perf_ftrace_event_register+0x3e/0xee
[<ffffffff810dbfbe>] perf_trace_init+0x29d/0x2a9
[<ffffffff810eb422>] perf_tp_event_init+0x27/0x3a
[<ffffffff810f18bc>] perf_init_event+0x9e/0xed
[<ffffffff810f1ba4>] perf_event_alloc+0x299/0x330
[<ffffffff810f236b>] SYSC_perf_event_open+0x3ee/0x816
[<ffffffff8115a066>] ? mntput+0x2d/0x2f
[<ffffffff81142b00>] ? __fput+0xa7/0x1b2
[<ffffffff81091300>] ? do_gettimeofday+0x22/0x3a
[<ffffffff810f279c>] SyS_perf_event_open+0x9/0xb
[<ffffffff81502a92>] system_call_fastpath+0x12/0x17
---[ end trace 81a53565150e4982 ]---
Bad trampoline accounting at: ffffffff810001c8 (run_init_process+0x0/0x2d) (10000001)

Update the control_ops trampoline instead of the perf ops one.

Reported-by: lkp@01.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 4fd3279b 24-Oct-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Add more information to ftrace_bug() output

With the introduction of the dynamic trampolines, it is useful that if
things go wrong that ftrace_bug() produces more information about what
the current state is. This can help debug issues that may arise.

Ftrace has lots of checks to make sure that the state of the system it
touchs is exactly what it expects it to be. When it detects an abnormality
it calls ftrace_bug() and disables itself to prevent any further damage.
It is crucial that ftrace_bug() produces sufficient information that
can be used to debug the situation.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 12cce594 03-Jul-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines

When the static ftrace_ops (like function tracer) enables tracing, and it
is the only callback that is referencing a function, a trampoline is
dynamically allocated to the function that calls the callback directly
instead of calling a loop function that iterates over all the registered
ftrace ops (if more than one ops is registered).

But when it comes to dynamically allocated ftrace_ops, where they may be
freed, on a CONFIG_PREEMPT kernel there's no way to know when it is safe
to free the trampoline. If a task was preempted while executing on the
trampoline, there's currently no way to know when it will be off that
trampoline.

But this is not true when it comes to !CONFIG_PREEMPT. The current method
of calling schedule_on_each_cpu() will force tasks off the trampoline,
becaues they can not schedule while on it (kernel preemption is not
configured). That means it is safe to free a dynamically allocated
ftrace ops trampoline when CONFIG_PREEMPT is not configured.

Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 15d5b02c 03-Jul-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace/x86: Show trampoline call function in enabled_functions

The file /sys/kernel/debug/tracing/eneabled_functions is used to debug
ftrace function hooks. Add to the output what function is being called
by the trampoline if the arch supports it.

Add support for this feature in x86_64.

Cc: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# f3bea491 02-Jul-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace/x86: Add dynamic allocated trampoline for ftrace_ops

The current method of handling multiple function callbacks is to register
a list function callback that calls all the other callbacks based on
their hash tables and compare it to the function that the callback was
called on. But this is very inefficient.

For example, if you are tracing all functions in the kernel and then
add a kprobe to a function such that the kprobe uses ftrace, the
mcount trampoline will switch from calling the function trace callback
to calling the list callback that will iterate over all registered
ftrace_ops (in this case, the function tracer and the kprobes callback).
That means for every function being traced it checks the hash of the
ftrace_ops for function tracing and kprobes, even though the kprobes
is only set at a single function. The kprobes ftrace_ops is checked
for every function being traced!

Instead of calling the list function for functions that are only being
traced by a single callback, we can call a dynamically allocated
trampoline that calls the callback directly. The function graph tracer
already uses a direct call trampoline when it is being traced by itself
but it is not dynamically allocated. It's trampoline is static in the
kernel core. The infrastructure that called the function graph trampoline
can also be used to call a dynamically allocated one.

For now, only ftrace_ops that are not dynamically allocated can have
a trampoline. That is, users such as function tracer or stack tracer.
kprobes and perf allocate their ftrace_ops, and until there's a safe
way to free the trampoline, it can not be used. The dynamically allocated
ftrace_ops may, although, use the trampoline if the kernel is not
compiled with CONFIG_PREEMPT. But that will come later.

Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 4fc40904 24-Oct-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Fix checking of trampoline ftrace_ops in finding trampoline

When modifying code, ftrace has several checks to make sure things
are being done correctly. One of them is to make sure any code it
modifies is exactly what it expects it to be before it modifies it.
In order to do so with the new trampoline logic, it must be able
to find out what trampoline a function is hooked to in order to
see if the code that hooks to it is what's expected.

The logic to find the trampoline from a record (accounting descriptor
for a function that is hooked) needs to only look at the "old_hash"
of an ops that is being modified. The old_hash is the list of function
an ops is hooked to before its update. Since a record would only be
pointing to an ops that is being modified if it was already hooked
before.

Currently, it can pick a modified ops based on its new functions it
will be hooked to, and this picks the wrong trampoline and causes
the check to fail, disabling ftrace.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ftrace: squash into ordering of ops for modification


# 8252ecf3 24-Oct-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Set ops->old_hash on modifying what an ops hooks to

The code that checks for trampolines when modifying function hooks
tests against a modified ops "old_hash". But the ops old_hash pointer
is not being updated before the changes are made, making it possible
to not find the right hash to the callback and possibly causing
ftrace to break in accounting and disable itself.

Have the ops set its old_hash before the modifying takes place.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 84bde62c 12-Sep-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Add sanity check when unregistering last ftrace_ops

When the last ftrace_ops is unregistered, all the function records should
have a zeroed flags value. Make sure that is the case when the last ftrace_ops
is unregistered.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# fef5aeee 23-Jul-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Replace tramp_hash with old_*_hash to save space

Allowing function callbacks to declare their own trampolines requires
that each ftrace_ops that has a trampoline must have some sort of
accounting that keeps track of which ops has a trampoline attached
to a record.

The easy way to solve this was to add a "tramp_hash" that created a
hash entry for every function that a ops uses with a trampoline.
But since we can have literally tens of thousands of functions being
traced, that means we need tens of thousands of descriptors to map
the ops to the function in the hash. This is quite expensive and
can cause enabling and disabling the function graph tracer to take
some time to start and stop. It can take up to several seconds to
disable or enable all functions in the function graph tracer for this
reason.

The better approach albeit more complex, is to keep track of how ops
are being enabled and disabled, and use that along with the counting
of the number of ops attached to records, to determive what ops has
a trampoline attached to a record at enabling and disabling of
tracing.

To do this, the tramp_hash has been replaced with an old_filter_hash
and old_notrace_hash, which get the copy of the ops filter_hash and
notrace_hash respectively. The old hashes is kept until the ops has
been modified or removed and the old hashes are used with the logic
of the accounting to determine the ops that have the trampoline of
a record. The reason this has less of a footprint is due to the trick
that an "empty" hash in the filter_hash means "all functions" and
an empty hash in the notrace hash means "no functions" in the hash.

This is much more efficienct, doesn't have the delay, and takes up
much less memory, as we do not need to map all the functions but
just figure out which functions are mapped at the time it is
enabled or disabled.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e1effa01 05-Aug-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Annotate the ops operation on update

Add three new flags for ftrace_ops:

FTRACE_OPS_FL_ADDING
FTRACE_OPS_FL_REMOVING
FTRACE_OPS_FL_MODIFYING

These will be set for the ftrace_ops when they are first added
to the function tracing, being removed from function tracing
or just having their functions changed from function tracing,
respectively.

This will be needed to remove the tramp_hash, which can grow quite
big. The tramp_hash is used to note what functions a ftrace_ops
is using a trampoline for. Denoting which ftrace_ops is being
modified, will allow us to use the ftrace_ops hashes themselves,
which are much smaller as they have a global flag to denote if
a ftrace_ops is tracing all functions, as well as a notrace hash
if the ftrace_ops is tracing all but a few. The tramp_hash just
creates a hash item for every function, which can go into the 10s
of thousands if all functions are using the ftrace_ops trampoline.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 5fecaa04 24-Jul-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Grab any ops for a rec for enabled_functions output

When dumping the enabled_functions, use the first op that is
found with a trampoline to the record, as there should only be
one, as only one ops can be registered to a function that has
a trampoline.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 3296fc4e 24-Jul-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Remove freeing of old_hash from ftrace_hash_move()

ftrace_hash_move() currently frees the old hash that is passed to it
after replacing the pointer with the new hash. Instead of having the
function do that chore, have the caller perform the free.

This lets the ftrace_hash_move() be used a bit more freely, which
is needed for changing the way the trampoline logic is done.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# f7aad4e1 10-Sep-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Set callback to ftrace_stub when no ops are registered

The clean up that adds the helper function ftrace_ops_get_func()
caused the default function to not change when DYNAMIC_FTRACE was not
set and no ftrace_ops were registered. Although static tracing is
not very useful (not having DYNAMIC_FTRACE set), it is still supported
and we don't want to break it.

Clean up the if statement even more to specifically have the default
function call ftrace_stub when no ftrace_ops are registered. This
fixes the small bug for static tracing as well as makes the code a
bit more understandable.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 87354059 22-Jul-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Add helper function ftrace_ops_get_func()

Add the helper function to what the mcount trampoline is to call
for a ftrace_ops function. This helper will be used by arch code
in the future to set up dynamic trampolines. But as this does the
same tests that are performed in choosing what function to call for
the default mcount trampoline, might as well use it to clean up
the existing code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# f1ff6348 22-Jul-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Add separate function for non recursive callbacks

Instead of using the generic list function for callbacks that
are not recursive, call a new helper function from the mcount
trampoline called ftrace_ops_recur_func() that will do the recursion
checking for the callback.

This eliminates an indirection as well as will help in future code
that will use dynamically allocated trampolines.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 39b5552c 17-Aug-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Use current addr when converting to nop in __ftrace_replace_code()

In __ftrace_replace_code(), when converting the call to a nop in a function
it needs to compare against the "curr" (current) value of the ftrace ops, and
not the "new" one. It currently does not affect x86 which is the only arch
to do the trampolines with function graph tracer, but when other archs that do
depend on this code implement the function graph trampoline, it can crash.

Here's an example when ARM uses the trampolines (in the future):

------------[ cut here ]------------
WARNING: CPU: 0 PID: 9 at kernel/trace/ftrace.c:1716 ftrace_bug+0x17c/0x1f4()
Modules linked in: omap_rng rng_core ipv6
CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.16.0-test-10959-gf0094b28f303-dirty #52
[<c02188f4>] (unwind_backtrace) from [<c021343c>] (show_stack+0x20/0x24)
[<c021343c>] (show_stack) from [<c095a674>] (dump_stack+0x78/0x94)
[<c095a674>] (dump_stack) from [<c02532a0>] (warn_slowpath_common+0x7c/0x9c)
[<c02532a0>] (warn_slowpath_common) from [<c02532ec>] (warn_slowpath_null+0x2c/0x34)
[<c02532ec>] (warn_slowpath_null) from [<c02cbac4>] (ftrace_bug+0x17c/0x1f4)
[<c02cbac4>] (ftrace_bug) from [<c02cc44c>] (ftrace_replace_code+0x80/0x9c)
[<c02cc44c>] (ftrace_replace_code) from [<c02cc658>] (ftrace_modify_all_code+0xb8/0x164)
[<c02cc658>] (ftrace_modify_all_code) from [<c02cc718>] (__ftrace_modify_code+0x14/0x1c)
[<c02cc718>] (__ftrace_modify_code) from [<c02c7244>] (multi_cpu_stop+0xf4/0x134)
[<c02c7244>] (multi_cpu_stop) from [<c02c6e90>] (cpu_stopper_thread+0x54/0x130)
[<c02c6e90>] (cpu_stopper_thread) from [<c0271cd4>] (smpboot_thread_fn+0x1ac/0x1bc)
[<c0271cd4>] (smpboot_thread_fn) from [<c026ddf0>] (kthread+0xe0/0xfc)
[<c026ddf0>] (kthread) from [<c020f318>] (ret_from_fork+0x14/0x20)
---[ end trace dc9ce72c5b617d8f ]---
[ 65.047264] ftrace failed to modify [<c0208580>] asm_do_IRQ+0x10/0x1c
[ 65.054070] actual: 85:1b:00:eb

Fixes: 7413af1fb70e7 "ftrace: Make get_ftrace_addr() and get_ftrace_addr_old() global"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 5f151b24 15-Aug-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Fix function_profiler and function tracer together

The latest rewrite of ftrace removed the separate ftrace_ops of
the function tracer and the function graph tracer and had them
share the same ftrace_ops. This simplified the accounting by removing
the multiple layers of functions called, where the global_ops func
would call a special list that would iterate over the other ops that
were registered within it (like function and function graph), which
itself was registered to the ftrace ops list of all functions
currently active. If that sounds confusing, the code that implemented
it was also confusing and its removal is a good thing.

The problem with this change was that it assumed that the function
and function graph tracer can never be used at the same time.
This is mostly true, but there is an exception. That is when the
function profiler uses the function graph tracer to profile.
The function profiler can be activated the same time as the function
tracer, and this breaks the assumption and the result is that ftrace
will crash (it detects the error and shuts itself down, it does not
cause a kernel oops).

To solve this issue, a previous change allowed the hash tables
for the functions traced by a ftrace_ops to be a pointer and let
multiple ftrace_ops share the same hash. This allows the function
and function_graph tracer to have separate ftrace_ops, but still
share the hash, which is what is done.

Now the function and function graph tracers have separate ftrace_ops
again, and the function tracer can be run while the function_profile
is active.

Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# bce0b6c5 20-Aug-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Fix up trampoline accounting with looping on hash ops

Now that a ftrace_hash can be shared by multiple ftrace_ops, they can dec
the rec->flags by more than once (one per those that share the ftrace_hash).
This means that the tramp_hash may not have a hash item when it was added.

For example, if two ftrace_ops share a hash for a ftrace record, and the
first ops has a trampoline, when it adds itself it will set the rec->flags
TRAMP flag and increments its nr_trampolines counter. When the second ops
is added, it must clear that tramp flag but also decrement the other ops
that shares its hash. As the update to the function callbacks has not yet
been performed, the other ops will not have the tramp hash set yet and it
can not be used to know to decrement its nr_trampolines.

Luckily, the tramp_hash does not need to be used. As the ftrace_mutex is
held, a ops with a trampoline to a record during an update of another ops
that shares the record will have its func_hash pointing to it. Since a
trampoline can only be set for a record if only one ops is attached to it,
we can just check if the record has a trampoline (the FTRACE_FL_TRAMP flag
is set) and then find the ops that has this record in its hashes.

Also added some output to help debug when things go wrong.

Cc: stable@vger.kernel.org # 3.16+ (apply after 3.17-rc4 is out)
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 84261912 18-Aug-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Update all ftrace_ops for a ftrace_hash_ops update

When updating what an ftrace_ops traces, if it is registered (that is,
actively tracing), and that ftrace_ops uses the shared global_ops
local_hash, then we need to update all tracers that are active and
also share the global_ops' ftrace_hash_ops.

Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 33b7f99c 15-Aug-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Allow ftrace_ops to use the hashes from other ops

Currently the top level debug file system function tracer shares its
ftrace_ops with the function graph tracer. This was thought to be fine
because the tracers are not used together, as one can only enable
function or function_graph tracer in the current_tracer file.

But that assumption proved to be incorrect. The function profiler
can use the function graph tracer when function tracing is enabled.
Since all function graph users uses the function tracing ftrace_ops
this causes a conflict and when a user enables both function profiling
as well as the function tracer it will crash ftrace and disable it.

The quick solution so far is to move them as separate ftrace_ops like
it was earlier. The problem though is to synchronize the functions that
are traced because both function and function_graph tracer are limited
by the selections made in the set_ftrace_filter and set_ftrace_notrace
files.

To handle this, a new structure is made called ftrace_ops_hash. This
structure will now hold the filter_hash and notrace_hash, and the
ftrace_ops will point to this structure. That will allow two ftrace_ops
to share the same hashes.

Since most ftrace_ops do not share the hashes, and to keep allocation
simple, the ftrace_ops structure will include both a pointer to the
ftrace_ops_hash called func_hash, as well as the structure itself,
called local_hash. When the ops are registered, the func_hash pointer
will be initialized to point to the local_hash within the ftrace_ops
structure. Some of the ftrace internal ftrace_ops will be initialized
statically. This will allow for the function and function_graph tracer
to have separate ops but still share the same hash tables that determine
what functions they trace.

Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# dc6f03f2 24-Jul-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Add warning if tramp hash does not match nr_trampolines

After adding all the records to the tramp_hash, add a check that makes
sure that the number of records added matches the number of records
expected to match and do a WARN_ON and disable ftrace if they do
not match.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 2a0343ba 24-Jul-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Fix trampoline hash update check on rec->flags

In the loop of ftrace_save_ops_tramp_hash(), it adds all the recs
to the ops hash if the rec has only one callback attached and the
ops is connected to the rec. It gives a nasty warning and shuts down
ftrace if the rec doesn't have a trampoline set for it. But this
can happen with the following scenario:

# cd /sys/kernel/debug/tracing
# echo schedule do_IRQ > set_ftrace_filter
# mkdir instances/foo
# echo schedule > instances/foo/set_ftrace_filter
# echo function_graph > current_function
# echo function > instances/foo/current_function
# echo nop > instances/foo/current_function

The above would then trigger the following warning and disable
ftrace:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 3145 at kernel/trace/ftrace.c:2212 ftrace_run_update_code+0xe4/0x15b()
Modules linked in: ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ip [...]
CPU: 1 PID: 3145 Comm: bash Not tainted 3.16.0-rc3-test+ #136
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
0000000000000000 ffffffff81808a88 ffffffff81502130 0000000000000000
ffffffff81040ca1 ffff880077c08000 ffffffff810bd286 0000000000000001
ffffffff81a56830 ffff88007a041be0 ffff88007a872d60 00000000000001be
Call Trace:
[<ffffffff81502130>] ? dump_stack+0x4a/0x75
[<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97
[<ffffffff810bd286>] ? ftrace_run_update_code+0xe4/0x15b
[<ffffffff810bd286>] ? ftrace_run_update_code+0xe4/0x15b
[<ffffffff810bda1a>] ? ftrace_shutdown+0x11c/0x16b
[<ffffffff810bda87>] ? unregister_ftrace_function+0x1e/0x38
[<ffffffff810cc7e1>] ? function_trace_reset+0x1a/0x28
[<ffffffff810c924f>] ? tracing_set_tracer+0xc1/0x276
[<ffffffff810c9477>] ? tracing_set_trace_write+0x73/0x91
[<ffffffff81132383>] ? __sb_start_write+0x9a/0xcc
[<ffffffff8120478f>] ? security_file_permission+0x1b/0x31
[<ffffffff81130e49>] ? vfs_write+0xac/0x11c
[<ffffffff8113115d>] ? SyS_write+0x60/0x8e
[<ffffffff81508112>] ? system_call_fastpath+0x16/0x1b
---[ end trace 938c4415cbc7dc96 ]---
------------[ cut here ]------------

Link: http://lkml.kernel.org/r/20140723120805.GB21376@redhat.com

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 0162d621 23-Jul-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Rename ftrace_ops field from trampolines to nr_trampolines

Having two fields within the same struct that is off by one character
can be confusing and error prone. Rename the counter "trampolines"
to "nr_trampolines" to explicitly show it is a counter and not to
be confused by the "trampoline" field.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# b972cc58 14-Jul-2014 Wang Nan <wangnan0@huawei.com>

ftrace: Do not copy old hash when resetting

Do not waste time copying the old hash if the hash is going to be
reset. Just allocate a new hash and free the old one, as that is
the same result as copying te old one and then resetting it.

Link: http://lkml.kernel.org/p/1405384820-48837-1-git-send-email-wangnan0@huawei.com

Signed-off-by: Wang Nan <wangnan0@huawei.com>
[ SDR: Removed unused ftrace_filter_reset() function ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 3a636388 26-Jun-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST

All users of function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST have
been removed. We can safely remove them from the kernel.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 1d48d596 25-Jun-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Remove function_trace_stop check from list func

function_trace_stop is no longer used to stop function tracing.
Remove the check from __ftrace_ops_list_func().

Also, call FTRACE_WARN_ON() instead of setting function_trace_stop
if a ops has no func to call.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 1820122a 25-Jun-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Do no disable function tracing on enabling function tracing

When function tracing is being updated function_trace_stop is set to
keep from tracing the updates. This was fine when function tracing
was done from stop machine. But it is no longer done that way and
this can cause real tracing to be missed.

Remove it.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 1b2f121c 25-Jun-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace-graph: Remove dependency of ftrace_stop() from ftrace_graph_stop()

ftrace_stop() is going away as it disables parts of function tracing
that affects users that should not be affected. But ftrace_graph_stop()
is built on ftrace_stop(). Here's another example of killing all of
function tracing because something went wrong with function graph
tracing.

Instead of disabling all users of function tracing on function graph
error, disable only function graph tracing.

A new function is created called ftrace_graph_is_dead(). This is called
in strategic paths to prevent function graph from doing more harm and
allowing at least a warning to be printed before the system crashes.

NOTE: ftrace_stop() is still used until all the archs are converted over
to use ftrace_graph_is_dead(). After that, ftrace_stop() will be removed.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 646d7043 11-Jul-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Allow archs to specify if they need a separate function graph trampoline

Currently if an arch supports function graph tracing, the core code will
just assign the function graph trampoline to the function graph addr that
gets called.

But as the old method for function graph tracing always calls the function
trampoline first and that calls the function graph trampoline, some
archs may have the function graph trampoline dependent on operations that
were done in the function trampoline. This causes function graph tracer
to break on those archs.

Instead of having the default be to set the function graph ftrace_ops
to the function graph trampoline, have it instead just set it to zero
which will keep it from jumping to a trampoline that is not set up
to be jumped directly too.

Link: http://lkml.kernel.org/r/53BED155.9040607@nvidia.com

Reported-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
Tested-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 5f8bf2d2 15-Jul-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Fix graph tracer with stack tracer on other archs

Running my ftrace tests on PowerPC, it failed the test that checks
if function_graph tracer is affected by the stack tracer. It was.
Looking into this, I found that the update_function_graph_func()
must be called even if the trampoline function is not changed.
This is because archs like PowerPC do not support ftrace_ops being
passed by assembly and instead uses a helper function (what the
trampoline function points to). Since this function is not changed
even when multiple ftrace_ops are added to the code, the test that
falls out before calling update_function_graph_func() will miss that
the update must still be done.

Call update_function_graph_function() for all calls to
update_ftrace_function()

Cc: stable@vger.kernel.org # 3.3+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 8c006cf7 13-Jun-2014 Namhyung Kim <namhyung@kernel.org>

tracing: Improve message of empty set_ftrace_notrace file

When there's no entry in set_ftrace_notrace, it'll print nothing, but
it's better to print something like below like set_graph_notrace does:

#### no functions disabled ####

Link: http://lkml.kernel.org/p/1402644246-4649-1-git-send-email-namhyung@kernel.org

Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 280d1429 12-Jun-2014 Namhyung Kim <namhyung@kernel.org>

tracing: Improve message of empty set_graph_notrace file

When there's no entry in set_graph_notrace, it'll print below message

#### all functions enabled ####

While this is technically correct, it's better to print like below:

#### no functions disabled ####

Link: http://lkml.kernel.org/p/1402590233-22321-3-git-send-email-namhyung@kernel.org

Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 0d7d9a16 12-Jun-2014 Namhyung Kim <namhyung@kernel.org>

tracing: Add ftrace_graph_notrace boot parameter

The ftrace_graph_notrace option is for specifying notrace filter for
function graph tracer at boot time. It can be altered after boot
using set_graph_notrace file on the debugfs.

Link: http://lkml.kernel.org/p/1402590233-22321-2-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# ef2fbe16 11-Jun-2014 Namhyung Kim <namhyung@kernel.org>

ftrace: Do not copy hash if O_TRUNC is set

When a filter file is open for writing and O_TRUNC is set, there's no
need to copy and free the filter entries.

Link: http://lkml.kernel.org/p/1402474014-28655-2-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 1f61be00 11-Jun-2014 Namhyung Kim <namhyung@kernel.org>

ftrace: Fix memory leak on failure path in ftrace_allocate_pages()

As struct ftrace_page is managed in a single linked list, it should
free from the start page.

Link: http://lkml.kernel.org/p/1402474014-28655-1-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# a737e6dd 12-Jun-2014 Namhyung Kim <namhyung@kernel.org>

ftrace: Get rid of obsolete global_start_up variable

It seems like it's a leftover from commit 4104d326b670 ("ftrace:
Remove global function list and call function directly"). As it
isn't updated at all, checking its value is meaningless.

Let's get rid of it.

Link: http://lkml.kernel.org/p/1402584972-17824-1-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 5c27c775 17-Jun-2014 Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

ftrace: Simplify ftrace_hash_disable/enable path in ftrace_hash_move

Simplify ftrace_hash_disable/enable path in ftrace_hash_move
for hardening the process if the memory allocation failed.

Link: http://lkml.kernel.org/p/20140617110442.15167.81076.stgit@kbuild-fedora.novalocal

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 9674b2fa 09-May-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Add trampolines to enabled_functions debug file

The enabled_functions is used to help debug the dynamic function tracing.
Adding what trampolines are attached to files is useful for debugging.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 79922b80 06-May-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Optimize function graph to be called directly

Function graph tracing is a bit different than the function tracers, as
it is processed after either the ftrace_caller or ftrace_regs_caller
and we only have one place to modify the jump to ftrace_graph_caller,
the jump needs to happen after the restore of registeres.

The function graph tracer is dependent on the function tracer, where
even if the function graph tracing is going on by itself, the save and
restore of registers is still done for function tracing regardless of
if function tracing is happening, before it calls the function graph
code.

If there's no function tracing happening, it is possible to just call
the function graph tracer directly, and avoid the wasted effort to save
and restore regs for function tracing.

This requires adding new flags to the dyn_ftrace records:

FTRACE_FL_TRAMP
FTRACE_FL_TRAMP_EN

The first is set if the count for the record is one, and the ftrace_ops
associated to that record has its own trampoline. That way the mcount code
can call that trampoline directly.

In the future, trampolines can be added to arbitrary ftrace_ops, where you
can have two or more ftrace_ops registered to ftrace (like kprobes and perf)
and if they are not tracing the same functions, then instead of doing a
loop to check all registered ftrace_ops against their hashes, just call the
ftrace_ops trampoline directly, which would call the registered ftrace_ops
function directly.

Without this patch perf showed:

0.05% hackbench [kernel.kallsyms] [k] ftrace_caller
0.05% hackbench [kernel.kallsyms] [k] arch_local_irq_save
0.05% hackbench [kernel.kallsyms] [k] native_sched_clock
0.04% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] preempt_trace
0.04% hackbench [kernel.kallsyms] [k] prepare_ftrace_return
0.04% hackbench [kernel.kallsyms] [k] __this_cpu_preempt_check
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller

See that the ftrace_caller took up more time than the ftrace_graph_caller
did.

With this patch:

0.05% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] call_filter_check_discard
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
0.04% hackbench [kernel.kallsyms] [k] sched_clock

The ftrace_caller is no where to be found and ftrace_graph_caller still
takes up the same percentage.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 0376bde1 07-May-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Add ftrace_rec_counter() macro to simplify the code

The ftrace dynamic record has a flags element that also has a counter.
Instead of hard coding "rec->flags & ~FTRACE_FL_MASK" all over the
place. Use a macro instead.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 4fbb48cb 30-Apr-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Allow no regs if no more callbacks require it

When registering a function callback for the function tracer, the ops
can specify if it wants to save full regs (like an interrupt would)
for each function that it traces, or if it does not care about regs
and just wants to have the fastest return possible.

Once a ops has registered a function, if other ops register that
function they all will receive the regs too. That's because it does
the work once, it does it for everyone.

Now if the ops wanting regs unregisters the function so that there's
only ops left that do not care about regs, those ops will still
continue getting regs and going through the work for it on that
function. This is because the disabling of the rec counter only
sees the ops registered, and does not see the ops that are still
attached, and does not know if the current ops that are still attached
want regs or not. To play it safe, it just keeps regs being processed
until no function is registered anymore.

Instead of doing that, check the ops that are still registered for that
function and if none want regs for it anymore, then disable the
processing of regs.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# f1b2f2bd 07-May-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Remove FTRACE_UPDATE_MODIFY_CALL_REGS flag

As the decision to what needs to be done (converting a call to the
ftrace_caller to ftrace_caller_regs or to convert from ftrace_caller_regs
to ftrace_caller) can easily be determined from the rec->flags of
FTRACE_FL_REGS and FTRACE_FL_REGS_EN, there's no need to have the
ftrace_check_record() return either a UPDATE_MODIFY_CALL_REGS or a
UPDATE_MODIFY_CALL. Just he latter is enough. This added flag causes
more complexity than is required. Remove it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 7c0868e0 08-May-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Use the ftrace_addr helper functions to find the ftrace_addr

With the moving of the functions that determine what the mcount call site
should be replaced with into the generic code, there is a few places
in the generic code that can use them instead of hard coding it as it
does.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 7413af1f 06-May-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Make get_ftrace_addr() and get_ftrace_addr_old() global

Move and rename get_ftrace_addr() and get_ftrace_addr_old() to
ftrace_get_addr_new() and ftrace_get_addr_curr() respectively.

This moves these two helper functions in the generic code out from
the arch specific code, and renames them to have a better generic
name. This will allow other archs to use them as well as makes it
a bit easier to work on getting separate trampolines for different
functions.

ftrace_get_addr_new() returns the trampoline address that the mcount
call address will be converted to.

ftrace_get_addr_curr() returns the trampoline address of what the
mcount call address currently jumps to.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 68f40969 30-Apr-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Always inline ftrace_hash_empty() helper function

The ftrace_hash_empty() function is a simple test:

return !hash || !hash->count;

But gcc seems to want to make it a call. As this is in an extreme
hot path of the function tracer, there's no reason it needs to be
a call. I only wrote it to be a helper function anyway, otherwise
it would have been inlined manually.

Force gcc to inline it, as it could have also been a macro.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 19eab4a4 07-May-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Write in missing comment from a very old commit

Back in 2011 Commit ed926f9b35cda "ftrace: Use counters to enable
functions to trace" changed the way ftrace accounts for enabled
and disabled traced functions. There was a comment started as:

/*
*
*/

But never finished. Well, that's rather useless. I probably forgot
to save the file before committing it. And it passed review from all
this time.

Anyway, better late than never. I updated the comment to express what
is happening in that somewhat complex code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 66209a5b 06-May-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Remove boolean of hash_enable and hash_disable

Commit 4104d326b670 "ftrace: Remove global function list and call
function directly" cleaned up the global_ops filtering and made
the code simpler, but it left a variable "hash_enable" that was used
to know if the hash functions should be updated or not. It was
updated if the global_ops did not override them. As the global_ops
are now no different than any other ftrace_ops, the hash always
gets updated and there's no reason to use the hash_enable boolean.

The same goes for hash_disable used in ftrace_shutdown().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# bdffd893 29-Apr-2014 Christoph Lameter <cl@linux.com>

tracing: Replace __get_cpu_var uses with this_cpu_ptr

Replace uses of &__get_cpu_var for address calculation with this_cpu_ptr.

Link: http://lkml.kernel.org/p/alpine.DEB.2.10.1404291415560.18364@gentwo.org

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# fd06a549 01-May-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Have function graph tracer use global_ops for filtering

Commit 4104d326b670 "ftrace: Remove global function list and call
function directly" cleaned up the global_ops filtering and made
the code simpler. But it left out function graph filtering which
also depended on that code. The function graph filtering still
needs to use global_ops as the filter otherwise it wont filter
at all.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# a949ae56 24-Apr-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace/module: Hardcode ftrace_module_init() call into load_module()

A race exists between module loading and enabling of function tracer.

CPU 1 CPU 2
----- -----
load_module()
module->state = MODULE_STATE_COMING

register_ftrace_function()
mutex_lock(&ftrace_lock);
ftrace_startup()
update_ftrace_function();
ftrace_arch_code_modify_prepare()
set_all_module_text_rw();
<enables-ftrace>
ftrace_arch_code_modify_post_process()
set_all_module_text_ro();

[ here all module text is set to RO,
including the module that is
loading!! ]

blocking_notifier_call_chain(MODULE_STATE_COMING);
ftrace_init_module()

[ tries to modify code, but it's RO, and fails!
ftrace_bug() is called]

When this race happens, ftrace_bug() will produces a nasty warning and
all of the function tracing features will be disabled until reboot.

The simple solution is to treate module load the same way the core
kernel is treated at boot. To hardcode the ftrace function modification
of converting calls to mcount into nops. This is done in init/main.c
there's no reason it could not be done in load_module(). This gives
a better control of the changes and doesn't tie the state of the
module to its notifiers as much. Ftrace is special, it needs to be
treated as such.

The reason this would work, is that the ftrace_module_init() would be
called while the module is in MODULE_STATE_UNFORMED, which is ignored
by the set_all_module_text_ro() call.

Link: http://lkml.kernel.org/r/1395637826-3312-1-git-send-email-indou.takao@jp.fujitsu.com

Reported-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@vger.kernel.org # 2.6.38+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 8d1b065d 20-Apr-2014 Jiaxing Wang <wangjiaxing@insigma.com.cn>

tracing: Fix documentation of ftrace_set_global_{filter,notrace}()

The functions ftrace_set_global_filter() and ftrace_set_global_notrace()
still have their old names in the kernel doc (ftrace_set_filter and
ftrace_set_notrace respectively). Replace these with the real names.

Link: http://lkml.kernel.org/p/1398006644-5935-3-git-send-email-wangjiaxing@insigma.com.cn

Signed-off-by: Jiaxing Wang <wangjiaxing@insigma.com.cn>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 7eea4fce 20-Apr-2014 Jiaxing Wang <wangjiaxing@insigma.com.cn>

tracing/stack_trace: Skip 4 instead of 3 when using ftrace_ops_list_func

When using ftrace_ops_list_func, we should skip 4 instead of 3,
to avoid ftrace_call+0x5/0xb appearing in the stack trace:

Depth Size Location (110 entries)
----- ---- --------
0) 2956 0 update_curr+0xe/0x1e0
1) 2956 68 ftrace_call+0x5/0xb
2) 2888 92 enqueue_entity+0x53/0xe80
3) 2796 80 enqueue_task_fair+0x47/0x7e0
4) 2716 28 enqueue_task+0x45/0x70
5) 2688 12 activate_task+0x22/0x30

Add a function using_ftrace_ops_list_func() to test for this while keeping
ftrace_ops_list_func to remain static.

Link: http://lkml.kernel.org/p/1398006644-5935-2-git-send-email-wangjiaxing@insigma.com.cn

Signed-off-by: Jiaxing Wang <wangjiaxing@insigma.com.cn>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 8275f69f 30-Mar-2014 Mathias Krause <minipli@googlemail.com>

ftrace: Statically initialize pm notifier block

Instead of initializing the pm notifier block in register_ftrace_graph(),
initialize it statically. This safes us some code.

Found in the PaX patch, written by the PaX Team.

Link: http://lkml.kernel.org/p/1396186310-3156-1-git-send-email-minipli@googlemail.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 4104d326 10-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Remove global function list and call function directly

Instead of having a list of global functions that are called,
as only one global function is allow to be enabled at a time, there's
no reason to have a list.

Instead, simply have all the users of the global ops, use the global ops
directly, instead of registering their own ftrace_ops. Just switch what
function is used before enabling the function tracer.

This removes a lot of code as well as the complexity involved with it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# d88471cb 09-Jan-2013 Sasha Levin <sasha.levin@oracle.com>

ftrace: Constify ftrace_text_reserved

Link: http://lkml.kernel.org/r/1357772960-4436-5-git-send-email-sasha.levin@oracle.com

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# db0fbadc 10-Mar-2014 Jiri Slaby <jirislaby@kernel.org>

ftrace: Fix compilation warning about control_ops_free

With CONFIG_DYNAMIC_FTRACE=n, I see a warning:
kernel/trace/ftrace.c:240:13: warning: 'control_ops_free' defined but not used
static void control_ops_free(struct ftrace_ops *ops)
^
Move that function around to an already existing #ifdef
CONFIG_DYNAMIC_FTRACE block as the function is used solely from the
dynamic function tracing functions.

Link: http://lkml.kernel.org/r/1394484131-5107-1-git-send-email-jslaby@suse.cz

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# cd21067f 24-Feb-2014 Petr Mladek <pmladek@suse.cz>

ftrace: Warn on error when modifying ftrace function

We should print some warning and kill ftrace functionality when the ftrace
function is not set correctly. Otherwise, ftrace might do crazy things without
an explanation. The error value has been ignored so far.

Note that an error that happens during updating all the traced calls is handled
in ftrace_replace_code(). We print more details about the particular
failing address via ftrace_bug() there.

Link: http://lkml.kernel.org/r/1393258342-29978-3-git-send-email-pmladek@suse.cz

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 3a36cb11 24-Feb-2014 Jiri Slaby <jirislaby@kernel.org>

ftrace: Do not pass data to ftrace_dyn_arch_init

As the data parameter is not really used by any ftrace_dyn_arch_init,
remove that from ftrace_dyn_arch_init. This also removes the addr
local variable from ftrace_init which is now unused.

Note the documentation was imprecise as it did not suggest to set
(*data) to 0.

Link: http://lkml.kernel.org/r/1393268401-24379-4-git-send-email-jslaby@suse.cz

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# af64a7cb 24-Feb-2014 Jiri Slaby <jirislaby@kernel.org>

ftrace: Pass retval through return in ftrace_dyn_arch_init()

No architecture uses the "data" parameter in ftrace_dyn_arch_init() in any
way, it just sets the value to 0. And this is used as a return value
in the caller -- ftrace_init, which just checks the retval against
zero.

Note there is also "return 0" in every ftrace_dyn_arch_init. So it is
enough to check the retval and remove all the indirect sets of data on
all archs.

Link: http://lkml.kernel.org/r/1393268401-24379-3-git-send-email-jslaby@suse.cz

Cc: linux-arch@vger.kernel.org
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# c867ccd8 24-Feb-2014 Jiri Slaby <jirislaby@kernel.org>

ftrace: Inline the code from ftrace_dyn_table_alloc()

The function used to do allocations some time ago. This no longer
happens and it only checks the count and prints some info. This patch
inlines the body to the only caller. There are two reasons:
* the name of the function was misleading
* it's clear what is going on in ftrace_init now

Link: http://lkml.kernel.org/r/1393268401-24379-2-git-send-email-jslaby@suse.cz

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 1dc43cf0 24-Feb-2014 Jiri Slaby <jirislaby@kernel.org>

ftrace: Cleanup of global variables ftrace_new_pgs and ftrace_update_cnt

Some of them can be local to functions, so make them local and pass
them as parameters where needed:
* __start_mcount_loc+__stop_mcount_loc are local to ftrace_init
* ftrace_new_pgs -> new_pgs/start_pg
* ftrace_update_cnt -> local update_cnt in ftrace_update_code

Link: http://lkml.kernel.org/r/1393268401-24379-1-git-send-email-jslaby@suse.cz

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 1fcc1553 19-Feb-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Have static function trace clear ENABLED flag on unregister

The ENABLED flag needs to be cleared when a ftrace_ops is unregistered
otherwise it wont be able to be registered again.

This is only for static tracing and does not affect DYNAMIC_FTRACE at
all.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 591dffda 10-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Allow for function tracing instance to filter functions

Create a "set_ftrace_filter" and "set_ftrace_notrace" files in the instance
directories to let users filter of functions to trace for the given instance.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e3b3e2e8 11-Nov-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Pass in global_ops for use with filtering files

In preparation for having the function tracing instances be able to
filter on functions, the generic filter functions must first be
converted to take in the global_ops as a parameter.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e6435e96 10-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Copy ops private to global_ops private

If global_ops function is being called directly, instead of the global_ops
list function, set the global_ops private to be the same as the ops private
that's being called directly.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# a4c35ed2 12-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Fix synchronization location disabling and freeing ftrace_ops

The synchronization needed after ftrace_ops are unregistered must happen
after the callback is disabled from becing called by functions.

The current location happens after the function is being removed from the
internal lists, but not after the function callbacks were disabled, leaving
the functions susceptible of being called after their callbacks are freed.

This affects perf and any externel users of function tracing (LTTng and
SystemTap).

Cc: stable@vger.kernel.org # 3.0+
Fixes: cdbe61bfe704 "ftrace: Allow dynamically allocated function tracers"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 23a8e844 13-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Have function graph only trace based on global_ops filters

Doing some different tests, I discovered that function graph tracing, when
filtered via the set_ftrace_filter and set_ftrace_notrace files, does
not always keep with them if another function ftrace_ops is registered
to trace functions.

The reason is that function graph just happens to trace all functions
that the function tracer enables. When there was only one user of
function tracing, the function graph tracer did not need to worry about
being called by functions that it did not want to trace. But now that there
are other users, this becomes a problem.

For example, one just needs to do the following:

# cd /sys/kernel/debug/tracing
# echo schedule > set_ftrace_filter
# echo function_graph > current_tracer
# cat trace
[..]
0) | schedule() {
------------------------------------------
0) <idle>-0 => rcu_pre-7
------------------------------------------

0) ! 2980.314 us | }
0) | schedule() {
------------------------------------------
0) rcu_pre-7 => <idle>-0
------------------------------------------

0) + 20.701 us | }

# echo 1 > /proc/sys/kernel/stack_tracer_enabled
# cat trace
[..]
1) + 20.825 us | }
1) + 21.651 us | }
1) + 30.924 us | } /* SyS_ioctl */
1) | do_page_fault() {
1) | __do_page_fault() {
1) 0.274 us | down_read_trylock();
1) 0.098 us | find_vma();
1) | handle_mm_fault() {
1) | _raw_spin_lock() {
1) 0.102 us | preempt_count_add();
1) 0.097 us | do_raw_spin_lock();
1) 2.173 us | }
1) | do_wp_page() {
1) 0.079 us | vm_normal_page();
1) 0.086 us | reuse_swap_page();
1) 0.076 us | page_move_anon_rmap();
1) | unlock_page() {
1) 0.082 us | page_waitqueue();
1) 0.086 us | __wake_up_bit();
1) 1.801 us | }
1) 0.075 us | ptep_set_access_flags();
1) | _raw_spin_unlock() {
1) 0.098 us | do_raw_spin_unlock();
1) 0.105 us | preempt_count_sub();
1) 1.884 us | }
1) 9.149 us | }
1) + 13.083 us | }
1) 0.146 us | up_read();

When the stack tracer was enabled, it enabled all functions to be traced, which
now the function graph tracer also traces. This is a side effect that should
not occur.

To fix this a test is added when the function tracing is changed, as well as when
the graph tracer is enabled, to see if anything other than the ftrace global_ops
function tracer is enabled. If so, then the graph tracer calls a test trampoline
that will look at the function that is being traced and compare it with the
filters defined by the global_ops.

As an optimization, if there's no other function tracers registered, or if
the only registered function tracers also use the global ops, the function
graph infrastructure will call the registered function graph callback directly
and not go through the test trampoline.

Cc: stable@vger.kernel.org # 3.3+
Fixes: d2d45c7a03a2 "tracing: Have stack_tracer use a separate list of functions"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 405e1d83 08-Nov-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Synchronize setting function_trace_op with ftrace_trace_function

ftrace_trace_function is a variable that holds what function will be called
directly by the assembly code (mcount). If just a single function is
registered and it handles recursion itself, then the assembly will call that
function directly without any helper function. It also passes in the
ftrace_op that was registered with the callback. The ftrace_op to send is
stored in the function_trace_op variable.

The ftrace_trace_function and function_trace_op needs to be coordinated such
that the called callback wont be called with the wrong ftrace_op, otherwise
bad things can happen if it expected a different op. Luckily, there's no
callback that doesn't use the helper functions that requires this. But
there soon will be and this needs to be fixed.

Use a set_function_trace_op to store the ftrace_op to set the
function_trace_op to when it is safe to do so (during the update function
within the breakpoint or stop machine calls). Or if dynamic ftrace is not
being used (static tracing) then we have to do a bit more synchronization
when the ftrace_trace_function is set as that takes affect immediately
(as oppose to dynamic ftrace doing it with the modification of the trampoline).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 098c879e 21-Dec-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Add generic tracing_lseek() function

Trace event triggers added a lseek that uses the ftrace_filter_lseek()
function. Unfortunately, when function tracing is not configured in
that function is not defined and the kernel fails to build.

This is the second time that function was added to a file ops and
it broke the build due to requiring special config dependencies.

Make a generic tracing_lseek() that all the tracing utilities may
use.

Also, modify the old ftrace_filter_lseek() to return 0 instead of
1 on WRONLY. Not sure why it was a 1 as that does not make sense.

This also changes the old tracing_seek() to modify the file pos
pointer on WRONLY as well.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# c4602c1c 16-Dec-2013 Miao Xie <miaox@cn.fujitsu.com>

ftrace: Initialize the ftrace profiler for each possible cpu

Ftrace currently initializes only the online CPUs. This implementation has
two problems:
- If we online a CPU after we enable the function profile, and then run the
test, we will lose the trace information on that CPU.
Steps to reproduce:
# echo 0 > /sys/devices/system/cpu/cpu1/online
# cd <debugfs>/tracing/
# echo <some function name> >> set_ftrace_filter
# echo 1 > function_profile_enabled
# echo 1 > /sys/devices/system/cpu/cpu1/online
# run test
- If we offline a CPU before we enable the function profile, we will not clear
the trace information when we enable the function profile. It will trouble
the users.
Steps to reproduce:
# cd <debugfs>/tracing/
# echo <some function name> >> set_ftrace_filter
# echo 1 > function_profile_enabled
# run test
# cat trace_stat/function*
# echo 0 > /sys/devices/system/cpu/cpu1/online
# echo 0 > function_profile_enabled
# echo 1 > function_profile_enabled
# cat trace_stat/function*
# run test
# cat trace_stat/function*

So it is better that we initialize the ftrace profiler for each possible cpu
every time we enable the function profile instead of just the online ones.

Link: http://lkml.kernel.org/r/1387178401-10619-1-git-send-email-miaox@cn.fujitsu.com

Cc: stable@vger.kernel.org # 2.6.31+
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 8a56d776 25-Nov-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Fix function graph with loading of modules

Commit 8c4f3c3fa9681 "ftrace: Check module functions being traced on reload"
fixed module loading and unloading with respect to function tracing, but
it missed the function graph tracer. If you perform the following

# cd /sys/kernel/debug/tracing
# echo function_graph > current_tracer
# modprobe nfsd
# echo nop > current_tracer

You'll get the following oops message:

------------[ cut here ]------------
WARNING: CPU: 2 PID: 2910 at /linux.git/kernel/trace/ftrace.c:1640 __ftrace_hash_rec_update.part.35+0x168/0x1b9()
Modules linked in: nfsd exportfs nfs_acl lockd ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt
CPU: 2 PID: 2910 Comm: bash Not tainted 3.13.0-rc1-test #7
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
0000000000000668 ffff8800787efcf8 ffffffff814fe193 ffff88007d500000
0000000000000000 ffff8800787efd38 ffffffff8103b80a 0000000000000668
ffffffff810b2b9a ffffffff81a48370 0000000000000001 ffff880037aea000
Call Trace:
[<ffffffff814fe193>] dump_stack+0x4f/0x7c
[<ffffffff8103b80a>] warn_slowpath_common+0x81/0x9b
[<ffffffff810b2b9a>] ? __ftrace_hash_rec_update.part.35+0x168/0x1b9
[<ffffffff8103b83e>] warn_slowpath_null+0x1a/0x1c
[<ffffffff810b2b9a>] __ftrace_hash_rec_update.part.35+0x168/0x1b9
[<ffffffff81502f89>] ? __mutex_lock_slowpath+0x364/0x364
[<ffffffff810b2cc2>] ftrace_shutdown+0xd7/0x12b
[<ffffffff810b47f0>] unregister_ftrace_graph+0x49/0x78
[<ffffffff810c4b30>] graph_trace_reset+0xe/0x10
[<ffffffff810bf393>] tracing_set_tracer+0xa7/0x26a
[<ffffffff810bf5e1>] tracing_set_trace_write+0x8b/0xbd
[<ffffffff810c501c>] ? ftrace_return_to_handler+0xb2/0xde
[<ffffffff811240a8>] ? __sb_end_write+0x5e/0x5e
[<ffffffff81122aed>] vfs_write+0xab/0xf6
[<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
[<ffffffff81122dbd>] SyS_write+0x59/0x82
[<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
[<ffffffff8150a2d2>] system_call_fastpath+0x16/0x1b
---[ end trace 940358030751eafb ]---

The above mentioned commit didn't go far enough. Well, it covered the
function tracer by adding checks in __register_ftrace_function(). The
problem is that the function graph tracer circumvents that (for a slight
efficiency gain when function graph trace is running with a function
tracer. The gain was not worth this).

The problem came with ftrace_startup() which should always be called after
__register_ftrace_function(), if you want this bug to be completely fixed.

Anyway, this solution moves __register_ftrace_function() inside of
ftrace_startup() and removes the need to call them both.

Reported-by: Dave Wysochanski <dwysocha@redhat.com>
Fixes: ed926f9b35cd ("ftrace: Use counters to enable functions to trace")
Cc: stable@vger.kernel.org # 3.0+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 38de93ab 24-Oct-2013 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Make register/unregister_ftrace_command __init

register/unregister_ftrace_command() are only ever called from __init
functions, so can themselves be made __init.

Also make register_snapshot_cmd() __init for the same reason.

Link: http://lkml.kernel.org/r/d4042c8cadb7ae6f843ac9a89a24e1c6a3099727.1382620672.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# b5aa3a47 04-Nov-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Have control op function callback only trace when RCU is watching

Dave Jones reported that trinity would be able to trigger the following
back trace:

===============================
[ INFO: suspicious RCU usage. ]
3.10.0-rc2+ #38 Not tainted
-------------------------------
include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
other info that might help us debug this:

RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
1 lock held by trinity-child1/18786:
#0: (rcu_read_lock){.+.+..}, at: [<ffffffff8113dd48>] __perf_event_overflow+0x108/0x310
stack backtrace:
CPU: 3 PID: 18786 Comm: trinity-child1 Not tainted 3.10.0-rc2+ #38
0000000000000000 ffff88020767bac8 ffffffff816e2f6b ffff88020767baf8
ffffffff810b5897 ffff88021de92520 0000000000000000 ffff88020767bbf8
0000000000000000 ffff88020767bb78 ffffffff8113ded4 ffffffff8113dd48
Call Trace:
[<ffffffff816e2f6b>] dump_stack+0x19/0x1b
[<ffffffff810b5897>] lockdep_rcu_suspicious+0xe7/0x120
[<ffffffff8113ded4>] __perf_event_overflow+0x294/0x310
[<ffffffff8113dd48>] ? __perf_event_overflow+0x108/0x310
[<ffffffff81309289>] ? __const_udelay+0x29/0x30
[<ffffffff81076054>] ? __rcu_read_unlock+0x54/0xa0
[<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
[<ffffffff8113dfa1>] perf_swevent_overflow+0x51/0xe0
[<ffffffff8113e08f>] perf_swevent_event+0x5f/0x90
[<ffffffff8113e1c9>] perf_tp_event+0x109/0x4f0
[<ffffffff8113e36f>] ? perf_tp_event+0x2af/0x4f0
[<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
[<ffffffff8112d79f>] perf_ftrace_function_call+0xbf/0xd0
[<ffffffff8110e1e1>] ? ftrace_ops_control_func+0x181/0x210
[<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
[<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
[<ffffffff8110e1e1>] ftrace_ops_control_func+0x181/0x210
[<ffffffff816f4000>] ftrace_call+0x5/0x2f
[<ffffffff8110e229>] ? ftrace_ops_control_func+0x1c9/0x210
[<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
[<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
[<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
[<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
[<ffffffff8110112a>] rcu_eqs_enter+0x6a/0xb0
[<ffffffff81103673>] rcu_user_enter+0x13/0x20
[<ffffffff8114541a>] user_enter+0x6a/0xd0
[<ffffffff8100f6d8>] syscall_trace_leave+0x78/0x140
[<ffffffff816f46af>] int_check_syscall_exit_work+0x34/0x3d
------------[ cut here ]------------

Perf uses rcu_read_lock() but as the function tracer can trace functions
even when RCU is not currently active, this makes the rcu_read_lock()
used by perf ineffective.

As perf is currently the only user of the ftrace_ops_control_func() and
perf is also the only function callback that actively uses rcu_read_lock(),
the quick fix is to prevent the ftrace_ops_control_func() from calling
its callbacks if RCU is not active.

With Paul's new "rcu_is_watching()" we can tell if RCU is active or not.

Reported-by: Dave Jones <davej@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 29ad23b0 14-Oct-2013 Namhyung Kim <namhyung.kim@lge.com>

ftrace: Add set_graph_notrace filter

The set_graph_notrace filter is analogous to set_ftrace_notrace and
can be used for eliminating uninteresting part of function graph trace
output. It also works with set_graph_function nicely.

# cd /sys/kernel/debug/tracing/
# echo do_page_fault > set_graph_function
# perf ftrace live true
2) | do_page_fault() {
2) | __do_page_fault() {
2) 0.381 us | down_read_trylock();
2) 0.055 us | __might_sleep();
2) 0.696 us | find_vma();
2) | handle_mm_fault() {
2) | handle_pte_fault() {
2) | __do_fault() {
2) | filemap_fault() {
2) | find_get_page() {
2) 0.033 us | __rcu_read_lock();
2) 0.035 us | __rcu_read_unlock();
2) 1.696 us | }
2) 0.031 us | __might_sleep();
2) 2.831 us | }
2) | _raw_spin_lock() {
2) 0.046 us | add_preempt_count();
2) 0.841 us | }
2) 0.033 us | page_add_file_rmap();
2) | _raw_spin_unlock() {
2) 0.057 us | sub_preempt_count();
2) 0.568 us | }
2) | unlock_page() {
2) 0.084 us | page_waitqueue();
2) 0.126 us | __wake_up_bit();
2) 1.117 us | }
2) 7.729 us | }
2) 8.397 us | }
2) 8.956 us | }
2) 0.085 us | up_read();
2) + 12.745 us | }
2) + 13.401 us | }
...

# echo handle_mm_fault > set_graph_notrace
# perf ftrace live true
1) | do_page_fault() {
1) | __do_page_fault() {
1) 0.205 us | down_read_trylock();
1) 0.041 us | __might_sleep();
1) 0.344 us | find_vma();
1) 0.069 us | up_read();
1) 4.692 us | }
1) 5.311 us | }
...

Link: http://lkml.kernel.org/r/1381739066-7531-5-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 6a10108b 14-Oct-2013 Namhyung Kim <namhyung.kim@lge.com>

ftrace: Narrow down the protected area of graph_lock

The parser set up is just a generic utility that uses local variables
allocated by the function. There's no need to hold the graph_lock for
this set up.

This also makes the code simpler.

Link: http://lkml.kernel.org/r/1381739066-7531-4-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# faf982a6 14-Oct-2013 Namhyung Kim <namhyung.kim@lge.com>

ftrace: Introduce struct ftrace_graph_data

The struct ftrace_graph_data is for generalizing the access to
set_graph_function file. This is a preparation for adding support to
set_graph_notrace.

Link: http://lkml.kernel.org/r/1381739066-7531-3-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 9aa72b4b 14-Oct-2013 Namhyung Kim <namhyung.kim@lge.com>

ftrace: Get rid of ftrace_graph_filter_enabled

The ftrace_graph_filter_enabled means that user sets function filter
and it always has same meaning of ftrace_graph_count > 0.

Link: http://lkml.kernel.org/r/1381739066-7531-2-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 59338f75 30-Aug-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Fix a slight race in modifying what function callback gets traced

There's a slight race when going from a list function to a non list
function. That is, when only one callback is registered to the function
tracer, it gets called directly by the mcount trampoline. But if this
function has filters, it may be called by the wrong functions.

As the list ops callback that handles multiple callbacks that are
registered to ftrace, it also handles what functions they call. While
the transaction is taking place, use the list function always, and
after all the updates are finished (only the functions that should be
traced are being traced), then we can update the trampoline to call
the function directly.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 8c4f3c3f 29-Jul-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Check module functions being traced on reload

There's been a nasty bug that would show up and not give much info.
The bug displayed the following warning:

WARNING: at kernel/trace/ftrace.c:1529 __ftrace_hash_rec_update+0x1e3/0x230()
Pid: 20903, comm: bash Tainted: G O 3.6.11+ #38405.trunk
Call Trace:
[<ffffffff8103e5ff>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8103e65a>] warn_slowpath_null+0x1a/0x20
[<ffffffff810c2ee3>] __ftrace_hash_rec_update+0x1e3/0x230
[<ffffffff810c4f28>] ftrace_hash_move+0x28/0x1d0
[<ffffffff811401cc>] ? kfree+0x2c/0x110
[<ffffffff810c68ee>] ftrace_regex_release+0x8e/0x150
[<ffffffff81149f1e>] __fput+0xae/0x220
[<ffffffff8114a09e>] ____fput+0xe/0x10
[<ffffffff8105fa22>] task_work_run+0x72/0x90
[<ffffffff810028ec>] do_notify_resume+0x6c/0xc0
[<ffffffff8126596e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[<ffffffff815c0f88>] int_signal+0x12/0x17
---[ end trace 793179526ee09b2c ]---

It was finally narrowed down to unloading a module that was being traced.

It was actually more than that. When functions are being traced, there's
a table of all functions that have a ref count of the number of active
tracers attached to that function. When a function trace callback is
registered to a function, the function's record ref count is incremented.
When it is unregistered, the function's record ref count is decremented.
If an inconsistency is detected (ref count goes below zero) the above
warning is shown and the function tracing is permanently disabled until
reboot.

The ftrace callback ops holds a hash of functions that it filters on
(and/or filters off). If the hash is empty, the default means to filter
all functions (for the filter_hash) or to disable no functions (for the
notrace_hash).

When a module is unloaded, it frees the function records that represent
the module functions. These records exist on their own pages, that is
function records for one module will not exist on the same page as
function records for other modules or even the core kernel.

Now when a module unloads, the records that represents its functions are
freed. When the module is loaded again, the records are recreated with
a default ref count of zero (unless there's a callback that traces all
functions, then they will also be traced, and the ref count will be
incremented).

The problem is that if an ftrace callback hash includes functions of the
module being unloaded, those hash entries will not be removed. If the
module is reloaded in the same location, the hash entries still point
to the functions of the module but the module's ref counts do not reflect
that.

With the help of Steve and Joern, we found a reproducer:

Using uinput module and uinput_release function.

cd /sys/kernel/debug/tracing
modprobe uinput
echo uinput_release > set_ftrace_filter
echo function > current_tracer
rmmod uinput
modprobe uinput
# check /proc/modules to see if loaded in same addr, otherwise try again
echo nop > current_tracer

[BOOM]

The above loads the uinput module, which creates a table of functions that
can be traced within the module.

We add uinput_release to the filter_hash to trace just that function.

Enable function tracincg, which increments the ref count of the record
associated to uinput_release.

Remove uinput, which frees the records including the one that represents
uinput_release.

Load the uinput module again (and make sure it's at the same address).
This recreates the function records all with a ref count of zero,
including uinput_release.

Disable function tracing, which will decrement the ref count for uinput_release
which is now zero because of the module removal and reload, and we have
a mismatch (below zero ref count).

The solution is to check all currently tracing ftrace callbacks to see if any
are tracing any of the module's functions when a module is loaded (it already does
that with callbacks that trace all functions). If a callback happens to have
a module function being traced, it increments that records ref count and starts
tracing that function.

There may be a strange side effect with this, where tracing module functions
on unload and then reloading a new module may have that new module's functions
being traced. This may be something that confuses the user, but it's not
a big deal. Another approach is to disable all callback hashes on module unload,
but this leaves some ftrace callbacks that may not be registered, but can
still have hashes tracing the module's function where ftrace doesn't know about
it. That situation can cause the same bug. This solution solves that case too.
Another benefit of this solution, is it is possible to trace a module's
function on unload and load.

Link: http://lkml.kernel.org/r/20130705142629.GA325@redhat.com

Reported-by: Jörn Engel <joern@logfs.org>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Steve Hodgson <steve@purestorage.com>
Tested-by: Steve Hodgson <steve@purestorage.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 1c80c432 25-Jul-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Consolidate some duplicate code for updating ftrace ops

When ftrace ops modifies the functions that it will trace, the update
to the function mcount callers may need to be modified. Consolidate
the two places that do the checks to see if an update is required
with a wrapper function for those checks.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 195a8afc 23-Jul-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Add check for NULL regs if ops has SAVE_REGS set

If a ftrace ops is registered with the SAVE_REGS flag set, and there's
already a ops registered to one of its functions but without the
SAVE_REGS flag, there's a small race window where the SAVE_REGS ops gets
added to the list of callbacks to call for that function before the
callback trampoline gets set to save the regs.

The problem is, the function is not currently saving regs, which opens
a small race window where the ops that is expecting regs to be passed
to it, wont. This can cause a crash if the callback were to reference
the regs, as the SAVE_REGS guarantees that regs will be set.

To fix this, we add a check in the loop case where it checks if the ops
has the SAVE_REGS flag set, and if so, it will ignore it if regs is
not set.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# f1ed7c74 27-Jun-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Do not run selftest if command line parameter is set

If the kernel command line ftrace filter parameters are set
(ftrace_filter or ftrace_notrace), force the function self test to
pass, with a warning why it was forced.

If the user adds a filter to the kernel command line, it is assumed
that they know what they are doing, and the self test should just not
run instead of failing (which disables function tracing) or clearing
the filter, as that will probably annoy the user.

If the user wants the selftest to run, the message will tell them why
it did not.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 52d85d76 11-Jun-2013 Juri Lelli <juri.lelli@gmail.com>

ftrace: Fix stddev calculation in function profiler

When FUNCTION_GRAPH_TRACER is enabled, ftrace can profile kernel functions
and print basic statistics about them. Unfortunately, running stddev
calculation is wrong. This patch corrects it implementing Welford’s method:

s^2 = 1 / (n * (n-1)) * (n * \Sum (x_i)^2 - (\Sum x_i)^2) .
Link: http://lkml.kernel.org/r/1371031398-24048-1-git-send-email-juri.lelli@gmail.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 7614c3dc 28-May-2013 Steven Rostedt <rostedt@goodmis.org>

ftrace: Use schedule_on_each_cpu() as a heavy synchronize_sched()

The function tracer uses preempt_disable/enable_notrace() for
synchronization between reading registered ftrace_ops and unregistering
them.

Most of the ftrace_ops are global permanent structures that do not
require this synchronization. That is, ops may be added and removed from
the hlist but are never freed, and wont hurt if a synchronization is
missed.

But this is not true for dynamically created ftrace_ops or control_ops,
which are used by the perf function tracing.

The problem here is that the function tracer can be used to trace
kernel/user context switches as well as going to and from idle.
Basically, it can be used to trace blind spots of the RCU subsystem.
This means that even though preempt_disable() is done, a
synchronize_sched() will ignore CPUs that haven't made it out of user
space or idle. These can include functions that are being traced just
before entering or exiting the kernel sections.

To implement the RCU synchronization, instead of using
synchronize_sched() the use of schedule_on_each_cpu() is performed. This
means that when a dynamically allocated ftrace_ops, or a control ops is
being unregistered, all CPUs must be touched and execute a ftrace_sync()
stub function via the work queues. This will rip CPUs out from idle or
in dynamic tick mode. This only happens when a user disables perf
function tracing or other dynamically allocated function tracers, but it
allows us to continue to debug RCU and context tracking with function
tracing.

Link: http://lkml.kernel.org/r/1369785676.15552.55.camel@gandalf.local.home

Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 1bb539ca 28-May-2013 Steven Rostedt <rostedt@goodmis.org>

ftrace: Use the rcu _notrace variants for rcu_dereference_raw() and friends

As rcu_dereference_raw() under RCU debug config options can add quite a
bit of checks, and that tracing uses rcu_dereference_raw(), these checks
happen with the function tracer. The function tracer also happens to trace
these debug checks too. This added overhead can livelock the system.

Have the function tracer use the new RCU _notrace equivalents that do
not do the debug checks for RCU.

Link: http://lkml.kernel.org/r/20130528184209.467603904@goodmis.org

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 19dd603e 09-May-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Fix function probe when more than one probe is added

When the first function probe is added and the function tracer
is updated the functions are modified to call the probe.
But when a second function is added, it updates the function
records to have the second function also update, but it fails
to update the actual function itself.

This prevents the second (or third or forth and so on) probes
from having their functions called.

# echo vfs_symlink:enable_event:sched:sched_switch > set_ftrace_filter
# echo vfs_unlink:enable_event:sched:sched_switch > set_ftrace_filter
# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 0/0 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
# touch /tmp/a
# rm /tmp/a
# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 0/0 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
# ln -s /tmp/a
# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 414/414 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<idle>-0 [000] d..3 2847.923031: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=2786 next_prio=120
<...>-3114 [001] d..4 2847.923035: sched_switch: prev_comm=ln prev_pid=3114 prev_prio=120 prev_state=x ==> next_comm=swapper/1 next_pid=0 next_prio=120
bash-2786 [000] d..3 2847.923535: sched_switch: prev_comm=bash prev_pid=2786 prev_prio=120 prev_state=S ==> next_comm=kworker/0:1 next_pid=34 next_prio=120
kworker/0:1-34 [000] d..3 2847.923552: sched_switch: prev_comm=kworker/0:1 prev_pid=34 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
<idle>-0 [002] d..3 2847.923554: sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=sshd next_pid=2783 next_prio=120
sshd-2783 [002] d..3 2847.923660: sched_switch: prev_comm=sshd prev_pid=2783 prev_prio=120 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120

Still need to update the functions even though the probe itself
does not need to be registered again when added a new probe.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 23ea9c4d 09-May-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Fix the output of enabled_functions debug file

The enabled_functions debugfs file was created to be able to see
what functions have been modified from nops to calling a tracer.

The current method uses the counter in the function record.
As when a ftrace_ops is registered to a function, its count
increases. But that doesn't mean that the function is actively
being traced. /proc/sys/kernel/ftrace_enabled can be set to zero
which would disable it, as well as something can go wrong and
we can think its enabled when only the counter is set.

The record's FTRACE_FL_ENABLED flag is set or cleared when its
function is modified. That is a much more accurate way of knowing
what function is enabled or not.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 5ae0bf59 09-May-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Fix locking in register_ftrace_function_probe()

The iteration of the ftrace function list and the call to
ftrace_match_record() need to be protected by the ftrace_lock.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 3f2367ba 08-May-2013 Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

ftrace: Cleanup regex_lock and ftrace_lock around hash updating

Cleanup regex_lock and ftrace_lock locking points around
ftrace_ops hash update code.

The new rule is that regex_lock protects ops->*_hash
read-update-write code for each ftrace_ops. Usually,
hash update is done by following sequence.

1. allocate a new local hash and copy the original hash.
2. update the local hash.
3. move(actually, copy) back the local hash to ftrace_ops.
4. update ftrace entries if needed.
5. release the local hash.

This makes regex_lock protect #1-#4, and ftrace_lock
to protect #3, #4 and adding and removing ftrace_ops from the
ftrace_ops_list. The ftrace_lock protects #3 as well because
the move functions update the entries too.

Link: http://lkml.kernel.org/r/20130509054421.30398.83411.stgit@mhiramat-M0-7522

Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <tom.zanussi@intel.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# f04f24fb 08-May-2013 Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

ftrace, kprobes: Fix a deadlock on ftrace_regex_lock

Fix a deadlock on ftrace_regex_lock which happens when setting
an enable_event trigger on dynamic kprobe event as below.

----
sh-2.05b# echo p vfs_symlink > kprobe_events
sh-2.05b# echo vfs_symlink:enable_event:kprobes:p_vfs_symlink_0 > set_ftrace_filter

=============================================
[ INFO: possible recursive locking detected ]
3.9.0+ #35 Not tainted
---------------------------------------------
sh/72 is trying to acquire lock:
(ftrace_regex_lock){+.+.+.}, at: [<ffffffff810ba6c1>] ftrace_set_hash+0x81/0x1f0

but task is already holding lock:
(ftrace_regex_lock){+.+.+.}, at: [<ffffffff810b7cbd>] ftrace_regex_write.isra.29.part.30+0x3d/0x220

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(ftrace_regex_lock);
lock(ftrace_regex_lock);

*** DEADLOCK ***
----

To fix that, this introduces a finer regex_lock for each ftrace_ops.
ftrace_regex_lock is too big of a lock which protects all
filter/notrace_hash operations, but it doesn't need to be a global
lock after supporting multiple ftrace_ops because each ftrace_ops
has its own filter/notrace_hash.

Link: http://lkml.kernel.org/r/20130509054417.30398.84254.stgit@mhiramat-M0-7522

Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <tom.zanussi@intel.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
[ Added initialization flag and automate mutex initialization for
non ftrace.c ftrace_probes. ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 7c088b51 09-May-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Have ftrace_regex_write() return either read or error

As ftrace_regex_write() reads the result of ftrace_process_regex()
which can sometimes return a positive number, only consider a
failure if the return is negative. Otherwise, it will skip possible
other registered probes and by returning a positive number that
wasn't read, it will confuse the user processes doing the writing.

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 20079ebe 09-Apr-2013 Namhyung Kim <namhyung.kim@lge.com>

ftrace: Get rid of ftrace_profile_bits

It seems that function profiler's hash size is fixed at 1024. Add and
use FTRACE_PROFILE_HASH_BITS instead and update hash size macro.

Link: http://lkml.kernel.org/r/1365551750-4504-1-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# f1943977 09-Apr-2013 Namhyung Kim <namhyung.kim@lge.com>

tracing: Get rid of unneeded key calculation in ftrace_hash_move()

It's not used anywhere in the function.

Link: http://lkml.kernel.org/r/1365553093-10180-1-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 9f50afcc 11-Apr-2013 Namhyung Kim <namhyung.kim@lge.com>

tracing: Reset ftrace_graph_filter_enabled if count is zero

The ftrace_graph_count can be decreased with a "!" pattern, so that
the enabled flag should be updated too.

Link: http://lkml.kernel.org/r/1365663698-2413-1-git-send-email-namhyung@kernel.org

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 7f49ef69 12-Apr-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Move ftrace_filter_lseek out of CONFIG_DYNAMIC_FTRACE section

As ftrace_filter_lseek is now used with ftrace_pid_fops, it needs to
be moved out of the #ifdef CONFIG_DYNAMIC_FTRACE section as the
ftrace_pid_fops is defined when DYNAMIC_FTRACE is not.

Cc: stable@vger.kernel.org
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 6a76f8c0 11-Apr-2013 Namhyung Kim <namhyung.kim@lge.com>

tracing: Fix possible NULL pointer dereferences

Currently set_ftrace_pid and set_graph_function files use seq_lseek
for their fops. However seq_open() is called only for FMODE_READ in
the fops->open() so that if an user tries to seek one of those file
when she open it for writing, it sees NULL seq_file and then panic.

It can be easily reproduced with following command:

$ cd /sys/kernel/debug/tracing
$ echo 1234 | sudo tee -a set_ftrace_pid

In this example, GNU coreutils' tee opens the file with fopen(, "a")
and then the fopen() internally calls lseek().

Link: http://lkml.kernel.org/r/1365663302-2170-1-git-send-email-namhyung@kernel.org

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 39e30cd1 01-Apr-2013 Namhyung Kim <namhyung.kim@lge.com>

tracing: Fix off-by-one on allocating stat->pages

The first page was allocated separately, so no need to start from 0.

Link: http://lkml.kernel.org/r/1364820385-32027-2-git-send-email-namhyung@kernel.org

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 83e03b3f 01-Apr-2013 Namhyung Kim <namhyung.kim@lge.com>

tracing: Fix double free when function profile init failed

On the failure path, stat->start and stat->pages will refer same page.
So it'll attempt to free the same page again and get kernel panic.

Link: http://lkml.kernel.org/r/1364820385-32027-1-git-send-email-namhyung@kernel.org

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 9607a869 07-Apr-2013 Chen Gang <gang.chen@asianux.com>

kernel: tracing: Use strlcpy instead of strncpy

Use strlcpy() instead of strncpy() as it will always add a '\0'
to the end of the string even if the buffer is smaller than what
is being copied.

Link: http://lkml.kernel.org/r/51624254.30301@asianux.com

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 395b97a3 27-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Do not call stub functions in control loop

The function tracing control loop used by perf spits out a warning
if the called function is not a control function. This is because
the control function references a per cpu allocated data structure
on struct ftrace_ops that is not allocated for other types of
functions.

commit 0a016409e42 "ftrace: Optimize the function tracer list loop"

Had an optimization done to all function tracing loops to optimize
for a single registered ops. Unfortunately, this allows for a slight
race when tracing starts or ends, where the stub function might be
called after the current registered ops is removed. In this case we
get the following dump:

root# perf stat -e ftrace:function sleep 1
[ 74.339105] WARNING: at include/linux/ftrace.h:209 ftrace_ops_control_func+0xde/0xf0()
[ 74.349522] Hardware name: PRIMERGY RX200 S6
[ 74.357149] Modules linked in: sg igb iTCO_wdt ptp pps_core iTCO_vendor_support i7core_edac dca lpc_ich i2c_i801 coretemp edac_core crc32c_intel mfd_core ghash_clmulni_intel dm_multipath acpi_power_meter pcspk
r microcode vhost_net tun macvtap macvlan nfsd kvm_intel kvm auth_rpcgss nfs_acl lockd sunrpc uinput xfs libcrc32c sd_mod crc_t10dif sr_mod cdrom mgag200 i2c_algo_bit drm_kms_helper ttm qla2xxx mptsas ahci drm li
bahci scsi_transport_sas mptscsih libata scsi_transport_fc i2c_core mptbase scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[ 74.446233] Pid: 1377, comm: perf Tainted: G W 3.9.0-rc1 #1
[ 74.453458] Call Trace:
[ 74.456233] [<ffffffff81062e3f>] warn_slowpath_common+0x7f/0xc0
[ 74.462997] [<ffffffff810fbc60>] ? rcu_note_context_switch+0xa0/0xa0
[ 74.470272] [<ffffffff811041a2>] ? __unregister_ftrace_function+0xa2/0x1a0
[ 74.478117] [<ffffffff81062e9a>] warn_slowpath_null+0x1a/0x20
[ 74.484681] [<ffffffff81102ede>] ftrace_ops_control_func+0xde/0xf0
[ 74.491760] [<ffffffff8162f400>] ftrace_call+0x5/0x2f
[ 74.497511] [<ffffffff8162f400>] ? ftrace_call+0x5/0x2f
[ 74.503486] [<ffffffff8162f400>] ? ftrace_call+0x5/0x2f
[ 74.509500] [<ffffffff810fbc65>] ? synchronize_sched+0x5/0x50
[ 74.516088] [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
[ 74.522268] [<ffffffff810fbc65>] ? synchronize_sched+0x5/0x50
[ 74.528837] [<ffffffff811041a2>] ? __unregister_ftrace_function+0xa2/0x1a0
[ 74.536696] [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
[ 74.542878] [<ffffffff8162402d>] ? mutex_lock+0x1d/0x50
[ 74.548869] [<ffffffff81105c67>] unregister_ftrace_function+0x27/0x50
[ 74.556243] [<ffffffff8111eadf>] perf_ftrace_event_register+0x9f/0x140
[ 74.563709] [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
[ 74.569887] [<ffffffff8162402d>] ? mutex_lock+0x1d/0x50
[ 74.575898] [<ffffffff8111e94e>] perf_trace_destroy+0x2e/0x50
[ 74.582505] [<ffffffff81127ba9>] tp_perf_event_destroy+0x9/0x10
[ 74.589298] [<ffffffff811295d0>] free_event+0x70/0x1a0
[ 74.595208] [<ffffffff8112a579>] perf_event_release_kernel+0x69/0xa0
[ 74.602460] [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
[ 74.608667] [<ffffffff8112a640>] put_event+0x90/0xc0
[ 74.614373] [<ffffffff8112a740>] perf_release+0x10/0x20
[ 74.620367] [<ffffffff811a3044>] __fput+0xf4/0x280
[ 74.625894] [<ffffffff811a31de>] ____fput+0xe/0x10
[ 74.631387] [<ffffffff81083697>] task_work_run+0xa7/0xe0
[ 74.637452] [<ffffffff81014981>] do_notify_resume+0x71/0xb0
[ 74.643843] [<ffffffff8162fa92>] int_signal+0x12/0x17

To fix this a new ftrace_ops flag is added that denotes the ftrace_list_end
ftrace_ops stub as just that, a stub. This flag is now checked in the
control loop and the function is not called if the flag is set.

Thanks to Jovi for not just reporting the bug, but also pointing out
where the bug was in the code.

Link: http://lkml.kernel.org/r/514A8855.7090402@redhat.com
Link: http://lkml.kernel.org/r/1364377499-1900-15-git-send-email-jovi.zhangwei@huawei.com

Tested-by: WANG Chao <chaowang@redhat.com>
Reported-by: WANG Chao <chaowang@redhat.com>
Reported-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 5000c418 26-Mar-2013 Jan Kiszka <jan.kiszka@siemens.com>

ftrace: Consistently restore trace function on sysctl enabling

If we reenable ftrace via syctl, we currently set ftrace_trace_function
based on the previous simplistic algorithm. This is inconsistent with
what update_ftrace_function does. So better call that helper instead.

Link: http://lkml.kernel.org/r/5151D26F.1070702@siemens.com

Cc: stable@vger.kernel.org
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 75761cc1 07-Apr-2013 Chen Gang <gang.chen@asianux.com>

ftrace: Fix strncpy() use, use strlcpy() instead of strncpy()

For NUL terminated string we always need to set '\0' at the end.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Cc: rostedt@goodmis.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/516243B7.9020405@asianux.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 7818b388 12-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Use manual free after synchronize_sched() not call_rcu_sched()

The entries to the probe hash must be freed after a synchronize_sched()
after the entry has been removed from the hash.

As the entries are registered with ops that may have their own callbacks,
and these callbacks may sleep, we can not use call_rcu_sched() because
the rcu callbacks registered with that are called from a softirq context.

Instead of using call_rcu_sched(), manually save the entries on a free_list
and at the end of the loop that removes the entries, do a synchronize_sched()
and then go through the free_list, freeing the entries.

Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e67efb93 12-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Clean up function probe methods

When a function probe is created, each function that the probe is
attached to, a "callback" method is called. On release of the probe,
each function entry calls the "free" method.

First, "callback" is a confusing name and does not really match what
it does. Callback sounds like it will be called when the probe
triggers. But that's not the case. This is really an "init" function,
so lets rename it as such.

Secondly, both "init" and "free" do not pass enough information back
to the handlers. Pass back the ops, ip and data for each time the
method is called. We have the information, might as well use it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e1df4cb6 12-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Fix function probe to only enable needed functions

Currently the function probe enables all functions and runs a "hash"
against every function call to see if it should call a probe. This
is extremely wasteful.

Note, a probe is something like:

echo schedule:traceoff > /debug/tracing/set_ftrace_filter

When schedule is called, the probe will disable tracing. But currently,
it has a call back for *all* functions, and checks to see if the
called function is the probe that is needed.

The probe function has been created before ftrace was rewritten to
allow for more than one "op" to be registered by the function tracer.
When probes were created, it couldn't limit the functions without also
limiting normal function calls. But now we can, it's about time
to update the probe code.

Todo, have separate ops for different entries. That is, assign
a ftrace_ops per probe, instead of one op for all probes. But
as there's not many probes assigned, this may not be that urgent.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 740466bc 13-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Fix free of probe entry by calling call_rcu_sched()

Because function tracing is very invasive, and can even trace
calls to rcu_read_lock(), RCU access in function tracing is done
with preempt_disable_notrace(). This requires a synchronize_sched()
for updates and not a synchronize_rcu().

Function probes (traceon, traceoff, etc) must be freed after
a synchronize_sched() after its entry has been removed from the
hash. But call_rcu() is used. Fix this by using call_rcu_sched().

Also fix the usage to use hlist_del_rcu() instead of hlist_del().

Cc: stable@vger.kernel.org
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# b67bfe0d 27-Feb-2013 Sasha Levin <sasha.levin@oracle.com>

hlist: drop the node parameter from iterators

I'm not sure why, but the hlist for each entry iterators were conceived

list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8c189ea6 13-Feb-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: Call ftrace cleanup module notifier after all other notifiers

Commit: c1bf08ac "ftrace: Be first to run code modification on modules"

changed ftrace module notifier's priority to INT_MAX in order to
process the ftrace nops before anything else could touch them
(namely kprobes). This was the correct thing to do.

Unfortunately, the ftrace module notifier also contains the ftrace
clean up code. As opposed to the set up code, this code should be
run *after* all the module notifiers have run in case a module is doing
correct clean-up and unregisters its ftrace hooks. Basically, ftrace
needs to do clean up on module removal, as it needs to know about code
being removed so that it doesn't try to modify that code. But after it
removes the module from its records, if a ftrace user tries to remove
a probe, that removal will fail due as the record of that code segment
no longer exists.

Nothing really bad happens if the probe removal is called after ftrace
did the clean up, but the ftrace removal function will return an error.
Correct code (such as kprobes) will produce a WARN_ON() if it fails
to remove the probe. As people get annoyed by frivolous warnings, it's
best to do the ftrace clean up after everything else.

By splitting the ftrace_module_notifier into two notifiers, one that
does the module load setup that is run at high priority, and the other
that is called for module clean up that is run at low priority, the
problem is solved.

Cc: stable@vger.kernel.org
Reported-by: Frank Ch. Eigler <fche@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# edc15caf 02-Nov-2012 Steven Rostedt <srostedt@redhat.com>

tracing: Avoid unnecessary multiple recursion checks

When function tracing occurs, the following steps are made:
If arch does not support a ftrace feature:
call internal function (uses INTERNAL bits) which calls...
If callback is registered to the "global" list, the list
function is called and recursion checks the GLOBAL bits.
then this function calls...
The function callback, which can use the FTRACE bits to
check for recursion.

Now if the arch does not suppport a feature, and it calls
the global list function which calls the ftrace callback
all three of these steps will do a recursion protection.
There's no reason to do one if the previous caller already
did. The recursion that we are protecting against will
go through the same steps again.

To prevent the multiple recursion checks, if a recursion
bit is set that is higher than the MAX bit of the current
check, then we know that the check was made by the previous
caller, and we can skip the current check.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# c29f122c 02-Nov-2012 Steven Rostedt <srostedt@redhat.com>

ftrace: Add context level recursion bit checking

Currently for recursion checking in the function tracer, ftrace
tests a task_struct bit to determine if the function tracer had
recursed or not. If it has, then it will will return without going
further.

But this leads to races. If an interrupt came in after the bit
was set, the functions being traced would see that bit set and
think that the function tracer recursed on itself, and would return.

Instead add a bit for each context (normal, softirq, irq and nmi).

A check of which context the task is in is made before testing the
associated bit. Now if an interrupt preempts the function tracer
after the previous context has been set, the interrupt functions
can still be traced.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 0a016409 02-Nov-2012 Steven Rostedt <srostedt@redhat.com>

ftrace: Optimize the function tracer list loop

There is lots of places that perform:

op = rcu_dereference_raw(ftrace_control_list);
while (op != &ftrace_list_end) {

Add a helper macro to do this, and also optimize for a single
entity. That is, gcc will optimize a loop for either no iterations
or more than one iteration. But usually only a single callback
is registered to the function tracer, thus the optimized case
should be a single pass. to do this we now do:

op = rcu_dereference_raw(list);
do {
[...]
} while (likely(op = rcu_dereference_raw((op)->next)) &&
unlikely((op) != &ftrace_list_end));

An op is always registered (ftrace_list_end when no callbacks is
registered), thus when a single callback is registered, the link
list looks like:

top => callback => ftrace_list_end => NULL.

The likely(op = op->next) still must be performed due to the race
of removing the callback, where the first op assignment could
equal ftrace_list_end. In that case, the op->next would be NULL.
But this is unlikely (only happens in a race condition when
removing the callback).

But it is very likely that the next op would be ftrace_list_end,
unless more than one callback has been registered. This tells
gcc what the most common case is and makes the fast path with
the least amount of branches.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 63503794 02-Nov-2012 Steven Rostedt <srostedt@redhat.com>

ftrace: Fix global function tracers that are not recursion safe

If one of the function tracers set by the global ops is not recursion
safe, it can still be called directly without the added recursion
supplied by the ftrace infrastructure.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 06aeaaea 28-Sep-2012 Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

ftrace: Move ARCH_SUPPORTS_FTRACE_SAVE_REGS in Kconfig

Move SAVE_REGS support flag into Kconfig and rename
it to CONFIG_DYNAMIC_FTRACE_WITH_REGS. This also introduces
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS which indicates
the architecture depending part of ftrace has a code
that saves full registers.
On the other hand, CONFIG_DYNAMIC_FTRACE_WITH_REGS indicates
the code is enabled.

Link: http://lkml.kernel.org/r/20120928081516.3560.72534.stgit@ltc138.sdl.hitachi.co.jp

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# c1bf08ac 14-Dec-2012 Steven Rostedt <srostedt@redhat.com>

ftrace: Be first to run code modification on modules

If some other kernel subsystem has a module notifier, and adds a kprobe
to a ftrace mcount point (now that kprobes work on ftrace points),
when the ftrace notifier runs it will fail and disable ftrace, as well
as kprobes that are attached to ftrace points.

Here's the error:

WARNING: at kernel/trace/ftrace.c:1618 ftrace_bug+0x239/0x280()
Hardware name: Bochs
Modules linked in: fat(+) stap_56d28a51b3fe546293ca0700b10bcb29__8059(F) nfsv4 auth_rpcgss nfs dns_resolver fscache xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack lockd sunrpc ppdev parport_pc parport microcode virtio_net i2c_piix4 drm_kms_helper ttm drm i2c_core [last unloaded: bid_shared]
Pid: 8068, comm: modprobe Tainted: GF 3.7.0-0.rc8.git0.1.fc19.x86_64 #1
Call Trace:
[<ffffffff8105e70f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff81134106>] ? __probe_kernel_read+0x46/0x70
[<ffffffffa0180000>] ? 0xffffffffa017ffff
[<ffffffffa0180000>] ? 0xffffffffa017ffff
[<ffffffff8105e76a>] warn_slowpath_null+0x1a/0x20
[<ffffffff810fd189>] ftrace_bug+0x239/0x280
[<ffffffff810fd626>] ftrace_process_locs+0x376/0x520
[<ffffffff810fefb7>] ftrace_module_notify+0x47/0x50
[<ffffffff8163912d>] notifier_call_chain+0x4d/0x70
[<ffffffff810882f8>] __blocking_notifier_call_chain+0x58/0x80
[<ffffffff81088336>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff810c2a23>] sys_init_module+0x73/0x220
[<ffffffff8163d719>] system_call_fastpath+0x16/0x1b
---[ end trace 9ef46351e53bbf80 ]---
ftrace failed to modify [<ffffffffa0180000>] init_once+0x0/0x20 [fat]
actual: cc:bb:d2:4b:e1

A kprobe was added to the init_once() function in the fat module on load.
But this happened before ftrace could have touched the code. As ftrace
didn't run yet, the kprobe system had no idea it was a ftrace point and
simply added a breakpoint to the code (0xcc in the cc:bb:d2:4b:e1).

Then when ftrace went to modify the location from a call to mcount/fentry
into a nop, it didn't see a call op, but instead it saw the breakpoint op
and not knowing what to do with it, ftrace shut itself down.

The solution is to simply give the ftrace module notifier the max priority.
This should have been done regardless, as the core code ftrace modification
also happens very early on in boot up. This makes the module modification
closer to core modification.

Link: http://lkml.kernel.org/r/20130107140333.593683061@goodmis.org

Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reported-by: Frank Ch. Eigler <fche@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 965c8e59 17-Dec-2012 Andrew Morton <akpm@linux-foundation.org>

lseek: the "whence" argument is called "whence"

But the kernel decided to call it "origin" instead. Fix most of the
sites.

Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6d49e352 06-Dec-2012 Nadia Yvette Chambers <nyc@holomorphy.com>

propagate name change to comments in kernel source

I've legally changed my name with New York State, the US Social Security
Administration, et al. This patch propagates the name change and change
in initials and login to comments in the kernel source as well.

Signed-off-by: Nadia Yvette Chambers <nyc@holomorphy.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 70f77b3f 09-Jun-2012 Dan Carpenter <dan.carpenter@oracle.com>

ftrace: Clear bits properly in reset_iter_read()

There is a typo here where '&' is used instead of '|' and it turns the
statement into a noop. The original code is equivalent to:

iter->flags &= ~((1 << 2) & (1 << 4));

Link: http://lkml.kernel.org/r/20120609161027.GD6488@elgon.mountain

Cc: stable@vger.kernel.org # all of them
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 6f415672 04-Oct-2012 Steven Rostedt <srostedt@redhat.com>

tracing: Allow tracers to start at core initcall

There's times during debugging that it is helpful to see traces of early
boot functions. But the tracers are initialized at device_initcall()
which is quite late during the boot process. Setting the kernel command
line parameter ftrace=function will not show anything until the function
tracer is initialized. This prevents being able to trace functions before
device_initcall().

There's no reason that the tracers need to be initialized so late in the
boot process. Move them up to core_initcall() as they still need to come
after early_initcall() which initializes the tracing buffers.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# bcd83ea6 26-Sep-2012 Daniel Walter <sahne@0x90.at>

tracing: Replace strict_strto* with kstrto*

* remove old string conversions with kstrto*

Link: http://lkml.kernel.org/r/20120926200838.GC1244@0x90.at

Signed-off-by: Daniel Walter <sahne@0x90.at>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 647664ea 05-Jun-2012 Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

ftrace: add ftrace_set_filter_ip() for address based filter

Add a new filter update interface ftrace_set_filter_ip()
to set ftrace filter by ip address, not only glob pattern.

Link: http://lkml.kernel.org/r/20120605102808.27845.67952.stgit@localhost.localdomain

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# ea701f11 20-Jul-2012 Steven Rostedt <srostedt@redhat.com>

ftrace: Add selftest to test function trace recursion protection

Add selftests to test the function tracing recursion protection actually
does work. It also tests if a ftrace_ops states it will perform its own
protection. Although, even if the ftrace_ops states it will protect itself,
the ftrace infrastructure may still provide protection if the arch does
not support all features or another ftrace_ops is registered.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 4740974a 20-Jul-2012 Steven Rostedt <srostedt@redhat.com>

ftrace: Add default recursion protection for function tracing

As more users of the function tracer utility are being added, they do
not always add the necessary recursion protection. To protect from
function recursion due to tracing, if the callback ftrace_ops does not
specifically specify that it protects against recursion (by setting
the FTRACE_OPS_FL_RECURSION_SAFE flag), the list operation will be
called by the mcount trampoline which adds recursion protection.

If the flag is set, then the function will be called directly with no
extra protection.

Note, the list operation is called if more than one function callback
is registered, or if the arch does not support all of the function
tracer features.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 08f6fba5 30-Apr-2012 Steven Rostedt <srostedt@redhat.com>

ftrace/x86: Add separate function to save regs

Add a way to have different functions calling different trampolines.
If a ftrace_ops wants regs saved on the return, then have only the
functions with ops registered to save regs. Functions registered by
other ops would not be affected, unless the functions overlap.

If one ftrace_ops registered functions A, B and C and another ops
registered fucntions to save regs on A, and D, then only functions
A and D would be saving regs. Function B and C would work as normal.
Although A is registered by both ops: normal and saves regs; this is fine
as saving the regs is needed to satisfy one of the ops that calls it
but the regs are ignored by the other ops function.

x86_64 implements the full regs saving, and i386 just passes a NULL
for regs to satisfy the ftrace_ops passing. Where an arch must supply
both regs and ftrace_ops parameters, even if regs is just NULL.

It is OK for an arch to pass NULL regs. All function trace users that
require regs passing must add the flag FTRACE_OPS_FL_SAVE_REGS when
registering the ftrace_ops. If the arch does not support saving regs
then the ftrace_ops will fail to register. The flag
FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED may be set that will prevent the
ftrace_ops from failing to register. In this case, the handler may
either check if regs is not NULL or check if ARCH_SUPPORTS_FTRACE_SAVE_REGS.
If the arch supports passing regs it will set this macro and pass regs
for ops that request them. All other archs will just pass NULL.

Link: Link: http://lkml.kernel.org/r/20120711195745.107705970@goodmis.org

Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# a1e2e31d 08-Aug-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Return pt_regs to function trace callback

Return as the 4th paramater to the function tracer callback the pt_regs.

Later patches that implement regs passing for the architectures will require
having the ftrace_ops set the SAVE_REGS flag, which will tell the arch
to take the time to pass a full set of pt_regs to the ftrace_ops callback
function. If the arch does not support it then it should pass NULL.

If an arch can pass full regs, then it should define:
ARCH_SUPPORTS_FTRACE_SAVE_REGS to 1

Link: http://lkml.kernel.org/r/20120702201821.019966811@goodmis.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# ccf3672d 05-Jun-2012 Steven Rostedt <srostedt@redhat.com>

ftrace: Consolidate arch dependent functions with 'list' function

As the function tracer starts to get more features, the support for
theses features will spread out throughout the different architectures
over time. These features boil down to what each arch does in the
mcount trampoline (the ftrace_caller).

Currently there's two features that are not the same throughout the
archs.

1) Support to stop function tracing before the callback
2) passing of the ftrace ops

Both of these require placing an indirect function to support the
features if the mcount trampoline does not.

On a side note, for all architectures, when more than one callback
is registered to the function tracer, an intermediate 'list' function
is called by the mcount trampoline to iterate through the callbacks
that are registered.

Instead of making a separate function for each of these features,
and requiring several indirect calls, just use the single 'list' function
as the intermediate, to handle all cases. If an arch does not support
the 'stop function tracing' or the passing of ftrace ops, just force
it to use the list function that will handle the features required.

This makes the code cleaner and simpler and removes a lot of
#ifdefs in the code.

Link: http://lkml.kernel.org/r/20120612225424.495625483@goodmis.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 2f5f6ad9 08-Aug-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Pass ftrace_ops as third parameter to function trace callback

Currently the function trace callback receives only the ip and parent_ip
of the function that it traced. It would be more powerful to also return
the ops that registered the function as well. This allows the same function
to act differently depending on what ftrace_ops registered it.

Link: http://lkml.kernel.org/r/20120612225424.267254552@goodmis.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 8d240dd8 29-Mar-2012 Borislav Petkov <bp@amd64.org>

ftrace: Remove a superfluous check

register_ftrace_function() checks ftrace_disabled and calls
__register_ftrace_function which does it again.

Drop the first check and add the unlikely hint to the second one. Also,
drop the label as John correctly notices.

No functional change.

Link: http://lkml.kernel.org/r/20120329171140.GE6409@aftab

Cc: Borislav Petkov <bp@amd64.org>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e4f5d544 27-Apr-2012 Steven Rostedt <srostedt@redhat.com>

ftrace/x86: Have x86 ftrace use the ftrace_modify_all_code()

To remove duplicate code, have the ftrace arch_ftrace_update_code()
use the generic ftrace_modify_all_code(). This requires that the
default ftrace_replace_code() becomes a weak function so that an
arch may override it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 8ed3e2cf 26-Apr-2012 Steven Rostedt <srostedt@redhat.com>

ftrace: Make ftrace_modify_all_code() global for archs to use

Rename __ftrace_modify_code() to ftrace_modify_all_code() and make
it global for all archs to use. This will remove the duplication
of code, as archs that can modify code without stop_machine()
can use it directly outside of the stop_machine() call.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# f0cf973a 25-Apr-2012 Steven Rostedt <srostedt@redhat.com>

ftrace: Return record ip addr for ftrace_location()

ftrace_location() is passed an addr, and returns 1 if the addr is
on a ftrace nop (or caller to ftrace_caller), and 0 otherwise.

To let kprobes know if it should move a breakpoint or not, it
must return the actual addr that is the start of the ftrace nop.
This way a kprobe placed on the location of a ftrace nop, can
instead be placed on the instruction after the nop. Even if the
probe addr is on the second or later byte of the nop, it can
simply be moved forward.

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# a650e02a 25-Apr-2012 Steven Rostedt <srostedt@redhat.com>

ftrace: Consolidate ftrace_location() and ftrace_text_reserved()

Both ftrace_location() and ftrace_text_reserved() do basically the same thing.
They search to see if an address is in the ftace table (contains an address
that may change from nop to call ftrace_caller). The difference is
that ftrace_location() searches a single address, but ftrace_text_reserved()
searches a range.

This also makes the ftrace_text_reserved() faster as it now uses a bsearch()
instead of linearly searching all the addresses within a page.

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 9644302e 25-Apr-2012 Steven Rostedt <srostedt@redhat.com>

ftrace: Speed up search by skipping pages by address

As all records in a page of the ftrace table are sorted, we can
speed up the search algorithm by checking if the address to look for
falls in between the first and last record ip on the page.

This speeds up both the ftrace_location() and ftrace_text_reserved()
algorithms, as it can skip full pages when the search address is
not in them.

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 706c81f8 24-Apr-2012 Steven Rostedt <srostedt@redhat.com>

ftrace: Remove extra helper functions

The ftrace_record_ip() and ftrace_alloc_dyn_node() were from the
time of the ftrace daemon. Although they were still used, they
still make things a bit more complex than necessary.

Move the code into the one function that uses it, and remove the
helper functions.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 9fd49328 24-Apr-2012 Steven Rostedt <srostedt@redhat.com>

ftrace: Sort all function addresses, not just per page

Instead of just sorting the ip's of the functions per ftrace page,
sort the entire list before adding them to the ftrace pages.

This will allow the bsearch algorithm to be sped up as it can
also sort by pages, not just records within a page.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 50e18b94 25-Apr-2012 Jiri Olsa <jolsa@redhat.com>

tracing: Use seq_*_private interface for some seq files

It's appropriate to use __seq_open_private interface to open
some of trace seq files, because it covers all steps we are
duplicating in tracing code - zallocating the iterator and
setting it as seq_file's private.

Using this for following files:
trace
available_filter_functions
enabled_functions

Link: http://lkml.kernel.org/r/1335342219-2782-5-git-send-email-jolsa@redhat.com

Signed-off-by: Jiri Olsa <jolsa@redhat.com>

[
Fixed warnings for:
kernel/trace/trace.c: In function '__tracing_open':
kernel/trace/trace.c:2418:11: warning: unused variable 'ret' [-Wunused-variable]
kernel/trace/trace.c:2417:19: warning: unused variable 'm' [-Wunused-variable]
]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# db6544e0 17-Feb-2012 Rajesh Bhagat <rajesh.lnx@gmail.com>

ftrace: Fix function_graph for archs that test ftrace_trace_function

When CONFIG_DYNAMIC_FTRACE is not set, some archs (ARM) test
the variable function_trace_function to determine if it should
call the function tracer. If it is not set to ftrace_stub, then
it will call the function and return, and not call the function
graph tracer.

But some of these archs (ARM) do not have the assembly code
to test if function tracing is enabled or not (quick stop of tracing)
and it calls the helper routine ftrace_test_stop_func() instead.

If function tracer is enabled and then disabled, the variable
ftrace_trace_function is still set to the helper routine
ftrace_test_stop_func(), and not to ftrace_stub. This will
prevent the function graph tracer from ever running.

Output before patch
/debug/tracing # echo function > current_tracer
/debug/tracing # echo function_graph > current_tracer
/debug/tracing # cat trace

Output after patch
/debug/tracing # echo function > current_tracer
/debug/tracing # echo function_graph > current_tracer
/debug/tracing # cat trace
0) ! 253.375 us | } /* irq_enter */
0) | generic_handle_irq() {
0) | handle_fasteoi_irq() {
0) 9.208 us | _raw_spin_lock();
0) | handle_irq_event() {
0) | handle_irq_event_percpu() {

Signed-off-by: Rajesh Bhagat <rajesh.lnx@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 5500fa51 15-Feb-2012 Jiri Olsa <jolsa@redhat.com>

ftrace, perf: Add filter support for function trace event

Adding support to filter function trace event via perf
interface. It is now possible to use filter interface
in the perf tool like:

perf record -e ftrace:function --filter="(ip == mm_*)" ls

The filter syntax is restricted to the the 'ip' field only,
and following operators are accepted '==' '!=' '||', ending
up with the filter strings like:

ip == f1[, ]f2 ... || ip != f3[, ]f4 ...

with comma ',' or space ' ' as a function separator. If the
space ' ' is used as a separator, the right side of the
assignment needs to be enclosed in double quotes '"', e.g.:

perf record -e ftrace:function --filter '(ip == do_execve,sys_*,ext*)' ls
perf record -e ftrace:function --filter '(ip == "do_execve,sys_*,ext*")' ls
perf record -e ftrace:function --filter '(ip == "do_execve sys_* ext*")' ls

The '==' operator adds trace filter with same effect as would
be added via set_ftrace_filter file.

The '!=' operator adds trace filter with same effect as would
be added via set_ftrace_notrace file.

The right side of the '!=', '==' operators is list of functions
or regexp. to be added to filter separated by space.

The '||' operator is used for connecting multiple filter definitions
together. It is possible to have more than one '==' and '!='
operators within one filter string.

Link: http://lkml.kernel.org/r/1329317514-8131-8-git-send-email-jolsa@redhat.com

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e248491a 15-Feb-2012 Jiri Olsa <jolsa@redhat.com>

ftrace: Add enable/disable ftrace_ops control interface

Adding a way to temporarily enable/disable ftrace_ops. The change
follows the same way as 'global' ftrace_ops are done.

Introducing 2 global ftrace_ops - control_ops and ftrace_control_list
which take over all ftrace_ops registered with FTRACE_OPS_FL_CONTROL
flag. In addition new per cpu flag called 'disabled' is also added to
ftrace_ops to provide the control information for each cpu.

When ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
set as disabled for all cpus.

The ftrace_control_list contains all the registered 'control' ftrace_ops.
The control_ops provides function which iterates ftrace_control_list
and does the check for 'disabled' flag on current cpu.

Adding 3 inline functions:
ftrace_function_local_disable/ftrace_function_local_enable
- enable/disable the ftrace_ops on current cpu
ftrace_function_local_disabled
- get disabled ftrace_ops::disabled value for current cpu

Link: http://lkml.kernel.org/r/1329317514-8131-2-git-send-email-jolsa@redhat.com

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 47b0edcb 29-Nov-2011 Thomas Meyer <thomas@m3y3r.de>

tracing/trivial: Use kcalloc instead of kzalloc to allocate array

The advantage of kcalloc is, that will prevent integer overflows which could
result from the multiplication of number of elements and size and it is also
a bit nicer to read.

The semantic patch that makes this change is available
in https://lkml.org/lkml/2011/11/25/107

Link: http://lkml.kernel.org/r/1322600880.1534.347.camel@localhost.localdomain

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# ac483c44 02-Jan-2012 Jiri Olsa <jolsa@redhat.com>

ftrace: Change filter/notrace set functions to return exit code

Currently the ftrace_set_filter and ftrace_set_notrace functions
do not return any return code. So there's no way for ftrace_ops
user to tell wether the filter was correctly applied.

The set_ftrace_filter interface returns error in case the filter
did not match:

# echo krava > set_ftrace_filter
bash: echo: write error: Invalid argument

Changing both ftrace_set_filter and ftrace_set_notrace functions
to return zero if the filter was applied correctly or -E* values
in case of error.

Link: http://lkml.kernel.org/r/1325495060-6402-2-git-send-email-jolsa@redhat.com

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 2a85a37f 19-Dec-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Allow access to the boot time function enabling

Change set_ftrace_early_filter() to ftrace_set_early_filter()
and make it a global function. This will allow other subsystems
in the kernel to be able to enable function tracing at start
up and reuse the ftrace function parsing code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 69a3083c 19-Dec-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Decouple hash items from showing filtered functions

The set_ftrace_filter shows "hashed" functions, which are functions
that are added with operations to them (like traceon and traceoff).

As other subsystems may be able to show what functions they are
using for function tracing, the hash items should no longer
be shown just because the FILTER flag is set. As they have nothing
to do with other subsystems filters.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# fc13cb0c 19-Dec-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Allow other users of function tracing to use the output listing

The function tracer is set up to allow any other subsystem (like perf)
to use it. Ftrace already has a way to list what functions are enabled
by the global_ops. It would be very helpful to let other users of
the function tracer to be able to use the same code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 06a51d93 19-Dec-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Create ftrace_hash_empty() helper routine

There are two types of hashes in the ftrace_ops; one type
is the filter_hash and the other is the notrace_hash. Either
one may be null, meaning it has no elements. But when elements
are added, the hash is allocated.

Throughout the code, a check needs to be made to see if a hash
exists or the hash has elements, but the check if the hash exists
is usually missing causing the possible "NULL pointer dereference bug".

Add a helper routine called "ftrace_hash_empty()" that returns
true if the hash doesn't exist or its count is zero. As they mean
the same thing.

Last-bug-reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# c842e975 19-Dec-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Fix ftrace hash record update with notrace

When disabling the "notrace" records, that means we want to trace them.
If the notrace_hash is zero, it means that we want to trace all
records. But to disable a zero notrace_hash means nothing.

The check for the notrace_hash count was incorrect with:

if (hash && !hash->count)
return

With the correct comment above it that states that we do nothing
if the notrace_hash has zero count. But !hash also means that
the notrace hash has zero count. I think this was done to
protect against dereferencing NULL. But if !hash is true, then
we go through the following loop without doing a single thing.

Fix it to:

if (!hash || !hash->count)
return;

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 5855fead 16-Dec-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Use bsearch to find record ip

Now that each set of pages in the function list are sorted by
ip, we can use bsearch to find a record within each set of pages.
This speeds up the ftrace_location() function by magnitudes.

For archs (like x86) that need to add a breakpoint at every function
that will be converted from a nop to a callback and vice versa,
the breakpoint callback needs to know if the breakpoint was for
ftrace or not. It requires finding the breakpoint ip within the
records. Doing a linear search is extremely inefficient. It is
a must to be able to do a fast binary search to find these locations.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 68950619 16-Dec-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Sort the mcount records on each page

Sort records by ip locations of the ftrace mcount calls on each of the
set of pages in the function list. This helps in localizing cache
usuage when updating the function locations, as well as gives us
the ability to quickly find an ip location in the list.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 85ae32ae 16-Dec-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Replace record newlist with record page list

As new functions come in to be initalized from mcount to nop,
they are done by groups of pages. Whether it is the core kernel
or a module. There's no need to keep track of these on a per record
basis.

At startup, and as any module is loaded, the functions to be
traced are stored in a group of pages and added to the function
list at the end. We just need to keep a pointer to the first
page of the list that was added, and use that to know where to
start on the list for initializing functions.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# a7900875 16-Dec-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Allocate the mcount record pages as groups

Allocate the mcount record pages as a group of pages as big
as can be allocated and waste no more than a single page.

Grouping the mcount pages as much as possible helps with cache
locality, as we do not need to redirect with descriptors as we
cross from page to page. It also allows us to do more with the
records later on (sort them with bigger benefits).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 32082309 16-Dec-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Remove usage of "freed" records

Records that are added to the function trace table are
permanently there, except for modules. By separating out the
modules to their own pages that can be freed in one shot
we can remove the "freed" flag and simplify some of the record
management.

Another benefit of doing this is that we can also move the
records around; sort them.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# c88fd863 16-Aug-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Allow archs to modify code without stop machine

The stop machine method to modify all functions in the kernel
(some 20,000 of them) is the safest way to do so across all archs.
But some archs may not need this big hammer approach to modify code
on SMP machines, and can simply just update the code it needs.

Adding a weak function arch_ftrace_update_code() that now does the
stop machine, will also let any arch override this method.

If the arch needs to check the system and then decide if it can
avoid stop machine, it can still call ftrace_run_stop_machine() to
use the old method.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 30fb6aa7 05-Dec-2011 Jiri Olsa <jolsa@redhat.com>

ftrace: Fix unregister ftrace_ops accounting

Multiple users of the function tracer can register their functions
with the ftrace_ops structure. The accounting within ftrace will
update the counter on each function record that is being traced.
When the ftrace_ops filtering adds or removes functions, the
function records will be updated accordingly if the ftrace_ops is
still registered.

When a ftrace_ops is removed, the counter of the function records,
that the ftrace_ops traces, are decremented. When they reach zero
the functions that they represent are modified to stop calling the
mcount code.

When changes are made, the code is updated via stop_machine() with
a command passed to the function to tell it what to do. There is an
ENABLE and DISABLE command that tells the called function to enable
or disable the functions. But the ENABLE is really a misnomer as it
should just update the records, as records that have been enabled
and now have a count of zero should be disabled.

The DISABLE command is used to disable all functions regardless of
their counter values. This is the big off switch and is not the
complement of the ENABLE command.

To make matters worse, when a ftrace_ops is unregistered and there
is another ftrace_ops registered, neither the DISABLE nor the
ENABLE command are set when calling into the stop_machine() function
and the records will not be updated to match their counter. A command
is passed to that function that will update the mcount code to call
the registered callback directly if it is the only one left. This
means that the ftrace_ops that is still registered will have its callback
called by all functions that have been set for it as well as the ftrace_ops
that was just unregistered.

Here's a way to trigger this bug. Compile the kernel with
CONFIG_FUNCTION_PROFILER set and with CONFIG_FUNCTION_GRAPH not set:

CONFIG_FUNCTION_PROFILER=y
# CONFIG_FUNCTION_GRAPH is not set

This will force the function profiler to use the function tracer instead
of the function graph tracer.

# cd /sys/kernel/debug/tracing
# echo schedule > set_ftrace_filter
# echo function > current_tracer
# cat set_ftrace_filter
schedule
# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 692/68108025 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
kworker/0:2-909 [000] .... 531.235574: schedule <-worker_thread
<idle>-0 [001] .N.. 531.235575: schedule <-cpu_idle
kworker/0:2-909 [000] .... 531.235597: schedule <-worker_thread
sshd-2563 [001] .... 531.235647: schedule <-schedule_hrtimeout_range_clock

# echo 1 > function_profile_enabled
# echo 0 > function_porfile_enabled
# cat set_ftrace_filter
schedule
# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 159701/118821262 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<idle>-0 [002] ...1 604.870655: local_touch_nmi <-cpu_idle
<idle>-0 [002] d..1 604.870655: enter_idle <-cpu_idle
<idle>-0 [002] d..1 604.870656: atomic_notifier_call_chain <-enter_idle
<idle>-0 [002] d..1 604.870656: __atomic_notifier_call_chain <-atomic_notifier_call_chain

The same problem could have happened with the trace_probe_ops,
but they are modified with the set_frace_filter file which does the
update at closure of the file.

The simple solution is to change ENABLE to UPDATE and call it every
time an ftrace_ops is unregistered.

Link: http://lkml.kernel.org/r/1323105776-26961-3-git-send-email-jolsa@redhat.com

Cc: stable@vger.kernel.org # 3.0+
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# ddf6e0e5 04-Nov-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Fix hash record accounting bug

If the set_ftrace_filter is cleared by writing just whitespace to
it, then the filter hash refcounts will be decremented but not
updated. This causes two bugs:

1) No functions will be enabled for tracing when they all should be

2) If the users clears the set_ftrace_filter twice, it will crash ftrace:

------------[ cut here ]------------
WARNING: at /home/rostedt/work/git/linux-trace.git/kernel/trace/ftrace.c:1384 __ftrace_hash_rec_update.part.27+0x157/0x1a7()
Modules linked in:
Pid: 2330, comm: bash Not tainted 3.1.0-test+ #32
Call Trace:
[<ffffffff81051828>] warn_slowpath_common+0x83/0x9b
[<ffffffff8105185a>] warn_slowpath_null+0x1a/0x1c
[<ffffffff810ba362>] __ftrace_hash_rec_update.part.27+0x157/0x1a7
[<ffffffff810ba6e8>] ? ftrace_regex_release+0xa7/0x10f
[<ffffffff8111bdfe>] ? kfree+0xe5/0x115
[<ffffffff810ba51e>] ftrace_hash_move+0x2e/0x151
[<ffffffff810ba6fb>] ftrace_regex_release+0xba/0x10f
[<ffffffff8112e49a>] fput+0xfd/0x1c2
[<ffffffff8112b54c>] filp_close+0x6d/0x78
[<ffffffff8113a92d>] sys_dup3+0x197/0x1c1
[<ffffffff8113a9a6>] sys_dup2+0x4f/0x54
[<ffffffff8150cac2>] system_call_fastpath+0x16/0x1b
---[ end trace 77a3a7ee73794a02 ]---

Link: http://lkml.kernel.org/r/20111101141420.GA4918@debian

Reported-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# c7c6ec8b 04-Nov-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Remove force undef config value left for testing

A forced undef of a config value was used for testing and was
accidently left in during the final commit. This causes x86 to
run slower than needed while running function tracing as well
as causes the function graph selftest to fail when DYNMAIC_FTRACE
is not set. This is because the code in MCOUNT expects the ftrace
code to be processed with the config value set that happened to
be forced not set.

The forced config option was left in by:
commit 6331c28c962561aee59e5a493b7556a4bb585957
ftrace: Fix dynamic selftest failure on some archs

Link: http://lkml.kernel.org/r/20111102150255.GA6973@debian

Cc: stable@vger.kernel.org
Reported-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# d4d34b98 04-Nov-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Fix hash record accounting bug

If the set_ftrace_filter is cleared by writing just whitespace to
it, then the filter hash refcounts will be decremented but not
updated. This causes two bugs:

1) No functions will be enabled for tracing when they all should be

2) If the users clears the set_ftrace_filter twice, it will crash ftrace:

------------[ cut here ]------------
WARNING: at /home/rostedt/work/git/linux-trace.git/kernel/trace/ftrace.c:1384 __ftrace_hash_rec_update.part.27+0x157/0x1a7()
Modules linked in:
Pid: 2330, comm: bash Not tainted 3.1.0-test+ #32
Call Trace:
[<ffffffff81051828>] warn_slowpath_common+0x83/0x9b
[<ffffffff8105185a>] warn_slowpath_null+0x1a/0x1c
[<ffffffff810ba362>] __ftrace_hash_rec_update.part.27+0x157/0x1a7
[<ffffffff810ba6e8>] ? ftrace_regex_release+0xa7/0x10f
[<ffffffff8111bdfe>] ? kfree+0xe5/0x115
[<ffffffff810ba51e>] ftrace_hash_move+0x2e/0x151
[<ffffffff810ba6fb>] ftrace_regex_release+0xba/0x10f
[<ffffffff8112e49a>] fput+0xfd/0x1c2
[<ffffffff8112b54c>] filp_close+0x6d/0x78
[<ffffffff8113a92d>] sys_dup3+0x197/0x1c1
[<ffffffff8113a9a6>] sys_dup2+0x4f/0x54
[<ffffffff8150cac2>] system_call_fastpath+0x16/0x1b
---[ end trace 77a3a7ee73794a02 ]---

Link: http://lkml.kernel.org/r/20111101141420.GA4918@debian

Reported-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 8ee3c92b 04-Nov-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Remove force undef config value left for testing

A forced undef of a config value was used for testing and was
accidently left in during the final commit. This causes x86 to
run slower than needed while running function tracing as well
as causes the function graph selftest to fail when DYNMAIC_FTRACE
is not set. This is because the code in MCOUNT expects the ftrace
code to be processed with the config value set that happened to
be forced not set.

The forced config option was left in by:
commit 6331c28c962561aee59e5a493b7556a4bb585957
ftrace: Fix dynamic selftest failure on some archs

Link: http://lkml.kernel.org/r/20111102150255.GA6973@debian

Cc: stable@vger.kernel.org
Reported-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 56d82e00 26-May-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

kernel: Add <linux/module.h> to files using it implicitly

These files are doing things like module_put and try_module_get
so they need to call out the module.h for explicit inclusion,
rather than getting it via <linux/device.h> which we ideally want
to remove the module.h inclusion from.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# e0a413f6 29-Sep-2011 Steven Rostedt <srostedt@redhat.com>

tracing: Warn on output if the function tracer was found corrupted

As the function tracer is very intrusive, lots of self checks are
performed on the tracer and if something is found to be strange
it will shut itself down keeping it from corrupting the rest of the
kernel. This shutdown may still allow functions to be traced, as the
tracing only stops new modifications from happening. Trying to stop
the function tracer itself can cause more harm as it requires code
modification.

Although a WARN_ON() is executed, a user may not notice it. To help
the user see that something isn't right with the tracing of the system
a big warning is added to the output of the tracer that lets the user
know that their data may be incomplete.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# f7bc8b61 14-Jul-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Fix regression where ftrace breaks when modules are loaded

Enabling function tracer to trace all functions, then load a module and
then disable function tracing will cause ftrace to fail.

This can also happen by enabling function tracing on the command line:

ftrace=function

and during boot up, modules are loaded, then you disable function tracing
with 'echo nop > current_tracer' you will trigger a bug in ftrace that
will shut itself down.

The reason is, the new ftrace code keeps ref counts of all ftrace_ops that
are registered for tracing. When one or more ftrace_ops are registered,
all the records that represent the functions that the ftrace_ops will
trace have a ref count incremented. If this ref count is not zero,
when the code modification runs, that function will be enabled for tracing.
If the ref count is zero, that function will be disabled from tracing.

To make sure the accounting was working, FTRACE_WARN_ON()s were added
to updating of the ref counts.

If the ref count hits its max (> 2^30 ftrace_ops added), or if
the ref count goes below zero, a FTRACE_WARN_ON() is triggered which
disables all modification of code.

Since it is common for ftrace_ops to trace all functions in the kernel,
instead of creating > 20,000 hash items for the ftrace_ops, the hash
count is just set to zero, and it represents that the ftrace_ops is
to trace all functions. This is where the issues arrise.

If you enable function tracing to trace all functions, and then add
a module, the modules function records do not get the ref count updated.
When the function tracer is disabled, all function records ref counts
are subtracted. Since the modules never had their ref counts incremented,
they go below zero and the FTRACE_WARN_ON() is triggered.

The solution to this is rather simple. When modules are loaded, and
their functions are added to the the ftrace pool, look to see if any
ftrace_ops are registered that trace all functions. And for those,
update the ref count for the module function records.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 6331c28c 13-Jul-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Fix dynamic selftest failure on some archs

Archs that do not implement CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST, will
fail the dynamic ftrace selftest.

The function tracer has a quick 'off' variable that will prevent
the call back functions from being called. This variable is called
function_trace_stop. In x86, this is implemented directly in the mcount
assembly, but for other archs, an intermediate function is used called
ftrace_test_stop_func().

In dynamic ftrace, the function pointer variable ftrace_trace_function is
used to update the caller code in the mcount caller. But for archs that
do not have CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST set, it only calls
ftrace_test_stop_func() instead, which in turn calls __ftrace_trace_function.

When more than one ftrace_ops is registered, the function it calls is
ftrace_ops_list_func(), which will iterate over all registered ftrace_ops
and call the callbacks that have their hash matching.

The issue happens when two ftrace_ops are registered for different functions
and one is then unregistered. The __ftrace_trace_function is then pointed
to the remaining ftrace_ops callback function directly. This mean it will
be called for all functions that were registered to trace by both ftrace_ops
that were registered.

This is not an issue for archs with CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST,
because the update of ftrace_trace_function doesn't happen until after all
functions have been updated, and then the mcount caller is updated. But
for those archs that do use the ftrace_test_stop_func(), the update is
immediate.

The dynamic selftest fails because it hits this situation, and the
ftrace_ops that it registers fails to only trace what it was suppose to
and instead traces all other functions.

The solution is to delay the setting of __ftrace_trace_function until
after all the functions have been updated according to the registered
ftrace_ops. Also, function_trace_stop is set during the update to prevent
function tracing from calling code that is caused by the function tracer
itself.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 072126f4 13-Jul-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Update filter when tracing enabled in set_ftrace_filter()

Currently, if set_ftrace_filter() is called when the ftrace_ops is
active, the function filters will not be updated. They will only be updated
when tracing is disabled and re-enabled.

Update the functions immediately during set_ftrace_filter().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 41fb61c2 13-Jul-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Balance records when updating the hash

Whenever the hash of the ftrace_ops is updated, the record counts
must be balance. This requires disabling the records that are set
in the original hash, and then enabling the records that are set
in the updated hash.

Moving the update into ftrace_hash_move() removes the bug where the
hash was updated but the records were not, which results in ftrace
triggering a warning and disabling itself because the ftrace_ops filter
is updated while the ftrace_ops was registered, and then the failure
happens when the ftrace_ops is unregistered.

The current code will not trigger this bug, but new code will.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 4376cac6 24-Jun-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Do not disable interrupts for modules in mcount update

When I mounted an NFS directory, it caused several modules to be loaded. At the
time I was running the preemptirqsoff tracer, and it showed the following
output:

# tracer: preemptirqsoff
#
# preemptirqsoff latency trace v1.1.5 on 2.6.33.9-rt30-mrg-test
# --------------------------------------------------------------------
# latency: 1177 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: modprobe-19370 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: ftrace_module_notify
# => ended at: ftrace_module_notify
#
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| /_--=> lock-depth
# |||||/ delay
# cmd pid |||||| time | caller
# \ / |||||| \ | /
modprobe-19370 3d.... 0us!: ftrace_process_locs <-ftrace_module_notify
modprobe-19370 3d.... 1176us : ftrace_process_locs <-ftrace_module_notify
modprobe-19370 3d.... 1178us : trace_hardirqs_on <-ftrace_module_notify
modprobe-19370 3d.... 1178us : <stack trace>
=> ftrace_process_locs
=> ftrace_module_notify
=> notifier_call_chain
=> __blocking_notifier_call_chain
=> blocking_notifier_call_chain
=> sys_init_module
=> system_call_fastpath

That's over 1ms that interrupts are disabled on a Real-Time kernel!

Looking at the cause (being the ftrace author helped), I found that the
interrupts are disabled before the code modification of mcounts into nops. The
interrupts only need to be disabled on start up around this code, not when
modules are being loaded.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 43dd61c9 07-Jul-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Fix regression of :mod:module function enabling

The new code that allows different utilities to pick and choose
what functions they trace broke the :mod: hook that allows users
to trace only functions of a particular module.

The reason is that the :mod: hook bypasses the hash that is setup
to allow individual users to trace their own functions and uses
the global hash directly. But if the global hash has not been
set up, it will cause a bug:

echo '*:mod:radeon' > /sys/kernel/debug/set_ftrace_filter

produces:

[drm:drm_mode_getfb] *ERROR* invalid framebuffer id
[drm:radeon_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
BUG: unable to handle kernel paging request at ffffffff8160ec90
IP: [<ffffffff810d9136>] add_hash_entry+0x66/0xd0
PGD 1a05067 PUD 1a09063 PMD 80000000016001e1
Oops: 0003 [#1] SMP Jul 7 04:02:28 phyllis kernel: [55303.858604] CPU 1
Modules linked in: cryptd aes_x86_64 aes_generic binfmt_misc rfcomm bnep ip6table_filter hid radeon r8169 ahci libahci mii ttm drm_kms_helper drm video i2c_algo_bit intel_agp intel_gtt

Pid: 10344, comm: bash Tainted: G WC 3.0.0-rc5 #1 Dell Inc. Inspiron N5010/0YXXJJ
RIP: 0010:[<ffffffff810d9136>] [<ffffffff810d9136>] add_hash_entry+0x66/0xd0
RSP: 0018:ffff88003a96bda8 EFLAGS: 00010246
RAX: ffff8801301735c0 RBX: ffffffff8160ec80 RCX: 0000000000306ee0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880137c92940
RBP: ffff88003a96bdb8 R08: ffff880137c95680 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff81c9df78
R13: ffff8801153d1000 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f329c18a700(0000) GS:ffff880137c80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff8160ec90 CR3: 000000003002b000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 10344, threadinfo ffff88003a96a000, task ffff88012fcfc470)
Stack:
0000000000000fd0 00000000000000fc ffff88003a96be38 ffffffff810d92f5
ffff88011c4c4e00 ffff880000000000 000000000b69f4d0 ffffffff8160ec80
ffff8800300e6f06 0000000081130295 0000000000000282 ffff8800300e6f00
Call Trace:
[<ffffffff810d92f5>] match_records+0x155/0x1b0
[<ffffffff810d940c>] ftrace_mod_callback+0xbc/0x100
[<ffffffff810dafdf>] ftrace_regex_write+0x16f/0x210
[<ffffffff810db09f>] ftrace_filter_write+0xf/0x20
[<ffffffff81166e48>] vfs_write+0xc8/0x190
[<ffffffff81167001>] sys_write+0x51/0x90
[<ffffffff815c7e02>] system_call_fastpath+0x16/0x1b
Code: 48 8b 33 31 d2 48 85 f6 75 33 49 89 d4 4c 03 63 08 49 8b 14 24 48 85 d2 48 89 10 74 04 48 89 42 08 49 89 04 24 4c 89 60 08 31 d2
RIP [<ffffffff810d9136>] add_hash_entry+0x66/0xd0
RSP <ffff88003a96bda8>
CR2: ffffffff8160ec90
---[ end trace a5d031828efdd88e ]---

Reported-by: Brian Marete <marete@toshnix.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 22fe9b54 07-Jun-2011 Peter Huewe <peterhuewe@gmx.de>

tracing: Convert to kstrtoul_from_user

This patch replaces the code for getting an unsigned long from a
userspace buffer by a simple call to kstroul_from_user.
This makes it easier to read and less error prone.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Link: http://lkml.kernel.org/r/1307476707-14762-1-git-send-email-peterhuewe@gmx.de
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# bd38c0e6 31-May-2011 Paul McQuade <tungstentide@gmail.com>

ftrace: Fixed an include coding style issue

Removed <asm/ftrace.h> because <linux/ftrace.h> was already declared.
Braces of struct's coding style fixed.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul McQuade <tungstentide@gmail.com>
Link: http://lkml.kernel.org/r/4DE59711.3090900@gmail.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# a4f18ed1 07-Jun-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Revert 8ab2b7efd ftrace: Remove unnecessary disabling of irqs

Revert the commit that removed the disabling of interrupts around
the initial modifying of mcount callers to nops, and update the comment.

The original comment was outdated and stated that the interrupts were
being disabled to prevent kstop machine, which was required with the
old ftrace daemon, but was no longer the case.

What the comment failed to mention was that interrupts needed to be
disabled to keep interrupts from preempting the modifying of the code
and then executing the code that was partially modified.

Revert the commit and update the comment.

Reported-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 0aff1c0c 01-Jun-2011 GuoWen Li <guowen.li.linux@gmail.com>

ftrace: Fix possible undefined return code

kernel/trace/ftrace.c: In function 'ftrace_regex_write.clone.15':
kernel/trace/ftrace.c:2743:6: warning: 'ret' may be used uninitialized in this
function

Signed-off-by: GuoWen Li <guowen.li.linux@gmail.com>
Link: http://lkml.kernel.org/r/201106011918.47939.guowen.li.linux@gmail.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# b1cff0ad 25-May-2011 Steven Rostedt <rostedt@goodmis.org>

ftrace: Add internal recursive checks

Witold reported a reboot caused by the selftests of the dynamic function
tracer. He sent me a config and I used ktest to do a config_bisect on it
(as my config did not cause the crash). It pointed out that the problem
config was CONFIG_PROVE_RCU.

What happened was that if multiple callbacks are attached to the
function tracer, we iterate a list of callbacks. Because the list is
managed by synchronize_sched() and preempt_disable, the access to the
pointers uses rcu_dereference_raw().

When PROVE_RCU is enabled, the rcu_dereference_raw() calls some
debugging functions, which happen to be traced. The tracing of the debug
function would then call rcu_dereference_raw() which would then call the
debug function and then... well you get the idea.

I first wrote two different patches to solve this bug.

1) add a __rcu_dereference_raw() that would not do any checks.
2) add notrace to the offending debug functions.

Both of these patches worked.

Talking with Paul McKenney on IRC, he suggested to add recursion
detection instead. This seemed to be a better solution, so I decided to
implement it. As the task_struct already has a trace_recursion to detect
recursion in the ring buffer, and that has a very small number it
allows, I decided to use that same variable to add flags that can detect
the recursion inside the infrastructure of the function tracer.

I plan to change it so that the task struct bit can be checked in
mcount, but as that requires changes to all archs, I will hold that off
to the next merge window.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1306348063.1465.116.camel@gandalf.stny.rr.com
Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 3b6cfdb1 23-May-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Set ops->flag to enabled even on static function tracing

When dynamic ftrace is not configured, the ops->flags still needs
to have its FTRACE_OPS_FL_ENABLED bit set in ftrace_startup().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# a1cd6173 23-May-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Have ftrace_startup() return failure code

The register_ftrace_function() returns an error code on failure
except if the call to ftrace_startup() fails. Add a error return to
ftrace_startup() if it fails to start, allowing register_ftrace_funtion()
to return a proper error value.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 936e074b 05-May-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Modify ftrace_set_filter/notrace to take ops

Since users of the function tracer can now pick and choose which
functions they want to trace agnostically from other users of the
function tracer, we need to pass the ops struct to the ftrace_set_filter()
functions.

The functions ftrace_set_global_filter() and ftrace_set_global_notrace()
is added to keep the old filter functions which are used to modify
the generic function tracers.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# cdbe61bf 05-May-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Allow dynamically allocated function tracers

Now that functions may be selected individually, it only makes sense
that we should allow dynamically allocated trace structures to
be traced. This will allow perf to allocate a ftrace_ops structure
at runtime and use it to pick and choose which functions that
structure will trace.

Note, a dynamically allocated ftrace_ops will always be called
indirectly instead of being called directly from the mcount in
entry.S. This is because there's no safe way to prevent mcount
from being preempted before calling the function, unless we
modify every entry.S to do so (not likely). Thus, dynamically allocated
functions will now be called by the ftrace_ops_list_func() that
loops through the ops that are allocated if there are more than
one op allocated at a time. This loop is protected with a
preempt_disable.

To determine if an ftrace_ops structure is allocated or not, a new
util function was added to the kernel/extable.c called
core_kernel_data(), which returns 1 if the address is between
_sdata and _edata.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# b848914c 04-May-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Implement separate user function filtering

ftrace_ops that are registered to trace functions can now be
agnostic to each other in respect to what functions they trace.
Each ops has their own hash of the functions they want to trace
and a hash to what they do not want to trace. A empty hash for
the functions they want to trace denotes all functions should
be traced that are not in the notrace hash.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 07fd5515 05-May-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Free hash with call_rcu_sched()

When a hash is modified and might be in use, we need to perform
a schedule RCU operation on it, as the hashes will soon be used
directly in the function tracer callback.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 2b499381 03-May-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Have global_ops store the functions that are to be traced

This is a step towards each ops structure defining its own set
of functions to trace. As the current code with pid's and such
are specific to the global_ops, it is restructured to be used
with the global ops.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# bd69c30b 03-May-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Add ops parameter to ftrace_startup/shutdown functions

In order to allow different ops to enable different functions,
the ftrace_startup() and ftrace_shutdown() functions need the
ops parameter passed to them.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 647bcd03 03-May-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Add enabled_functions file

Add the enabled_functions file that is used to show all the
functions that have been enabled for tracing as well as their
ref counts. This helps seeing if any function has been registered
and what functions are being traced.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# ed926f9b 03-May-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Use counters to enable functions to trace

Every function has its own record that stores the instruction
pointer and flags for the function to be traced. There are only
two flags: enabled and free. The enabled flag states that tracing
for the function has been enabled (actively traced), and the free
flag states that the record no longer points to a function and can
be used by new functions (loaded modules).

These flags are now moved to the MSB of the flags (actually just
the top 32bits). The rest of the bits (30 bits) are now used as
a ref counter. Everytime a tracer register functions to trace,
those functions will have its counter incremented.

When tracing is enabled, to determine if a function should be traced,
the counter is examined, and if it is non-zero it is set to trace.

When a ftrace_ops is registered to trace functions, its hashes
are examined. If the ftrace_ops filter_hash count is zero, then
all functions are set to be traced, otherwise only the functions
in the hash are to be traced. The exception to this is if a function
is also in the ftrace_ops notrace_hash. Then that function's counter
is not incremented for this ftrace_ops.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 33dc9b12 02-May-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Separate hash allocation and assignment

When filtering, allocate a hash to insert the function records.
After the filtering is complete, assign it to the ftrace_ops structure.

This allows the ftrace_ops structure to have a much smaller array of
hash buckets instead of wasting a lot of memory.

A read only empty_hash is created to be the minimum size that any ftrace_ops
can point to.

When a new hash is created, it has the following steps:

o Allocate a default hash.
o Walk the function records assigning the filtered records to the hash
o Allocate a new hash with the appropriate size buckets
o Move the entries from the default hash to the new hash.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# f45948e8 01-May-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Create a global_ops to hold the filter and notrace hashes

Combine the filter and notrace hashes to be accessed by a single entity,
the global_ops. The global_ops is a ftrace_ops structure that is passed
to different functions that can read or modify the filtering of the
function tracer.

The ftrace_ops structure was modified to hold a filter and notrace
hashes so that later patches may allow each ftrace_ops to have its own
set of rules to what functions may be filtered.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 1cf41dd7 29-Apr-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Use hash instead for FTRACE_FL_FILTER

When multiple users are allowed to have their own set of functions
to trace, having the FTRACE_FL_FILTER flag will not be enough to
handle the accounting of those users. Each user will need their own
set of functions.

Replace the FTRACE_FL_FILTER with a filter_hash instead. This is
temporary until the rest of the function filtering accounting
gets in.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# b448c4e3 29-Apr-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Replace FTRACE_FL_NOTRACE flag with a hash of ignored functions

To prepare for the accounting system that will allow multiple users of
the function tracer, having the FTRACE_FL_NOTRACE as a flag in the
dyn_trace record does not make sense.

All ftrace_ops will soon have a hash of functions they should trace
and not trace. By making a global hash of functions not to trace makes
this easier for the transition.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# b9df92d2 28-Apr-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Consolidate the function match routines for normal and mods

The code used for matching functions is almost identical between normal
selecting of functions and using the :mod: feature of set_ftrace_notrace.

Consolidate the two users into one function.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 491d0dcf 27-Apr-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Consolidate updating of ftrace_trace_function

There are three locations that perform almost identical functions in order
to update the ftrace_trace_function (the ftrace function variable that gets
called by mcount).

Consolidate these into a single function called update_ftrace_function().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 996e87be 26-Apr-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Move record update for normal and modules into a separate function

The updating of a function record is moved to a single function. This will allow
us to add specific changes in one location for both modules and kernel
functions.

Later patches will determine if the function record itself needs to be updated
(which enables the mcount caller), or just the ftrace_ops needs the update.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# d2c8c3ea 25-Apr-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Remove FTRACE_FL_CONVERTED flag

Since we disable all function tracer processing if we detect
that a modification of a instruction had failed, we do not need
to track that the record has failed. No more ftrace processing
is allowed, and the FTRACE_FL_CONVERTED flag is pointless.

The FTRACE_FL_CONVERTED flag was used to denote records that were
successfully converted from mcount calls into nops. But if a single
record fails, all of ftrace is disabled.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 45a4a237 21-Apr-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Remove FTRACE_FL_FAILED flag

Since we disable all function tracer processing if we detect
that a modification of a instruction had failed, we do not need
to track that the record has failed. No more ftrace processing
is allowed, and the FTRACE_FL_FAILED flag is pointless.

Removing this flag simplifies some of the code, but some ftrace_disabled
checks needed to be added or move around a little.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 3499e461 21-Apr-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Remove failures file

The failures file in the debugfs tracing directory would list the
functions that failed to convert when the old dead ftrace daemon
tried to update code but failed. Since this code is now dead along
with the daemon the failures file is useless. Remove it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 8ab2b7ef 21-Apr-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Remove unnecessary disabling of irqs

The disabling of interrupts around ftrace_update_code() was used
to protect against the evil ftrace daemon from years past. But that
daemon has long been killed. It is safe to keep interrupts enabled
while updating the initial mcount into nops.

The ftrace_mutex is also held which keeps other users at bay.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 0778d9ad 29-Apr-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Make FTRACE_WARN_ON() work in if condition

Let FTRACE_WARN_ON() be used as a stand alone statement or
inside a conditional: if (FTRACE_WARN_ON(x))

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 058e297d 29-Apr-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Only update the function code on write to filter files

If function tracing is enabled, a read of the filter files will
cause the call to stop_machine to update the function trace sites.
It should only call stop_machine on write.

Cc: stable@kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# 1106b699 16-Feb-2011 Jiri Olsa <jolsa@redhat.com>

tracing: Fix set_ftrace_filter probe function display

If one or more function probes (like traceon) are enabled,
and there's no other function filter, the first probe
func is skipped (which one depends on the position in the hash).

$ echo sys_open:traceon sys_close:traceon > ./set_ftrace_filter
$ cat set_ftrace_filter
#### all functions enabled ####
sys_close:traceon:unlimited
$

The reason was, that in the case of no other function filter,
the func_pos was not properly updated before calling t_hash_start.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1297874134-7008-1-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 868baf07 10-Feb-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Fix memory leak with function graph and cpu hotplug

When the fuction graph tracer starts, it needs to make a special
stack for each task to save the real return values of the tasks.
All running tasks have this stack created, as well as any new
tasks.

On CPU hot plug, the new idle task will allocate a stack as well
when init_idle() is called. The problem is that cpu hotplug does
not create a new idle_task. Instead it uses the idle task that
existed when the cpu went down.

ftrace_graph_init_task() will add a new ret_stack to the task
that is given to it. Because a clone will make the task
have a stack of its parent it does not check if the task's
ret_stack is already NULL or not. When the CPU hotplug code
starts a CPU up again, it will allocate a new stack even
though one already existed for it.

The solution is to treat the idle_task specially. In fact, the
function_graph code already does, just not at init_idle().
Instead of using the ftrace_graph_init_task() for the idle task,
which that function expects the task to be a clone, have a
separate ftrace_graph_init_idle_task(). Also, we will create a
per_cpu ret_stack that is used by the idle task. When we call
ftrace_graph_init_idle_task() it will check if the idle task's
ret_stack is NULL, if it is, then it will assign it the per_cpu
ret_stack.

Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 907f2784 27-Sep-2010 matt mooney <mfm@muteddisk.com>

tracing/trivial: Remove cast from void*

Unnecessary cast from void* in assignment.

Signed-off-by: matt mooney <mfm@muteddisk.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 6038f373 15-Aug-2010 Arnd Bergmann <arnd@arndb.de>

llseek: automatically add .llseek fop

All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>


# 79e406d7 14-Sep-2010 Steven Rostedt <srostedt@redhat.com>

tracing: Remove leftover FTRACE_ENABLE/DISABLE_MCOUNT enums

The enums for FTRACE_ENABLE_MCOUNT and FTRACE_DISABLE_MCOUNT were
used as commands to ftrace_run_update_code(). But these commands
were used by the old nasty ftrace daemon that has long been slain.

This is a clean up patch to remove the references to these enums
and simplify the code a little.

Reported-by: Wu Zhangjin <wuzhangjin@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 57c072c7 14-Sep-2010 Steven Rostedt <srostedt@redhat.com>

tracing: Fix reading of set_ftrace_filter across lists

If we do:

# cd /sys/kernel/debug
# echo 'do_IRQ:traceon schedule:traceon sys_write:traceon' > \
set_ftrace_filter
# cat set_ftrace_filter

We get the following output:

#### all functions enabled ####
sys_write:traceon:unlimited
schedule:traceon:unlimited
do_IRQ:traceon:unlimited

This outputs two lists. One is the fact that all functions are
currently enabled for function tracing, the other has three probed
functions, which happen to have 'traceon' as their commands.

Currently, when reading the first list (functions enabled) the
seq_file code will receive a "NULL" from the t_next() function
causing it to exit early. This makes "read()" from userspace stop
reading the code at this boarder. Although read is allowed to do this,
some (broken) applications might consider this an end of file and
stop early.

This patch adds the start of the second list to t_next() when it
finishes the first list. It is a simple change and gives the
set_ftrace_filter file nicer reading ability.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 98c4fd04 10-Sep-2010 Steven Rostedt <srostedt@redhat.com>

tracing: Keep track of set_ftrace_filter position and allow lseek again

This patch keeps track of the index within the elements of
set_ftrace_filter and if the position goes backwards, it nicely
resets and starts from the beginning again.

This allows for lseek and pread to work properly now.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 4aeb6967 09-Sep-2010 Steven Rostedt <srostedt@redhat.com>

tracing: Replace typecasted void pointer in set_ftrace_filter code

The set_ftrace_filter uses seq_file and reads from two lists. The
pointer returned by t_next() can either be of type struct dyn_ftrace
or struct ftrace_func_probe. If there is a bug (there was one)
the wrong pointer may be used and the reference can cause an oops.

This patch makes t_next() and friends only return the iterator structure
which now has a pointer of type struct dyn_ftrace and struct
ftrace_func_probe. The t_show() can now test if the pointer is NULL or
not and if the pointer exists, it is guaranteed to be of the correct type.

Now if there's a bug, only wrong data will be shown but not an oops.

Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 2bccfffd 09-Sep-2010 Steven Rostedt <srostedt@redhat.com>

tracing: Do not reset *pos in set_ftrace_filter

After the filtered functions are read, the probed functions are read
from the hash in set_ftrace_filter. When the hashed probed functions
are read, the *pos passed in is reset. Instead of modifying the pos
given to the read function, just record the pos where the filtered
functions ended and subtract from that.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# df091625 09-Sep-2010 Chris Wright <chrisw@sous-sol.org>

tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread

Be sure to avoid entering t_show() with FTRACE_ITER_HASH set without
having properly started the iterator to iterate the hash. This case is
degenerate and, as discovered by Robert Swiecki, can cause t_hash_show()
to misuse a pointer. This causes a NULL ptr deref with possible security
implications. Tracked as CVE-2010-3079.

Cc: Robert Swiecki <swiecki@google.com>
Cc: Eugene Teo <eugene@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 9c55cb12 08-Sep-2010 Steven Rostedt <srostedt@redhat.com>

tracing: Do not allow llseek to set_ftrace_filter

Reading the file set_ftrace_filter does three things.

1) shows whether or not filters are set for the function tracer
2) shows what functions are set for the function tracer
3) shows what triggers are set on any functions

3 is independent from 1 and 2.

The way this file currently works is that it is a state machine,
and as you read it, it may change state. But this assumption breaks
when you use lseek() on the file. The state machine gets out of sync
and the t_show() may use the wrong pointer and cause a kernel oops.

Luckily, this will only kill the app that does the lseek, but the app
dies while holding a mutex. This prevents anyone else from using the
set_ftrace_filter file (or any other function tracing file for that matter).

A real fix for this is to rewrite the code, but that is too much for
a -rc release or stable. This patch simply disables llseek on the
set_ftrace_filter() file for now, and we can do the proper fix for the
next major release.

Reported-by: Robert Swiecki <swiecki@google.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Eugene Teo <eugene@redhat.com>
Cc: vendor-sec@lst.de
Cc: <stable@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 3aaba20f 23-Aug-2010 Li Zefan <lizf@cn.fujitsu.com>

tracing: Fix a race in function profile

While we are reading trace_stat/functionX and someone just
disabled function_profile at that time, we can trigger this:

divide error: 0000 [#1] PREEMPT SMP
...
EIP is at function_stat_show+0x90/0x230
...

This fix just takes the ftrace_profile_lock and checks if
rec->counter is 0. If it's 0, we know the profile buffer
has been reset.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: stable@kernel.org
LKML-Reference: <4C723644.4040708@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 5168ae50 03-Jun-2010 Steven Rostedt <srostedt@redhat.com>

tracing: Remove ftrace_preempt_disable/enable

The ftrace_preempt_disable/enable functions were to address a
recursive race caused by the function tracer. The function tracer
traces all functions which makes it easily susceptible to recursion.
One area was preempt_enable(). This would call the scheduler and
the schedulre would call the function tracer and loop.
(So was it thought).

The ftrace_preempt_disable/enable was made to protect against recursion
inside the scheduler by storing the NEED_RESCHED flag. If it was
set before the ftrace_preempt_disable() it would not call schedule
on ftrace_preempt_enable(), thinking that if it was set before then
it would have already scheduled unless it was already in the scheduler.

This worked fine except in the case of SMP, where another task would set
the NEED_RESCHED flag for a task on another CPU, and then kick off an
IPI to trigger it. This could cause the NEED_RESCHED to be saved at
ftrace_preempt_disable() but the IPI to arrive in the the preempt
disabled section. The ftrace_preempt_enable() would not call the scheduler
because the flag was already set before entring the section.

This bug would cause a missed preemption check and cause lower latencies.

Investigating further, I found that the recusion caused by the function
tracer was not due to schedule(), but due to preempt_schedule(). Now
that preempt_schedule is completely annotated with notrace, the recusion
no longer is an issue.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 38516ab5 20-Apr-2010 Steven Rostedt <srostedt@redhat.com>

tracing: Let tracepoints have data passed to tracepoint callbacks

This patch adds data to be passed to tracepoint callbacks.

The created functions from DECLARE_TRACE() now need a mandatory data
parameter. For example:

DECLARE_TRACE(mytracepoint, int value, value)

Will create the register function:

int register_trace_mytracepoint((void(*)(void *data, int value))probe,
void *data);

As the first argument, all callbacks (probes) must take a (void *data)
parameter. So a callback for the above tracepoint will look like:

void myprobe(void *data, int value)
{
}

The callback may choose to ignore the data parameter.

This change allows callbacks to register a private data pointer along
with the function probe.

void mycallback(void *data, int value);

register_trace_mytracepoint(mycallback, mydata);

Then the mycallback() will receive the "mydata" as the first parameter
before the args.

A more detailed example:

DECLARE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

/* In the C file */

DEFINE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

[...]

trace_mytracepoint(status);

/* In a file registering this tracepoint */

int my_callback(void *data, int status)
{
struct my_struct my_data = data;
[...]
}

[...]
my_data = kmalloc(sizeof(*my_data), GFP_KERNEL);
init_my_data(my_data);
register_trace_mytracepoint(my_callback, my_data);

The same callback can also be registered to the same tracepoint as long
as the data registered is different. Note, the data must also be used
to unregister the callback:

unregister_trace_mytracepoint(my_callback, my_data);

Because of the data parameter, tracepoints declared this way can not have
no args. That is:

DECLARE_TRACE(mytracepoint, TP_PROTO(void), TP_ARGS());

will cause an error.

If no arguments are needed, a new macro can be used instead:

DECLARE_TRACE_NOARGS(mytracepoint);

Since there are no arguments, the proto and args fields are left out.

This is part of a series to make the tracepoint footprint smaller:

text data bss dec hex filename
4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
4914025 1088868 861512 6864405 68be15 vmlinux.class
4918492 1084612 861512 6864616 68bee8 vmlinux.tracepoint

Again, this patch also increases the size of the kernel, but
lays the ground work for decreasing it.

v5: Fixed net/core/drop_monitor.c to handle these updates.

v4: Moved the DECLARE_TRACE() DECLARE_TRACE_NOARGS out of the
#ifdef CONFIG_TRACE_POINTS, since the two are the same in both
cases. The __DECLARE_TRACE() is what changes.
Thanks to Frederic Weisbecker for pointing this out.

v3: Made all register_* functions require data to be passed and
all callbacks to take a void * parameter as its first argument.
This makes the calling functions comply with C standards.

Also added more comments to the modifications of DECLARE_TRACE().

v2: Made the DECLARE_TRACE() have the ability to pass arguments
and added a new DECLARE_TRACE_NOARGS() for tracepoints that
do not need any arguments.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 27a9da65 04-May-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

sched: Remove rq argument to the tracepoints

struct rq isn't visible outside of sched.o so its near useless to
expose the pointer, also there are no users of it, so remove it.

Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1272997616.1642.207.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 37e44bc5 27-Apr-2010 Steven Rostedt <srostedt@redhat.com>

tracing: Fix sleep time function profiling

When sleep_time is off the function profiler ignores the time that a task
is scheduled out. When the task is scheduled out a timestamp is taken.
When the task is scheduled back in, the timestamp is compared to the
current time and the saved calltimes are adjusted accordingly.

But when stopping the function profiler, the sched switch hook that
does this adjustment was stopped before shutting down the tracer.
This allowed some tasks to not get their timestamps set when they
scheduled out. When the function profiler started again, this would
skew the times of the scheduler functions.

This patch moves the stopping of the sched switch to after the function
profiler is stopped. It also ignores zero set calltimes, which may
happen on start up.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e330b3bc 26-Apr-2010 Chase Douglas <chase.douglas@canonical.com>

tracing: Show sample std dev in function profiling

When combined with function graph tracing the ftrace function profiler
also prints the average run time of functions. While this gives us some
good information, it doesn't tell us anything about the variance of the
run times of the function. This change prints out the s^2 sample
standard deviation alongside the average.

This change adds one entry to the profile record structure. This
increases the memory footprint of the function profiler by 1/3 on a
32-bit system, and by 1/5 on a 64-bit system when function graphing is
enabled, though the memory is only allocated when the profiler is turned
on. During the profiling, one extra line of code adds the squared
calltime to the new record entry, so this should not adversly affect
performance.

Note that the square of the sample standard deviation is printed because
there is no sqrt implementation for unsigned long long in the kernel.

Signed-off-by: Chase Douglas <chase.douglas@canonical.com>
LKML-Reference: <1272304925-2436-1-git-send-email-chase.douglas@canonical.com>

[ fixed comment about ns^2 -> us^2 conversion ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# ea14eb71 12-Mar-2010 Steven Rostedt <srostedt@redhat.com>

function-graph: Init curr_ret_stack with ret_stack

If the graph tracer is active, and a task is forked but the allocating of
the processes graph stack fails, it can cause crash later on.

This is due to the temporary stack being NULL, but the curr_ret_stack
variable is copied from the parent. If it is not -1, then in
ftrace_graph_probe_sched_switch() the following:

for (index = next->curr_ret_stack; index >= 0; index--)
next->ret_stack[index].calltime += timestamp;

Will cause a kernel OOPS.

Found with Li Zefan's ftrace_stress_test.

Cc: stable@kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 3f379b03 05-Mar-2010 Paul E. McKenney <paulmck@kernel.org>

ftrace: Replace read_barrier_depends() with rcu_dereference_raw()

Replace the calls to read_barrier_depends() in
ftrace_list_func() with rcu_dereference_raw() to improve
readability. The reason that we use rcu_dereference_raw() here
is that removed entries are never freed, instead they are simply
leaked. This is one of a very few cases where use of
rcu_dereference_raw() is the long-term right answer. And I
don't yet know of any others. ;-)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1267830207-9474-1-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 801c29fd 05-Mar-2010 Steven Rostedt <srostedt@redhat.com>

function-graph: Fix unused reference to ftrace_set_func()

The declaration of ftrace_set_func() is at the start of the ftrace.c file
and wrapped with a #ifdef CONFIG_FUNCTION_GRAPH condition. If function
graph tracing is enabled but CONFIG_DYNAMIC_FTRACE is not, a warning
about that function being declared static and unused is given.

This really should have been placed within the CONFIG_FUNCTION_GRAPH
condition that uses ftrace_set_func().

Moving the declaration down fixes the warning and makes the code cleaner.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# c7c6b1fe 10-Feb-2010 Li Zefan <lizf@cn.fujitsu.com>

ftrace: Allow to remove a single function from function graph filter

I don't see why we can only clear all functions from the filter.

After patching:

# echo sys_open > set_graph_function
# echo sys_close >> set_graph_function
# cat set_graph_function
sys_open
sys_close
# echo '!sys_close' >> set_graph_function
# cat set_graph_function
sys_open

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4B726388.2000408@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# f24bb999 02-Feb-2010 Masami Hiramatsu <mhiramat@redhat.com>

ftrace: Remove record freezing

Remove record freezing. Because kprobes never puts probe on
ftrace's mcount call anymore, it doesn't need ftrace to check
whether kprobes on it.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: przemyslaw@pawelczyk.it
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100202214925.4694.73469.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2cfa1978 02-Feb-2010 Masami Hiramatsu <mhiramat@redhat.com>

ftrace/alternatives: Introducing *_text_reserved functions

Introducing *_text_reserved functions for checking the text
address range is partially reserved or not. This patch provides
checking routines for x86 smp alternatives and dynamic ftrace.
Since both functions modify fixed pieces of kernel text, they
should reserve and protect those from other dynamic text
modifier, like kprobes.

This will also be extended when introducing other subsystems
which modify fixed pieces of kernel text. Dynamic text modifiers
should avoid those.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: przemyslaw@pawelczyk.it
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Jason Baron <jbaron@redhat.com>
LKML-Reference: <20100202214911.4694.16587.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 751e9983 13-Jan-2010 Li Zefan <lizf@cn.fujitsu.com>

ftrace: Fix MATCH_END_ONLY function filter

For '*foo' pattern, we should allow any string ending with
'foo', but ftrace filter incorrectly disallows strings
like bar_foo_foo:

# echo '*io' > set_ftrace_filter
# cat set_ftrace_filter | grep 'req_bio_endio'
# cat available_filter_functions | grep 'req_bio_endio'
req_bio_endio

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4B4E870E.6060607@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 91baf628 07-Dec-2009 Li Zefan <lizf@cn.fujitsu.com>

function-graph: Allow writing the same val to set_graph_function

# echo 'do_open' > set_graph_function
# echo 'do_open' >> set_graph_function
bash: echo: write error: Invalid argument

Make it valid to write the same value to set_graph_function,
which is consistent with set_ftrace_filter interface.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-reference: <4B1DC4E1.1060303@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>


# 313254a9 07-Dec-2009 Li Zefan <lizf@cn.fujitsu.com>

ftrace: Call trace_parser_clear() properly

I found a weird behavior:

# echo 'fuse:*' > set_ftrace_filter
bash: echo: write error: Invalid argument
# cat set_ftrace_filter
fuse_dev_fasync
fuse_dev_poll
fuse_copy_do

We should call trace_parser_clear() no matter ftrace_process_regex()
returns 0 or -errno, otherwise we will actually take the unaccepted
records from ftrace_regex_release().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4B1DC4D2.3000406@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>


# 311d16da 07-Dec-2009 Li Zefan <lizf@cn.fujitsu.com>

ftrace: Return EINVAL when writing invalid val to set_ftrace_filter

Currently it doesn't warn user on invald value:

# echo nonexist_symbol > set_ftrace_filter
or:
# echo 'nonexist_symbol:mod:fuse' > set_ftrace_filter

Better make it return failure.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4B1DC4BF.2070003@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>


# 457dc928 23-Nov-2009 Ingo Molnar <mingo@elte.hu>

tracing, function tracer: Clean up strstrip() usage

Clean up strstrip() usage - which also addresses this build warning:

kernel/trace/ftrace.c: In function 'ftrace_pid_write':
kernel/trace/ftrace.c:3004: warning: ignoring return value of 'strstrip', declared with attribute warn_unused_result

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f6060f46 04-Nov-2009 Lai Jiangshan <laijs@cn.fujitsu.com>

tracing: Prevent build warning: 'ftrace_graph_buf' defined but not used

Prevent build warning when CONFIG_FUNCTION_GRAPH_TRACER is not set.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4AF24381.5060307@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# ed146b25 02-Nov-2009 Li Zefan <lizf@cn.fujitsu.com>

ftrace: Fix unmatched locking in ftrace_regex_write()

When a command is passed to the set_ftrace_filter, then
the ftrace_regex_lock is still held going back to user space.

# echo 'do_open : foo' > set_ftrace_filter
(still holding ftrace_regex_lock when returning to user space!)

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4AEF7F8A.3080300@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# cf8517cf 23-Oct-2009 Jiri Olsa <jolsa@redhat.com>

tracing: Update *ppos instead of filp->f_pos

Instead of directly updating filp->f_pos we should update the *ppos
argument. The filp->f_pos gets updated within the file_pos_write()
function called from sys_write().

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233646.399670810@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 06f43d66 14-Oct-2009 Frederic Weisbecker <fweisbec@gmail.com>

ftrace: Copy ftrace_graph_filter boot param using strlcpy

We are using strncpy in the wrong way to copy the ftrace_graph_filter
boot param because we pass the buffer size instead of the max string
size it can contain (buffer size - 1). The end result might not be
NULL terminated as we are abusing the max string size.

Lets use strlcpy() instead.

Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>


# 5cb084bb 13-Oct-2009 Jiri Olsa <jolsa@redhat.com>

tracing: Enable records during the module load

I was debuging some module using "function" and "function_graph"
tracers and noticed, that if you load module after you enabled
tracing, the module's hooks will convert only to NOP instructions.

The attached patch enables modules' hooks if there's function trace
allready on, thus allowing to trace module functions.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20091013203425.896285120@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 756d17ee 13-Oct-2009 jolsa@redhat.com <jolsa@redhat.com>

tracing: Support multiple pids in set_pid_ftrace file

Adding the possibility to set more than 1 pid in the set_pid_ftrace
file, thus allowing to trace more than 1 independent processes.

Usage:

sh-4.0# echo 284 > ./set_ftrace_pid
sh-4.0# cat ./set_ftrace_pid
284
sh-4.0# echo 1 >> ./set_ftrace_pid
sh-4.0# echo 0 >> ./set_ftrace_pid
sh-4.0# cat ./set_ftrace_pid
swapper tasks
1
284
sh-4.0# echo 4 > ./set_ftrace_pid
sh-4.0# cat ./set_ftrace_pid
4
sh-4.0# echo > ./set_ftrace_pid
sh-4.0# cat ./set_ftrace_pid
no pid
sh-4.0#

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091013203425.565454612@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 369bc18f 12-Oct-2009 Stefan Assmann <sassmann@redhat.com>

ftrace: add kernel command line graph function filtering

Add a command line parameter to allow limiting the function graphs
that are traced on boot up from the given top-level callers , when
ftrace=function_graph is specified.

This patch adds the following command line option:
ftrace_graph_filter=function-list

Where function-list is a comma separated list of functions to filter.

[fweisbec@gmail.com: picked the documentation changes from the v2 patch]

Signed-off-by: Stefan Assmann <sassmann@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4AD2DEB9.2@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>


# 3279ba37 07-Oct-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: check for failure for all conversions

Due to legacy code from back when the dynamic tracer used a daemon,
only core kernel code was checking for failures. This is no longer
the case. We must check for failures any time we perform text modifications.

Cc: stable@kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e7247a15 07-Oct-2009 jolsa@redhat.com <jolsa@redhat.com>

tracing: correct module boundaries for ftrace_release

When the module is about the unload we release its call records.
The ftrace_release function was given wrong values representing
the module core boundaries, thus not releasing its call records.

Plus making ftrace_release function module specific.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1254934835-363-3-git-send-email-jolsa@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 33974093 28-Sep-2009 Matt Fleming <matthew.fleming@imgtec.com>

tracing: Fix infinite recursion in ftrace_update_pid_func()

When CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST is enabled
__ftrace_trace_function contains the current trace function, not
ftrace_trace_function.

In ftrace_update_pid_func() we currently incorrectly assign the
value of ftrace_trace_function to __ftrace_trace_funcion before
returning.

Without this patch it is possible to execute an infinite recursion
whereby ftrace_test_stop_func() calls __ftrace_trace_function,
which was assigned ftrace_test_stop_func() in
ftrace_update_pid_func().

Signed-off-by: Matt Fleming <matthew.fleming@imgtec.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1254152581-18347-1-git-send-email-matt@console-pimps.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 3f6fe06d 24-Sep-2009 Frederic Weisbecker <fweisbec@gmail.com>

tracing/filters: Unify the regex parsing helpers

The filter code has stolen the regex parsing function from ftrace to
get the regex support.
We have duplicated this code, so factorize it in the filter area and
make it generally available, as the filter code is the most suited to
host this feature.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>


# 8d65af78 23-Sep-2009 Alexey Dobriyan <adobriyan@gmail.com>

sysctl: remove "struct file *" argument of ->proc_handler

It's unused.

It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.

It _was_ used in two places at arch/frv for some reason.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 88e9d34c 22-Sep-2009 James Morris <jmorris@namei.org>

seq_file: constify seq_operations

Make all seq_operations structs const, to help mitigate against
revectoring user-triggerable function pointers.

This is derived from the grsecurity patch, although generated from scratch
because it's simpler than extracting the changes from there.

Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 79fe249c 21-Sep-2009 Li Zefan <lizf@cn.fujitsu.com>

tracing: Fix failure path in ftrace_regex_open()

Don't forget to free trace_parser if seq_open() returned failure.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4AB86694.4040803@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 1eb90f13 21-Sep-2009 Li Zefan <lizf@cn.fujitsu.com>

tracing: Fix failure path in ftrace_graph_write()

Don't call trace_parser_put() on uninitialized trace_parser.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4AB86639.3000003@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 4ba7978e 21-Sep-2009 Li Zefan <lizf@cn.fujitsu.com>

tracing: Check the return value of trace_get_user()

Return immediately if trace_get_user() returned failure.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4AB86614.7020803@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# a4ec5e0c 18-Sep-2009 Li Zefan <lizf@cn.fujitsu.com>

function-graph: use ftrace_graph_funcs directly

No need to store ftrace_graph_funcs in file->private.

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4AB32364.7020602@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# b375a11a 16-Sep-2009 Steven Rostedt <srostedt@redhat.com>

tracing: switch function prints from %pf to %ps

For direct function pointers (like what mcount provides) PowerPC64
requires the use of %ps, otherwise nothing is printed.

This patch converts all prints of functions retrieved through mcount
to use the %ps format from the %pf.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# b36461da 15-Sep-2009 Atsushi Tsuji <a-tsuji@bk.jp.nec.com>

tracing: Fix minor bugs for __unregister_ftrace_function_probe

Fix the condition of strcmp for "*".
Also fix NULL pointer dereference when glob is NULL.

Signed-off-by: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
LKML-Reference: <4AAF6726.5090905@bk.jp.nec.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 689fd8b6 11-Sep-2009 jolsa@redhat.com <jolsa@redhat.com>

tracing: trace parser support for function and graph

Convert the writing to 'set_graph_function', 'set_ftrace_filter'
and 'set_ftrace_notrace' to use the generic trace_parser
'trace_get_user' function.

Removed FTRACE_ITER_CONT flag, since it's not needed after this change.

Minor fix in set_graph_function display - g_show function.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1252682969-3366-4-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# eda1e328 11-Aug-2009 Jiri Olsa <jolsa@redhat.com>

tracing: handle broken names in ftrace filter

If one filter item (for set_ftrace_filter and set_ftrace_notrace) is being
setup by more than 1 consecutive writes (FTRACE_ITER_CONT flag), it won't
be handled corretly.

I used following program to test/verify:

[snip]
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>

int main(int argc, char **argv)
{
int fd, i;
char *file = argv[1];

if (-1 == (fd = open(file, O_WRONLY))) {
perror("open failed");
return -1;
}

for(i = 0; i < (argc - 2); i++) {
int len = strlen(argv[2+i]);
int cnt, off = 0;

while(len) {
cnt = write(fd, argv[2+i] + off, len);
len -= cnt;
off += cnt;
}
}

close(fd);
return 0;
}
[snip]

before change:
sh-4.0# echo > ./set_ftrace_filter
sh-4.0# /test ./set_ftrace_filter "sys" "_open "
sh-4.0# cat ./set_ftrace_filter
#### all functions enabled ####
sh-4.0#

after change:
sh-4.0# echo > ./set_ftrace_notrace
sh-4.0# test ./set_ftrace_notrace "sys" "_open "
sh-4.0# cat ./set_ftrace_notrace
sys_open
sh-4.0#

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <20090811152904.GA26065@jolsa.lab.eng.brq.redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 3be04b47 17-Aug-2009 Li Zefan <lizf@cn.fujitsu.com>

ftrace: Simplify seqfile code

Use seq_release_private().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A891AAB.8090701@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 8650ae32 22-Jul-2009 Steven Rostedt <srostedt@redhat.com>

tracing: only truncate ftrace files when O_TRUNC is set

The current code will truncate the ftrace files contents if O_APPEND
is not set and the file is opened in write mode. This is incorrect.
It should only truncate the file if O_TRUNC is set. Otherwise
if one of these files is opened by a C program with fopen "r+",
it will incorrectly truncate the file.

Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 87827111 22-Jul-2009 Li Zefan <lizf@cn.fujitsu.com>

function-graph: Fix seqfile memory leak

Every time we cat set_graph_function, we leak memory allocated
by seq_open().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A67D907.2010500@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 566b0aaf 16-Jul-2009 jolsa@redhat.com <jolsa@redhat.com>

tracing: Remove unused fields/variables

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: rostedt@goodmis.org
LKML-Reference: <1247773468-11594-2-git-send-email-jolsa@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 6f2f3cf0 16-Jul-2009 Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>

tracing/function: Cleanup for function tracer

We can directly use %pf input format instead of kallsyms_lookup()
and %s input format

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>


# 64fbcd16 14-Jul-2009 Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>

tracing/function: Simplify __ftrace_replace_code()

Rewrite the __ftrace_replace_code() function, simplify it, but don't
change the code's logic.

First, we get the state we want to set, if the record has the same
state, then do nothing, otherwise enable/disable it.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>


# 6ab5d668 03-Jun-2009 Steven Rostedt <srostedt@redhat.com>

tracing/function-profiler: do not free per cpu variable stat

The per cpu variable stat is freeded if we fail to allocate a name
on start up. This was due to stat at first being allocated in the
initial design. But since then, it has become a static per cpu variable
but the free on error was not removed.

Also added __init annotation to the function that this is in.

[ Impact: prevent possible memory corruption on low mem at boot up ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# a32c7765 26-Jun-2009 Li Zefan <lizf@cn.fujitsu.com>

tracing: Fix stack tracer sysctl handling

This made my machine completely frozen:

# echo 1 > /proc/sys/kernel/stack_tracer_enabled
# echo 2 > /proc/sys/kernel/stack_tracer_enabled

The cause is register_ftrace_function() was called twice.

Also fix ftrace_enabled sysctl, though seems nothing bad happened
as I tested it.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A448D17.9010305@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 0296e425 25-Jun-2009 Li Zefan <lizf@cn.fujitsu.com>

ftrace: Fix the output of profile

The first entry of the ftrace profile was always skipped when
reading trace_stat/functionX.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A443D59.4080307@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# d82d6244 23-Jun-2009 Li Zefan <lizf@cn.fujitsu.com>

ftrace: Fix t_hash_start()

When the output of set_ftrace_filter is larger than PAGE_SIZE,
t_hash_start() will be called the 2nd time, and then we start
from the head of a hlist, which is wrong and causes some entries
to be outputed twice.

The worse is, if the hlist is large enough, reading set_ftrace_filter
won't stop but in a dead loop.

Reviewed-by: Liming Wang <liming.wang@windriver.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A41876E.2060407@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 694ce0a5 23-Jun-2009 Li Zefan <lizf@cn.fujitsu.com>

ftrace: Don't manipulate @pos in t_start()

It's rather confusing that in t_start(), in some cases @pos is
incremented, and in some cases it's decremented and then incremented.

This patch rewrites t_start() in a much more general way.

Thus we fix a bug that if ftrace_filtered == 1, functions have tracer
hooks won't be printed, because the branch is always unreachable:

static void *t_start(...)
{
...
if (!p)
return t_hash_start(m, pos);
return p;
}

Before:
# echo 'sys_open' > /mnt/tracing/set_ftrace_filter
# echo 'sys_write:traceon:4' >> /mnt/tracing/set_ftrace_filter
sys_open

After:
# echo 'sys_open' > /mnt/tracing/set_ftrace_filter
# echo 'sys_write:traceon:4' >> /mnt/tracing/set_ftrace_filter
sys_open
sys_write:traceon:count=4

Reviewed-by: Liming Wang <liming.wang@windriver.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A41874B.4090507@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 85951842 23-Jun-2009 Li Zefan <lizf@cn.fujitsu.com>

ftrace: Don't increment @pos in g_start()

It's wrong to increment @pos in g_start(). It causes some entries
lost when reading set_graph_function, if the output of the file
is larger than PAGE_SIZE.

Reviewed-by: Liming Wang <liming.wang@windriver.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A418738.7090401@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 9ea1a153 19-Jun-2009 Frederic Weisbecker <fweisbec@gmail.com>

tracing/urgent: warn in case of ftrace_start_up inbalance

Prevent from further ftrace_start_up inbalances so that we avoid
future nop patching omissions with dynamic ftrace.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>


# 84047e36 02-Jun-2009 Steven Rostedt <srostedt@redhat.com>

function-graph: always initialize task ret_stack

On creating a new task while running the function graph tracer, if
we fail to allocate the ret_stack, and then fail the fork, the
code will free the parent ret_stack. This is because the child
duplicated the parent and currently points to the parent's ret_stack.

This patch always initializes the task's ret_stack to NULL.

[ Impact: prevent crash of parent on low memory during fork ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 26c01624 02-Jun-2009 Steven Rostedt <srostedt@redhat.com>

function-graph: add memory barriers for accessing task's ret_stack

The code that handles the tasks ret_stack allocation for every task
assumes that only an interrupt can cause issues (even though interrupts
are disabled).

In reality, the code is allocating the ret_stack for tasks that may be
running on other CPUs and there are not efficient memory barriers to
handle this case.

[ Impact: prevent crash due to using of uninitialized ret_stack variables ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 82310a32 01-Jun-2009 Steven Rostedt <srostedt@redhat.com>

function-graph: enable the stack after initialization of other variables

The function graph tracer checks if the task_struct has ret_stack defined
to know if it is OK or not to use it. The initialization is done for
all tasks by one process, but the idle tasks use the same initialization
used by new tasks.

If an interrupt happens on an idle task that just had the ret_stack
created, but before the rest of the initialization took place, then
we can corrupt the return address of the functions.

This patch moves the setting of the task_struct's ret_stack to after
the other variables have been initialized.

[ Impact: prevent kernel panic on idle task when starting function graph ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 179c498a 01-Jun-2009 Steven Rostedt <srostedt@redhat.com>

function-graph: only allocate init tasks if it was not already done

When the function graph tracer is enabled, it calls the initialization
needed for the init tasks that would be called on all created tasks.

The problem is that this is called every time the function graph tracer
is enabled, and the ret_stack is allocated for the idle tasks each time.
Thus, the old ret_stack is lost and a memory leak is created.

This is also dangerous because if an interrupt happened on another CPU
with the init task and the ret_stack is replaced, we then lose all the
return pointers for the interrupt, and a crash would take place.

[ Impact: fix memory leak and possible crash due to race ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 0f6ce3de 01-Jun-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: do not profile functions when disabled

A race was found that if one were to enable and disable the function
profiler repeatedly, then the system can panic. This was because a profiled
function may be preempted just before disabling interrupts. While
the profiler is disabled and then reenabled, the preempted function
could start again, and access the hash as it is being initialized.

This just adds a check in the irq disabled part to check if the profiler
is enabled, and if it is not then it will just exit.

When the system is disabled, the profile_enabled variable is cleared
before calling the unregistering of the function profiler. This
unregistering calls stop machine which also acts as a synchronize schedule.

[ Impact: fix panic in enabling/disabling function profiler ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 2af15d6a 28-May-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: add kernel command line function filtering

When using ftrace=function on the command line to trace functions
on boot up, one can not filter out functions that are commonly called.

This patch adds two new ftrace command line commands.

ftrace_notrace=function-list
ftrace_filter=function-list

Where function-list is a comma separated list of functions to filter.
The ftrace_notrace will make the functions listed not be included
in the function tracing, and ftrace_filter will only trace the functions
listed.

These two act the same as the debugfs/tracing/set_ftrace_notrace and
debugfs/tracing/set_ftrace_filter respectively.

The simple glob expressions that are allowed by the filter files can also
be used by the command line interface.

ftrace_notrace=rcu*,*lock,*spin*

Will not trace any function that starts with rcu, ends with lock, or has
the word spin in it.

Note, if the self tests are enabled, they may interfere with the filtering
set by the command lines.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 24ed0c4b 17-May-2009 Ming Lei <tom.leiming@gmail.com>

tracing: fix check for return value of register_module_notifier

return zero should be correct, so fix it.

[ Impact: eliminate incorrect syslog message ]

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: rostedt@goodmis.org
LKML-Reference: <1242545498-7285-1-git-send-email-tom.leiming@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 93eb677d 15-Apr-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: use module notifier for function tracer

The hooks in the module code for the function tracer must be called
before any of that module code runs. The function tracer hooks
modify the module (replacing calls to mcount to nops). If the code
is executed while the change occurs, then the CPU can take a GPF.

To handle the above with a bit of paranoia, I originally implemented
the hooks as calls directly from the module code.

After examining the notifier calls, it looks as though the start up
notify is called before any of the module's code is executed. This makes
the use of the notify safe with ftrace.

Only the startup notify is required to be "safe". The shutdown simply
removes the entries from the ftrace function list, and does not modify
any code.

This change has another benefit. It removes a issue with a reverse dependency
in the mutexes of ftrace_lock and module_mutex.

[ Impact: fix lock dependency bug, cleanup ]

Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# ad8d75ff 14-Apr-2009 Steven Rostedt <srostedt@redhat.com>

tracing/events: move trace point headers into include/trace/events

Impact: clean up

Create a sub directory in include/trace called events to keep the
trace point headers in their own separate directory. Only headers that
declare trace points should be defined in this directory.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 597af815 03-Apr-2009 Steven Rostedt <srostedt@redhat.com>

function-graph: use int instead of atomic for ftrace_graph_active

Impact: cleanup

The variable ftrace_graph_active is only modified under the
ftrace_lock mutex, thus an atomic is not necessary for modification.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 5452af66 26-Mar-2009 Frederic Weisbecker <fweisbec@gmail.com>

tracing/ftrace: factorize the tracing files creation

Impact: cleanup

Most of the tracing files creation follow the same pattern:

ret = debugfs_create_file(...)
if (!ret)
pr_warning("Couldn't create ... entry\n")

Unify it!

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1238109938-11840-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 2aad1b76 30-Mar-2009 Steven Rostedt <srostedt@redhat.com>

function-graph: allow unregistering twice

Impact: fix to permanent disabling of function graph tracer

There should be nothing to prevent a tracer from unregistering a
function graph callback more than once. This can simplify error paths.

But currently, the counter does not account for mulitple unregistering
of the function graph callback. If it happens, the function graph
tracer will be permanently disabled.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2a93a1f2 12-Jan-2009 Uwe Kleine-König <ukleinek@strlen.de>

trivial: fix typo "resgister" -> "register"

Signed-off-by: Uwe Kleine-Koenig <ukleinek@strlen.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 34886c8b 25-Mar-2009 Steven Rostedt <srostedt@redhat.com>

tracing: add average time in function to function profiler

Show the average time in the function (Time / Hit)

Function Hit Time Avg
-------- --- ---- ---
mwait_idle 51 140326.6 us 2751.503 us
smp_apic_timer_interrupt 47 3517.735 us 74.845 us
schedule 10 2738.754 us 273.875 us
__schedule 10 2732.857 us 273.285 us
hrtimer_interrupt 47 1896.104 us 40.342 us
irq_exit 56 1711.833 us 30.568 us
__run_hrtimer 47 1315.589 us 27.991 us
tick_sched_timer 47 1138.690 us 24.227 us
do_softirq 56 1116.829 us 19.943 us
__do_softirq 56 1066.932 us 19.052 us
do_IRQ 9 926.153 us 102.905 us

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 318e0a73 25-Mar-2009 Steven Rostedt <srostedt@redhat.com>

tracing: remove on the fly allocator from function profiler

Impact: safer code

The on the fly allocator for the function profiler was to save
memory. But at the expense of stability. Although it survived several
tests, allocating from the function tracer is just too risky, just
to save space.

This patch removes the allocator and simply allocates enough entries
at start up.

Each function gets a profiling structure of 40 bytes. With an average
of 20K functions, and this is for each CPU, we have 800K per online
CPU. This is not too bad, at least for non-embedded.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 2a4efa42 24-Mar-2009 Zhaolei <zhaolei@cn.fujitsu.com>

ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()

"Because when we call ftrace_free_rec we change the rec->ip to point to the
next record in the chain. Something is very wrong if rec->ip >= s &&
rec->ip < e and the record is already free."

"Note, use FTRACE_WARN_ON() macro. This way it shuts down ftrace if it is
hit and helps to avoid further damage later."
-- Steven Rostedt <rostedt@goodmis.org>

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# fb9fb015 25-Mar-2009 Steven Rostedt <srostedt@redhat.com>

tracing: clean up tracing profiler

Ingo Molnar suggested clean ups for the profiling code. This patch
makes those updates.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# a2a16d6a 24-Mar-2009 Steven Rostedt <srostedt@redhat.com>

function-graph: add option to calculate graph time or not

graph time is the time that a function is executing another function.
Thus if function A calls B, if graph-time is set, then the time for
A includes B. This is the default behavior. But if graph-time is off,
then the time spent executing B is subtracted from A.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# cafb168a 24-Mar-2009 Steven Rostedt <srostedt@redhat.com>

tracing: make the function profiler per cpu

Impact: speed enhancement

By making the function profiler record in per cpu data we not only
get better readings, avoid races, we also do not have to take any
locks.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 0706f1c4 23-Mar-2009 Steven Rostedt <srostedt@redhat.com>

tracing: adding function timings to function profiler

If the function graph trace is enabled, the function profiler will
use it to take the timing of the functions.

cat /debug/tracing/trace_stat/functions

Function Hit Time
-------- --- ----
mwait_idle 127 183028.4 us
schedule 26 151997.7 us
__schedule 31 151975.1 us
sys_wait4 2 74080.53 us
do_wait 2 74077.80 us
sys_newlstat 138 39929.16 us
do_path_lookup 179 39845.79 us
vfs_lstat_fd 138 39761.97 us
user_path_at 153 39469.58 us
path_walk 179 39435.76 us
__link_path_walk 189 39143.73 us
[...]

Note the times are skewed due to the function graph tracer not taking
into account schedules.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 493762fc 23-Mar-2009 Steven Rostedt <srostedt@redhat.com>

tracing: move function profiler data out of function struct

Impact: reduce size of memory in function profiler

The function profiler originally introduces its counters into the
function records itself. There is 20 thousand different functions on
a normal system, and that is adding 20 thousand counters for profiling
event when not needed.

A normal run of the profiler yields only a couple of thousand functions
executed, depending on what is being profiled. This means we have around
18 thousand useless counters.

This patch rectifies this by moving the data out of the function
records used by dynamic ftrace. Data is preallocated to hold the functions
when the profiling begins. Checks are made during profiling to see if
more recorcds should be allocated, and they are allocated if it is safe
to do so.

This also removes the dependency from using dynamic ftrace, and also
removes the overhead by having it enabled.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# bac429f0 19-Mar-2009 Steven Rostedt <srostedt@redhat.com>

tracing: add function profiler

Impact: new profiling feature

This patch adds a function profiler. In debugfs/tracing/ two new
files are created.

function_profile_enabled - to enable or disable profiling

trace_stat/functions - the profiled functions.

For example:

echo 1 > /debugfs/tracing/function_profile_enabled
./hackbench 50
echo 0 > /debugfs/tracing/function_profile_enabled

yields:

cat /debugfs/tracing/trace_stat/functions

Function Hit
-------- ---
_spin_lock 10106442
_spin_unlock 10097492
kfree 6013704
_spin_unlock_irqrestore 4423941
_spin_lock_irqsave 4406825
__phys_addr 4181686
__slab_free 4038222
dput 4030130
path_put 4023387
unroll_tree_refs 4019532
[...]

The most hit functions are listed first. Functions that are not
hit are not listed.

This feature depends on and uses dynamic function tracing. When the
function profiling is disabled, no overhead occurs. But it still
takes up around 300KB to hold the data, thus it is not recomended
to keep it enabled for systems low on memory.

When a '1' is echoed into the function_profile_enabled file, the
counters for is function is reset back to zero. Thus you can see what
functions are hit most by different programs.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# ee000b7f 23-Mar-2009 Lai Jiangshan <laijs@cn.fujitsu.com>

tracing: use union for multi-usages field

Impact: cleanup

struct dyn_ftrace::ip has different usages in his lifecycle,
we use union for it. And also for struct dyn_ftrace::flags.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <49C871BE.3080405@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# cc59c9e8 23-Mar-2009 Lai Jiangshan <laijs@cn.fujitsu.com>

ftrace: show virtual PID

Impact: fix PID output under namespaces

When current namespace is not the global namespace,
pid read from set_ftrace_pid is no correct.

# ~/newpid_namespace_run bash
# echo $$
1
# echo 1 > set_ftrace_pid
# cat set_ftrace_pid
3756

Since we write virtual PID to set_ftrace_pid, we need get
virtual PID when we read it.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <49C84D65.9050606@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# be6f164a 24-Mar-2009 Steven Rostedt <srostedt@redhat.com>

function-graph: add option for include sleep times

Impact: give user a choice to show times spent while sleeping

The user may want to see the time a function spent sleeping.
This patch adds the trace option "sleep-time" to allow that.
The "sleep-time" option is default on.

echo sleep-time > /debug/tracing/trace_options

produces:

------------------------------------------
2) avahi-d-3428 => <idle>-0
------------------------------------------

2) | finish_task_switch() {
2) 0.621 us | _spin_unlock_irq();
2) 2.202 us | }
2) ! 1002.197 us | }
2) ! 1003.521 us | }

where as,

echo nosleep-time > /debug/tracing/trace_options

produces:

0) <idle>-0 => yum-upd-3416
------------------------------------------

0) | finish_task_switch() {
0) 0.643 us | _spin_unlock_irq();
0) 2.342 us | }
0) + 41.302 us | }
0) + 42.453 us | }

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 8aef2d28 23-Mar-2009 Steven Rostedt <srostedt@redhat.com>

function-graph: ignore times across schedule

Impact: more accurate timings

The current method of function graph tracing does not take into
account the time spent when a task is not running. This shows functions
that call schedule have increased costs:

3) + 18.664 us | }
------------------------------------------
3) <idle>-0 => kblockd-123
------------------------------------------

3) | finish_task_switch() {
3) 1.441 us | _spin_unlock_irq();
3) 3.966 us | }
3) ! 2959.433 us | }
3) ! 2961.465 us | }

This patch uses the tracepoint in the scheduling context switch to
account for time that has elapsed while a task is scheduled out.
Now we see:

------------------------------------------
3) <idle>-0 => edac-po-1067
------------------------------------------

3) | finish_task_switch() {
3) 0.685 us | _spin_unlock_irq();
3) 2.331 us | }
3) + 41.439 us | }
3) + 42.663 us | }

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 05ce5818 23-Mar-2009 Steven Rostedt <srostedt@redhat.com>

function-graph: prevent more than one tracer registering

Impact: prevent crash due to multiple function graph tracers

The function graph tracer can currently only handle a single tracer
being registered. If another tracer registers with the function
graph tracer it can crash the system.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 03303549 16-Mar-2009 Frederic Weisbecker <fweisbec@gmail.com>

tracing/ftrace: fix the check on nopped sites

Impact: fix a dynamic tracing failure

Recently, the function and function graph tracers failed to use dynamic
tracing after the following commit:

fa9d13cf135efbd454453a53b6299976bea245a9
(ftrace: don't try to __ftrace_replace_code on !FTRACE_FL_CONVERTED rec)

The patch is right except a mistake on the check for the FTRACE_FL_CONVERTED
flag. The code patching is aborted in case of successfully nopped sites.
What we want is the opposite: ignore the callsites that haven't been nopped.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# e94142a6 13-Mar-2009 Lai Jiangshan <laijs@cn.fujitsu.com>

ftrace: remove struct list_head from struct dyn_ftrace

Impact: save memory

The struct dyn_ftrace table is very large, this patch will save
about 50%.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49BA2C9F.8020009@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 850a80cf 13-Mar-2009 Lai Jiangshan <laijs@cn.fujitsu.com>

ftrace: use seq_read

Impact: cleanup

VFS layer has tested the file mode, we do not need test it.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49BA2BAB.6010608@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# fa9d13cf 13-Mar-2009 Zhaolei <zhaolei@cn.fujitsu.com>

ftrace: don't try to __ftrace_replace_code on !FTRACE_FL_CONVERTED rec

Do __ftrace_replace_code for !FTRACE_FL_CONVERTED rec will always
fail, we should ignore this rec.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: "Steven Rostedt ;" <rostedt@goodmis.org>
LKML-Reference: <49BA2472.4060206@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b00f0b6d 13-Mar-2009 Zhaolei <zhaolei@cn.fujitsu.com>

ftrace: avoid double-free of dyn_ftrace

If dyn_ftrace is freed before ftrace_release(), ftrace_release()
will free it again and make ftrace_free_records wrong.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: "Steven Rostedt ;" <rostedt@goodmis.org>
LKML-Reference: <49BA23D9.1050900@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 10dd3ebe 05-Mar-2009 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

tracing: fix deadlock when setting set_ftrace_pid

Impact: fix deadlock while using set_ftrace_pid

Reproducer:

# cd /sys/kernel/debug/tracing
# echo $$ > set_ftrace_pid

then, console becomes hung.

Details:

when writing set_ftracepid, kernel callstack is following

ftrace_pid_write()
mutex_lock(&ftrace_lock);
ftrace_update_pid_func()
mutex_lock(&ftrace_lock);
mutex_unlock(&ftrace_lock);
mutex_unlock(&ftrace_lock);

then, system always deadlocks when ftrace_pid_write() is called.

In past days, ftrace_pid_write() used ftrace_start_lock, but
commit e6ea44e9b4c12325337cd1c06103cd515a1c02b2 consolidated
ftrace_start_lock to ftrace_lock.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <20090306151155.0778.A69D9226@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 5e2336a0 05-Mar-2009 Steven Rostedt <srostedt@redhat.com>

tracing: make all file_operations const

Impact: cleanup

All file_operations structures should be constant. No one is going to
change them.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 4377245a 19-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: break out modify loop immediately on detection of error

Impact: added precaution on failure detection

Break out of the modifying loop as soon as a failure is detected.
This is just an added precaution found by code review and was not
found by any bug chasing.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 000ab691 17-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: allow archs to preform pre and post process for code modification

This patch creates the weak functions: ftrace_arch_code_modify_prepare
and ftrace_arch_code_modify_post_process that are called before and
after the stop machine is called to modify the kernel text.

If the arch needs to do pre or post processing, it only needs to define
these functions.

[ Update: Ingo Molnar suggested using the name ftrace_arch_code_modify_*
over using ftrace_arch_modify_* ]

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# f9349a8f 19-Feb-2009 Frederic Weisbecker <fweisbec@gmail.com>

tracing/function-graph-tracer: make set_graph_function file support ftrace regex

Impact: trace only functions matching a pattern

The set_graph_function file let one to trace only one or several
chosen functions and follow all their code flow.

Currently, only a constant function name is allowed so this patch
allows the ftrace_regex functions:

- matches all functions that end with "name":
echo *name > set_graph_function

- matches all functions that begin with "name":
echo name* > set_graph_function

- matches all functions that contains "name":
echo *name* > set_graph_function

Example:

echo mutex* > set_graph_function

0) | mutex_lock_nested() {
0) 0.563 us | __might_sleep();
0) 2.072 us | }
0) | mutex_unlock() {
0) 1.036 us | __mutex_unlock_slowpath();
0) 2.433 us | }
0) | mutex_unlock() {
0) 0.691 us | __mutex_unlock_slowpath();
0) 1.787 us | }
0) | mutex_lock_interruptible_nested() {
0) 0.548 us | __might_sleep();
0) 1.945 us | }

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 5b058bcd 17-Feb-2009 Frederic Weisbecker <fweisbec@gmail.com>

tracing/function-graph-tracer: trace the idle tasks

When the function graph tracer is activated, it iterates over the task_list
to allocate a stack to store the return addresses.

But the per cpu idle tasks are not iterated by using
do_each_thread / while_each_thread.

So we have to iterate on them manually.

This fixes somes weirdness in the traces and many losses of traces.
Examples on two cpus:

0) Xorg-4287 | 2.906 us | }
0) Xorg-4287 | 3.965 us | }
0) Xorg-4287 | 5.302 us | }
------------------------------------------
0) Xorg-4287 => <idle>-0
------------------------------------------

0) <idle>-0 | 2.861 us | }
0) <idle>-0 | 0.526 us | set_normalized_timespec();
0) <idle>-0 | 7.201 us | }
0) <idle>-0 | 8.214 us | }
0) <idle>-0 | | clockevents_program_event() {
0) <idle>-0 | | lapic_next_event() {
0) <idle>-0 | 0.510 us | native_apic_mem_write();
0) <idle>-0 | 1.546 us | }
0) <idle>-0 | 2.583 us | }
0) <idle>-0 | + 12.435 us | }
0) <idle>-0 | + 13.470 us | }
0) <idle>-0 | 0.608 us | _spin_unlock_irqrestore();
0) <idle>-0 | + 23.270 us | }
0) <idle>-0 | + 24.336 us | }
0) <idle>-0 | + 25.417 us | }
0) <idle>-0 | 0.593 us | _spin_unlock();
0) <idle>-0 | + 41.869 us | }
0) <idle>-0 | + 42.906 us | }
0) <idle>-0 | + 95.035 us | }
0) <idle>-0 | 0.540 us | menu_reflect();
0) <idle>-0 | ! 100.404 us | }
0) <idle>-0 | 0.564 us | mce_idle_callback();
0) <idle>-0 | | enter_idle() {
0) <idle>-0 | 0.526 us | mce_idle_callback();
0) <idle>-0 | 1.757 us | }
0) <idle>-0 | | cpuidle_idle_call() {
0) <idle>-0 | | menu_select() {
0) <idle>-0 | 0.525 us | pm_qos_requirement();
0) <idle>-0 | 0.518 us | tick_nohz_get_sleep_length();
0) <idle>-0 | 2.621 us | }
[...]
1) <idle>-0 | 0.518 us | touch_softlockup_watchdog();
1) <idle>-0 | + 14.355 us | }
1) <idle>-0 | + 22.840 us | }
1) <idle>-0 | + 25.949 us | }
1) <idle>-0 | | handle_irq() {
1) <idle>-0 | 0.511 us | irq_to_desc();
1) <idle>-0 | | handle_edge_irq() {
1) <idle>-0 | 0.638 us | _spin_lock();
1) <idle>-0 | | ack_apic_edge() {
1) <idle>-0 | 0.510 us | irq_to_desc();
1) <idle>-0 | | move_native_irq() {
1) <idle>-0 | 0.510 us | irq_to_desc();
1) <idle>-0 | 1.532 us | }
1) <idle>-0 | 0.511 us | native_apic_mem_write();
------------------------------------------
1) <idle>-0 => cat-5073
------------------------------------------

1) cat-5073 | 3.731 us | }
1) cat-5073 | | run_local_timers() {
1) cat-5073 | 0.533 us | hrtimer_run_queues();
1) cat-5073 | | raise_softirq() {
1) cat-5073 | | __raise_softirq_irqoff() {
1) cat-5073 | | /* nr: 1 */
1) cat-5073 | 2.718 us | }
1) cat-5073 | 3.814 us | }

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b6887d79 16-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: rename _hook to _probe

Impact: clean up

Ingo Molnar did not like the _hook naming convention used by the
select function tracer. Luis Claudio R. Goncalves suggested using
the "_probe" extension. This patch implements the change of
calling the functions and variables "_hook" and replacing them
with "_probe".

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 6a24a244 17-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: clean up coding style

Ingo Molnar pointed out some coding style issues with the recent ftrace
updates. This patch cleans them up.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 73d3fd96 17-Feb-2009 Ingo Molnar <mingo@elte.hu>

ftrace: fix !CONFIG_DYNAMIC_FTRACE ftrace_swapper_pid definition

Impact: build fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 809dcf29 16-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: add pretty print to selected fuction traces

This patch adds a call back for the tracers that have hooks to
selected functions. This allows the tracer to show better output
in the set_ftrace_filter file.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 8fc0c701 16-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: show selected functions in set_ftrace_filter

This patch adds output to show what functions have tracer hooks
attached to them.

# echo 'sys_open:traceon:4' > /debug/tracing/set_ftrace_filter
# cat set_ftrace_filter

#### all functions enabled ####
sys_open:ftrace_traceon:0000000000000004

# echo 'do_fork:traceoff:' > set_ftrace_filter
# cat set_ftrace_filter

#### all functions enabled ####
sys_open:ftrace_traceon:0000000000000002
do_fork:ftrace_traceoff:ffffffffffffffff

Note the 4 changed to a 2. This is because The code was executed twice
since the traceoff was added. If a cat is done again:

#### all functions enabled ####
sys_open:ftrace_traceon
do_fork:ftrace_traceoff:ffffffffffffffff

The number disappears. That is because it will not print a NULL.

Callbacks to allow the tracer to pretty print will be implemented soon.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 59df055f 14-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: trace different functions with a different tracer

Impact: new feature

Currently, the function tracer only gives you an ability to hook
a tracer to all functions being traced. The dynamic function trace
allows you to pick and choose which of those functions will be
traced, but all functions being traced will call all tracers that
registered with the function tracer.

This patch adds a new feature that allows a tracer to hook to specific
functions, even when all functions are being traced. It allows for
different functions to call different tracer hooks.

The way this is accomplished is by a special function that will hook
to the function tracer and will set up a hash table knowing which
tracer hook to call with which function. This is the most general
and easiest method to accomplish this. Later, an arch may choose
to supply their own method in changing the mcount call of a function
to call a different tracer. But that will be an exercise for the
future.

To register a function:

struct ftrace_hook_ops {
void (*func)(unsigned long ip,
unsigned long parent_ip,
void **data);
int (*callback)(unsigned long ip, void **data);
void (*free)(void **data);
};

int register_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data);

glob is a simple glob to search for the functions to hook.
ops is a pointer to the operations (listed below)
data is the default data to be passed to the hook functions when traced

ops:
func is the hook function to call when the functions are traced
callback is a callback function that is called when setting up the hash.
That is, if the tracer needs to do something special for each
function, that is being traced, and wants to give each function
its own data. The address of the entry data is passed to this
callback, so that the callback may wish to update the entry to
whatever it would like.
free is a callback for when the entry is freed. In case the tracer
allocated any data, it is give the chance to free it.

To unregister we have three functions:

void
unregister_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data)

This will unregister all hooks that match glob, point to ops, and
have its data matching data. (note, if glob is NULL, blank or '*',
all functions will be tested).

void
unregister_ftrace_function_hook_func(char *glob,
struct ftrace_hook_ops *ops)

This will unregister all functions matching glob that has an entry
pointing to ops.

void unregister_ftrace_function_hook_all(char *glob)

This simply unregisters all funcs.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# e6ea44e9 13-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: consolidate mutexes

Impact: clean up

Now that ftrace_lock is a mutex, there is no reason to have three
different mutexes protecting similar data. All the mutex paths
are not in hot paths, so having a mutex to cover more data is
not a problem.

This patch removes the ftrace_sysctl_lock and ftrace_start_lock
and uses the ftrace_lock to protect the locations that were protected
by these locks. By doing so, this change also removes some of
the lock nesting that was taking place.

There are still more mutexes in ftrace.c that can probably be
consolidated, but they can be dealt with later. We need to be careful
about the way the locks are nested, and by consolidating, we can cause
a recursive deadlock.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 52baf119 13-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: convert ftrace_lock from a spinlock to mutex

Impact: clean up

The older versions of ftrace required doing the ftrace list
search under atomic context. Now all the calls are in non-atomic
context. There is no reason to keep the ftrace_lock as a spinlock.

This patch converts it to a mutex.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# f6180773 13-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: add command interface for function selection

Allow for other tracers to add their own commands for function
selection. This interface gives a trace the ability to name a
command for function selection. Right now it is pretty limited
in what it offers, but this is a building step for more features.

The :mod: command is converted to this interface and also serves
as a template for other implementations.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# e68746a2 13-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: enable filtering only when a function is filtered on

Impact: fix to prevent empty set_ftrace_filter and no ftrace output

The function filter is used to only trace a given set of functions.
The filter is enabled when a function name is echoed into the
set_ftrace_filter file. But if the name has a typo and the function
is not found, the filter is enabled, but no function is listed.

This makes a confusing situation where set_ftrace_filter is empty
but no functions ever get enabled for tracing.

For example:

# cat /debug/tracing/set_ftrace_filter

#### all functions enabled ####

# echo bad_name > set_ftrace_filter
# cat /debug/tracing/set_ftrace_filter

# echo function > current_tracer
# cat trace

# tracer: nop
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |

This patch changes that to only enable filtering if a function
is set to be filtered on. Now, the filter is not enabled if
a bad name is echoed into set_ftrace_filter.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 64e7c440 13-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: add module command function filter selection

This patch adds a "command" syntax to the function filtering files:

/debugfs/tracing/set_ftrace_filter
/debugfs/tracing/set_ftrace_notrace

Of the format: <function>:<command>:<parameter>

The command is optional, and dependent on the command, so are
the parameters.

echo do_fork > set_ftrace_filter

Will only trace 'do_fork'.

echo 'sched_*' > set_ftrace_filter

Will only trace functions starting with the letters 'sched_'.

echo '*:mod:ext3' > set_ftrace_filter

Will trace only the ext3 module functions.

echo '*write*:mod:ext3' > set_ftrace_notrace

Will prevent the ext3 functions with the letters 'write' in
the name from being traced.

echo '!*_allocate:mod:ext3' > set_ftrace_filter

Will remove the functions in ext3 that end with the letters
'_allocate' from the ftrace filter.

Although this patch implements the 'command' format, only the
'mod' command is supported. More commands to follow.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 9f4801e3 13-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: break up ftrace_match_records into smaller components

Impact: clean up

ftrace_match_records does a lot of things that other features
can use. This patch breaks up ftrace_match_records and pulls
out ftrace_setup_glob and ftrace_match_record.

ftrace_setup_glob prepares a simple glob expression for use with
ftrace_match_record. ftrace_match_record compares a single record
with a glob type.

Breaking this up will allow for more features to run on individual
records.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 7f24b31b 13-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: rename ftrace_match to ftrace_match_records

Impact: clean up

ftrace_match is too generic of a name. What it really does is
search all records and matches the records with the given string,
and either sets or unsets the functions to be traced depending
on if the parameter 'enable' is set or not.

This allows us to make another function called ftrace_match that
can be used to test a single record.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 265c831c 12-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: add do_for_each_ftrace_rec and while_for_each_ftrace_rec

Impact: clean up

To iterate over all the functions that dynamic trace knows about
it requires two for loops. One to iterate over the pages and the
other to iterate over the records within the page.

There are several duplications of these loops in ftrace.c. This
patch creates the macros do_for_each_ftrace_rec and
while_for_each_ftrace_rec to handle this logic, and removes the
duplicate code.

While making this change, I also discovered and fixed a small
bug that one of the iterations should exit the loop after it found the
record it was searching for. This used a break when it should have
used a goto, since there were two loops it needed to break out
from. No real harm was done by this bug since it would only continue
to search the other records, and the code was in a slow path anyway.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 0c75a3ed 16-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: state that all functions are enabled in set_ftrace_filter

Impact: clean up, make set_ftrace_filter less confusing

The set_ftrace_filter shows only the functions that will be traced.
But when it is empty, it will trace all functions. This can be a bit
confusing.

This patch makes set_ftrace_filter show:

#### all functions enabled ####

When all functions will be traced, and we do not filter only a select
few.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 57794a9d 06-Feb-2009 Wenji Huang <wenji.huang@oracle.com>

trace: trivial fixes in comment typos.

Impact: clean up

Fixed several typos in the comments.

Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 229c4ef8 03-Feb-2009 Oleg Nesterov <oleg@redhat.com>

ftrace: do_each_pid_task() needs rcu lock

"ftrace: use struct pid" commit 978f3a45d9499c7a447ca7615455cefb63d44165
converted ftrace_pid_trace to "struct pid*".

But we can't use do_each_pid_task() without rcu_read_lock() even if
we know the pid itself can't go away (it was pinned in ftrace_pid_write).
The exiting task can detach itself from this pid at any moment.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 00f57f54 14-Jan-2009 Frederic Weisbecker <fweisbec@gmail.com>

tracing/function-graph-tracer: fix a regression while suspend to disk

Impact: fix a crash while kernel image restore

When the function graph tracer is running and while suspend to disk, some racy
and dangerous things happen against this tracer.

The current task will save its registers including the stack pointer which
contains the return address hooked by the tracer. But the current task will
continue to enter other functions after that to save the memory, and then
it will store other return addresses, and finally loose the old depth which
matches the return address saved in the old stack (during the registers saving).

So on image restore, the code will return to wrong addresses.
And there are other things: on restore, the task will have it's "current"
pointer overwritten during registers restoring....switching from one task to
another... That would be insane to try to trace function graphs at these
stages.

This patch makes the function graph tracer listening on power events, making
it's tracing disabled for the current task (the one that performs the
hibernation work) while suspend/resume to disk, making the tracing safe
during hibernation.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 4a2b8dda 14-Jan-2009 Frederic Weisbecker <fweisbec@gmail.com>

tracing/function-graph-tracer: fix a regression while suspend to disk

Impact: fix a crash while kernel image restore

When the function graph tracer is running and while suspend to disk, some racy
and dangerous things happen against this tracer.

The current task will save its registers including the stack pointer which
contains the return address hooked by the tracer. But the current task will
continue to enter other functions after that to save the memory, and then
it will store other return addresses, and finally loose the old depth which
matches the return address saved in the old stack (during the registers saving).

So on image restore, the code will return to wrong addresses.
And there are other things: on restore, the task will have it's "current"
pointer overwritten during registers restoring....switching from one task to
another... That would be insane to try to trace function graphs at these
stages.

This patch makes the function graph tracer listening on power events, making
it's tracing disabled for the current task (the one that performs the
hibernation work) while suspend/resume to disk, making the tracing safe
during hibernation.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 32632920 12-Jan-2009 Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ftrace, trivial: fix typo "resgister" -> "register"

Signed-off-by: Uwe Kleine-Koenig <ukleinek@strlen.de>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f0001207 08-Jan-2009 Shaohua Li <shaohua.li@intel.com>

ftrace, ia64: Add macro for ftrace_caller

Define FTRACE_ADDR. In IA64, a function pointer isn't a 'unsigned long' but a
'struct {unsigned long ip, unsigned long gp}'.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 25aac9dc 08-Jan-2009 Shaohua Li <shaohua.li@intel.com>

ftrace, ia64: explictly ignore a file in recordmcount.pl

In IA64, a function pointer isn't a 'unsigned long' but a
'struct {unsigned long ip, unsigned long gp}'. MCOUNT_ADDR is determined
at link time not compile time, so explictly ignore kernel/trace/ftrace.o
in recordmcount.pl.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 431aa3fb 05-Jan-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: convert unsigned index to signed

Impact: fix to unsigned compared to less than zero

Roel Kluin pointed out that there is a compare of an unsigned number
to less than zero. A previous clean up had the unsigned index set
to -1 for certain cases, but never converted it to signed.

Frederic Weisbecker noticed that another index is used to compare
the above index to and it also needs to be converted to signed.

[
Converted ftrace_page->index to int from unsigned long as
Andrew Morton pointed out that there's no need for it to
stay a long.
]

Reported-by: Roel Kluin <roel.kluin@gmail.com>
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ea3a6d6d 17-Dec-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: add not to regex on filtering functions

Impact: enhancement

Ingo Molnar has asked about a way to remove items from the filter
lists. Currently, you can only add or replace items. The way
items are added to the list is through opening one of the list
files (set_ftrace_filter or set_ftrace_notrace) via append.
If the file is opened for truncate, the list is cleared.

echo spin_lock > /debug/tracing/set_ftrace_filter

The above will replace the list with only spin_lock

echo spin_lock >> /debug/tracing/set_ftrace_filter

The above will add spin_lock to the list.

Now this patch adds:

echo '!spin_lock' >> /debug/tracing/set_ftrace_filter

This will remove spin_lock from the list.

The limited glob features of these lists also can be notted.

echo '!spin_*' >> /debug/tracing/set_ftrace_filter

This will remove all functions that start with 'spin_'

Note:

echo '!spin_*' > /debug/tracing/set_ftrace_filter

will simply clear out the list (notice the '>' instead of '>>')

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 380c4b14 05-Dec-2008 Frederic Weisbecker <fweisbec@gmail.com>

tracing/function-graph-tracer: append the tracing_graph_flag

Impact: Provide a way to pause the function graph tracer

As suggested by Steven Rostedt, the previous patch that prevented from
spinlock function tracing shouldn't use the raw_spinlock to fix it.
It's much better to follow lockdep with normal spinlock, so this patch
adds a new flag for each task to make the function graph tracer able
to be paused. We also can send an ftrace_printk whithout worrying of
the irrelevant traced spinlock during insertion.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 21bbecda 04-Dec-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: use init_struct_pid as swapper pid

Impact: clean up

Using (struct pid *)-1 as the pointer for ftrace_swapper_pid is
a little confusing for others. This patch uses the address of the
actual init pid structure instead. This change is only for
clarity. It does not affect the code itself. Hopefully soon the
swapper tasks will all have their own pid structure and then
we can clean up the code a bit more.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# faec2ec5 03-Dec-2008 Liming Wang <liming.wang@windriver.com>

ftrace: avoid duplicated function when writing set_graph_function

Impact: fix a bug in function filter setting

when writing function to set_graph_function, we should check whether it
has existed in set_graph_function to avoid duplicating.

Signed-off-by: Liming Wang <liming.wang@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e32d8956 03-Dec-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: add ability to only trace swapper tasks

Impact: new feature

This patch lets the swapper tasks of all CPUS be filtered by the
set_ftrace_pid file.

If '0' is echoed into this file, then all the idle tasks (aka swapper)
is flagged to be traced. This affects all CPU idle tasks.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 978f3a45 03-Dec-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: use struct pid

Impact: clean up, extend PID filtering to PID namespaces

Eric Biederman suggested using the struct pid for filtering on
pids in the kernel. This patch is based off of a demonstration
of an implementation that Eric sent me in an email.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 804a6851 03-Dec-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: trace single pid for function graph tracer

Impact: New feature

This patch makes the changes to set_ftrace_pid apply to the function
graph tracer.

# echo $$ > /debugfs/tracing/set_ftrace_pid
# echo function_graph > /debugfs/tracing/current_tracer

Will cause only the current task to be traced. Note, the trace flags are
also inherited by child processes, so the children of the shell
will also be traced.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 0ef8cde5 03-Dec-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: use task struct trace flag to filter on pid

Impact: clean up

Use the new task struct trace flags to determine if a process should be
traced or not.

Note: this moves the searching of the pid to the slow path of setting
the pid field. This needs to be converted to the pid name space.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ea4e2bc4 03-Dec-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: graph of a single function

This patch adds the file:

/debugfs/tracing/set_graph_function

which can be used along with the function graph tracer.

When this file is empty, the function graph tracer will act as
usual. When the file has a function in it, the function graph
tracer will only trace that function.

For example:

# echo blk_unplug > /debugfs/tracing/set_graph_function
# cat /debugfs/tracing/trace
[...]
------------------------------------------
| 2) make-19003 => kjournald-2219
------------------------------------------

2) | blk_unplug() {
2) | dm_unplug_all() {
2) | dm_get_table() {
2) 1.381 us | _read_lock();
2) 0.911 us | dm_table_get();
2) 1. 76 us | _read_unlock();
2) + 12.912 us | }
2) | dm_table_unplug_all() {
2) | blk_unplug() {
2) 0.778 us | generic_unplug_device();
2) 2.409 us | }
2) 5.992 us | }
2) 0.813 us | dm_table_put();
2) + 29. 90 us | }
2) + 34.532 us | }

You can add up to 32 functions into this file. Currently we limit it
to 32, but this may change with later improvements.

To add another function, use the append '>>':

# echo sys_read >> /debugfs/tracing/set_graph_function
# cat /debugfs/tracing/set_graph_function
blk_unplug
sys_read

Using the '>' will clear out the function and write anew:

# echo sys_write > /debug/tracing/set_graph_function
# cat /debug/tracing/set_graph_function
sys_write

Note, if you have function graph running while doing this, the small
time between clearing it and updating it will cause the graph to
record all functions. This should not be an issue because after
it sets the filter, only those functions will be recorded from then on.
If you need to only record a particular function then set this
file first before starting the function graph tracer. In the future
this side effect may be corrected.

The set_graph_function file is similar to the set_ftrace_filter but
it does not take wild cards nor does it allow for more than one
function to be set with a single write. There is no technical reason why
this is the case, I just do not have the time yet to implement that.

Note, dynamic ftrace must be enabled for this to appear because it
uses the dynamic ftrace records to match the name to the mcount
call sites.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e49dc19c 02-Dec-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: function graph return for function entry

Impact: feature, let entry function decide to trace or not

This patch lets the graph tracer entry function decide if the tracing
should be done at the end as well. This requires all function graph
entry functions return 1 if it should trace, or 0 if the return should
not be traced.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 14a866c5 02-Dec-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: add ftrace_graph_stop()

Impact: new ftrace_graph_stop function

While developing more features of function graph, I hit a bug that
caused the WARN_ON to trigger in the prepare_ftrace_return function.
Well, it was hard for me to find out that was happening because the
bug would not print, it would just cause a hard lockup or reboot.
The reason is that it is not safe to call printk from this function.

Looking further, I also found that it calls unregister_ftrace_graph,
which grabs a mutex and calls kstop machine. This would definitely
lock the box up if it were to trigger.

This patch adds a fast and safe ftrace_graph_stop() which will
stop the function tracer. Then it is safe to call the WARN ON.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 48d68b20 01-Dec-2008 Frederic Weisbecker <fweisbec@gmail.com>

tracing/function-graph-tracer: support for x86-64

Impact: extend and enable the function graph tracer to 64-bit x86

This patch implements the support for function graph tracer under x86-64.
Both static and dynamic tracing are supported.

This causes some small CPP conditional asm on arch/x86/kernel/ftrace.c I
wanted to use probe_kernel_read/write to make the return address
saving/patching code more generic but it causes tracing recursion.

That would be perhaps useful to implement a notrace version of these
function for other archs ports.

Note that arch/x86/process_64.c is not traced, as in X86-32. I first
thought __switch_to() was responsible of crashes during tracing because I
believed current task were changed inside but that's actually not the
case (actually yes, but not the "current" pointer).

So I will have to investigate to find the functions that harm here, to
enable tracing of the other functions inside (but there is no issue at
this time, while process_64.c stays out of -pg flags).

A little possible race condition is fixed inside this patch too. When the
tracer allocate a return stack dynamically, the current depth is not
initialized before but after. An interrupt could occur at this time and,
after seeing that the return stack is allocated, the tracer could try to
trace it with a random uninitialized depth. It's a prevention, even if I
hadn't problems with it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 66eafebc 01-Dec-2008 Liming Wang <liming.wang@windriver.com>

function trace: fix a bug of single thread function trace

Impact: fix "no output from tracer" bug caused by ftrace_update_pid_func()

When disabling single thread function trace using
"echo -1 > set_ftrace_pid", the normal function trace
has to restore to original function, otherwise the normal
function trace will not work well.

Without this commit, something like below:

$ ps |grep 850
850 root 2556 S -/bin/sh
$ echo 850 > /debug/tracing/set_ftrace_pid
$ echo function > /debug/tracing/current_tracer
$ echo 1 > /debug/tracing/tracing_enabled
$ sleep 1
$ echo 0 > /debug/tracing/tracing_enabled
$ cat /debug/tracing/trace_pipe |wc -l
59704
$ echo -1 > /debug/tracing/set_ftrace_pid
$ echo 1 > /debug/tracing/tracing_enabled
$ sleep 1
$ echo 0 > /debug/tracing/tracing_enabled
$ more /debug/tracing/trace_pipe
<====== nothing output now!
it should output trace record.

Signed-off-by: Liming Wang <liming.wang@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 50cdaf08 27-Nov-2008 Liming Wang <liming.wang@windriver.com>

ftrace: improve seq_operation of ftrace

Impact: make ftrace position computing more sane

First remove useless ->pos field. Then we needn't check seq_printf
in .show like other place.

Signed-off-by: Liming Wang <liming.wang@windriver.com>
Reviewed-by: Bruce Ashfield <bruce.ashfield@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e53a6319 25-Nov-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: let function tracing and function return run together

Impact: feature

This patch enables function tracing and function return to run together.
I've tested this by enabling the stack tracer and return tracer, where
both the function entry and function return are used together with
dynamic ftrace.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 5a45cfe1 25-Nov-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: use code patching for ftrace graph tracer

Impact: more efficient code for ftrace graph tracer

This patch uses the dynamic patching, when available, to patch
the function graph code into the kernel.

This patch will ease the way for letting both function tracing
and function graph tracing run together.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# df4fc315 25-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: add function tracing to single thread

Impact: feature to function trace a single thread

This patch adds the ability to function trace a single thread.
The file:

/debugfs/tracing/set_ftrace_pid

contains the pid to trace. Valid pids are any positive integer.
Writing any negative number to this file will disable the pid
tracing and the function tracer will go back to tracing all of
threads.

This feature works with both static and dynamic function tracing.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 287b6e68 25-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com>

tracing/function-return-tracer: set a more human readable output

Impact: feature

This patch sets a C-like output for the function graph tracing.
For this aim, we now call two handler for each function: one on the entry
and one other on return. This way we can draw a well-ordered call stack.

The pid of the previous trace is loosely stored to be compared against
the one of the current trace to see if there were a context switch.

Without this little feature, the call tree would seem broken at
some locations.
We could use the sched_tracer to capture these sched_events but this
way of processing is much more simpler.

2 spaces have been chosen for indentation to fit the screen while deep
calls. The time of execution in nanosecs is printed just after closed
braces, it seems more easy this way to find the corresponding function.
If the time was printed as a first column, it would be not so easy to
find the corresponding function if it is called on a deep depth.

I plan to output the return value but on 32 bits CPU, the return value
can be 32 or 64, and its difficult to guess on which case we are.
I don't know what would be the better solution on X86-32: only print
eax (low-part) or even edx (high-part).

Actually it's thee same problem when a function return a 8 bits value, the
high part of eax could contain junk values...

Here is an example of trace:

sys_read() {
fget_light() {
} 526
vfs_read() {
rw_verify_area() {
security_file_permission() {
cap_file_permission() {
} 519
} 1564
} 2640
do_sync_read() {
pipe_read() {
__might_sleep() {
} 511
pipe_wait() {
prepare_to_wait() {
} 760
deactivate_task() {
dequeue_task() {
dequeue_task_fair() {
dequeue_entity() {
update_curr() {
update_min_vruntime() {
} 504
} 1587
clear_buddies() {
} 512
add_cfs_task_weight() {
} 519
update_min_vruntime() {
} 511
} 5602
dequeue_entity() {
update_curr() {
update_min_vruntime() {
} 496
} 1631
clear_buddies() {
} 496
update_min_vruntime() {
} 527
} 4580
hrtick_update() {
hrtick_start_fair() {
} 488
} 1489
} 13700
} 14949
} 16016
msecs_to_jiffies() {
} 496
put_prev_task_fair() {
} 504
pick_next_task_fair() {
} 489
pick_next_task_rt() {
} 496
pick_next_task_fair() {
} 489
pick_next_task_idle() {
} 489

------------8<---------- thread 4 ------------8<----------

finish_task_switch() {
} 1203
do_softirq() {
__do_softirq() {
__local_bh_disable() {
} 669
rcu_process_callbacks() {
__rcu_process_callbacks() {
cpu_quiet() {
rcu_start_batch() {
} 503
} 1647
} 3128
__rcu_process_callbacks() {
} 542
} 5362
_local_bh_enable() {
} 587
} 8880
} 9986
kthread_should_stop() {
} 669
deactivate_task() {
dequeue_task() {
dequeue_task_fair() {
dequeue_entity() {
update_curr() {
calc_delta_mine() {
} 511
update_min_vruntime() {
} 511
} 2813

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# fb52607a 25-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com>

tracing/function-return-tracer: change the name into function-graph-tracer

Impact: cleanup

This patch changes the name of the "return function tracer" into
function-graph-tracer which is a more suitable name for a tracing
which makes one able to retrieve the ordered call stack during
the code flow.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# eae849ca 23-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com>

tracing/function-return-tracer: don't trace kfree while it frees the return stack

Impact: fix a crash

While I killed the cat process, I got sometimes the following (but rare)
crash:

[ 65.689027] Pid: 2969, comm: cat Not tainted (2.6.28-rc6-tip #83) AMILO Li 2727
[ 65.689027] EIP: 0060:[<00000000>] EFLAGS: 00010082 CPU: 1
[ 65.689027] EIP is at 0x0
[ 65.689027] EAX: 00000000 EBX: f66cd780 ECX: c019a64a EDX: f66cd780
[ 65.689027] ESI: 00000286 EDI: f66cd780 EBP: f630be2c ESP: f630be24
[ 65.689027] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 65.689027] Process cat (pid: 2969, ti=f630a000 task=f66cd780 task.ti=f630a000)
[ 65.689027] Stack:
[ 65.689027] 00000012 f630bd54 f630be7c c012c853 00000000 c0133cc9 f66cda54 f630be5c
[ 65.689027] f630be68 f66cda54 f66cd88c f66cd878 f7070000 00000001 f630be90 c0135dbc
[ 65.689027] f614a614 f630be68 f630be68 f65ba200 00000002 f630bf10 f630be90 c012cad6
[ 65.689027] Call Trace:
[ 65.689027] [<c012c853>] ? do_exit+0x603/0x850
[ 65.689027] [<c0133cc9>] ? next_signal+0x9/0x40
[ 65.689027] [<c0135dbc>] ? dequeue_signal+0x8c/0x180
[ 65.689027] [<c012cad6>] ? do_group_exit+0x36/0x90
[ 65.689027] [<c013709c>] ? get_signal_to_deliver+0x20c/0x390
[ 65.689027] [<c0102b69>] ? do_notify_resume+0x99/0x8b0
[ 65.689027] [<c02e6d1a>] ? tty_ldisc_deref+0x5a/0x80
[ 65.689027] [<c014db9b>] ? trace_hardirqs_on+0xb/0x10
[ 65.689027] [<c02e6d1a>] ? tty_ldisc_deref+0x5a/0x80
[ 65.689027] [<c02e39b0>] ? n_tty_write+0x0/0x340
[ 65.689027] [<c02e1812>] ? redirected_tty_write+0x82/0x90
[ 65.689027] [<c019ee99>] ? vfs_write+0x99/0xd0
[ 65.689027] [<c02e1790>] ? redirected_tty_write+0x0/0x90
[ 65.689027] [<c019f342>] ? sys_write+0x42/0x70
[ 65.689027] [<c01035ca>] ? work_notifysig+0x13/0x19
[ 65.689027] Code: Bad EIP value.
[ 65.689027] EIP: [<00000000>] 0x0 SS:ESP 0068:f630be24

This is because on do_exit(), kfree is called to free the return addresses stack
but kfree is traced and stored its return address in this stack.
This patch fixes it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f201ae23 22-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com>

tracing/function-return-tracer: store return stack into task_struct and allocate it dynamically

Impact: use deeper function tracing depth safely

Some tests showed that function return tracing needed a more deeper depth
of function calls. But it could be unsafe to store these return addresses
to the stack.

So these arrays will now be allocated dynamically into task_struct of current
only when the tracer is activated.

Typical scheme when tracer is activated:
- allocate a return stack for each task in global list.
- fork: allocate the return stack for the newly created task
- exit: free return stack of current
- idle init: same as fork

I chose a default depth of 50. I don't have overruns anymore.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 32464779 18-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: fix dyn ftrace filter selection

Impact: clean up and fix for dyn ftrace filter selection

The previous logic of the dynamic ftrace selection of enabling
or disabling functions was complex and incorrect. This patch simplifies
the code and corrects the usage. This simplification also makes the
code more robust.

Here is the correct logic:

Given a function that can be traced by dynamic ftrace:

If the function is not to be traced, disable it if it was enabled.
(this is if the function is in the set_ftrace_notrace file)

(filter is on if there exists any functions in set_ftrace_filter file)

If the filter is on, and we are enabling functions:
If the function is in set_ftrace_filter, enable it if it is not
already enabled.
If the function is not in set_ftrace_filter, disable it if it is not
already disabled.

Otherwise, if the filter is off and we are enabling function tracing:
Enable the function if it is not already enabled.

Otherwise, if we are disabling function tracing:
Disable the function if it is not already disabled.

This code now sets or clears the ENABLED flag in the record, and at the
end it will enable the function if the flag is set, or disable the function
if the flag is cleared.

The parameters for the function that does the above logic is also
simplified. Instead of passing in confusing "new" and "old" where
they might be swapped if the "enabled" flag is not set. The old logic
even had one of the above always NULL and had to be filled in. The new
logic simply passes in one parameter called "nop". A "call" is calculated
in the code, and at the end of the logic, when we know we need to either
disable or enable the function, we can then use the "nop" and "call"
properly.

This code is more robust than the previous version.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 82043278 18-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: make filtered functions effective on setting

Impact: fix filter selection to apply when set

It can be confusing when the set_filter_functions is set (or cleared)
and the functions being recorded by the dynamic tracer does not
match.

This patch causes the code to be updated if the function tracer is
enabled and the filter is changed.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f10ed36e 07-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: fix set_ftrace_filter

Impact: fix of output of set_ftrace_filter

The commit "ftrace: do not show freed records in
available_filter_functions"

Removed a bit too much from the set_ftrace_filter code, where we now see
all functions in the set_ftrace_filter file even when we set a filter.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 5821e1b7 15-Nov-2008 walimis <walimisdev@gmail.com>

function tracing: fix wrong pos computing when read buffer has been fulfilled

Impact: make output of available_filter_functions complete

phenomenon:

The first value of dyn_ftrace_total_info is not equal with
`cat available_filter_functions | wc -l`, but they should be equal.

root cause:

When printing functions with seq_printf in t_show, if the read buffer
is just overflowed by current function record, then this function
won't be printed to user space through read buffer, it will
just be dropped. So we can't see this function printing.

So, every time the last function to fill the read buffer, if overflowed,
will be dropped.

This also applies to set_ftrace_filter if set_ftrace_filter has
more bytes than read buffer.

fix:

Through checking return value of seq_printf, if less than 0, we know
this function doesn't be printed. Then we decrease position to force
this function to be printed next time, in next read buffer.

Another little fix is to show correct allocating pages count.

Signed-off-by: walimis <walimisdev@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e7d3737e 15-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com>

tracing/function-return-tracer: support for dynamic ftrace on function return tracer

This patch adds the support for dynamic tracing on the function return tracer.
The whole difference with normal dynamic function tracing is that we don't need
to hook on a particular callback. The only pro that we want is to nop or set
dynamically the calls to ftrace_caller (which is ftrace_return_caller here).

Some security checks ensure that we are not trying to launch dynamic tracing for
return tracing while normal function tracing is already running.

An example of trace with getnstimeofday set as a filter:

ktime_get_ts+0x22/0x50 -> getnstimeofday (2283 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1396 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1825 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1426 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1524 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1434 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1502 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1404 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1397 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1051 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1314 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1344 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1163 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1390 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1374 ns)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ee02a2e5 15-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: make filtered functions effective on setting

Impact: set filtered functions at time the filter is set

It can be confusing when the set_filter_functions is set (or cleared)
and the functions being recorded by the dynamic tracer does not
match.

This patch causes the code to be updated if the function tracer is
enabled and the filter is changed.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 982c350b 15-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: fix dyn ftrace filter

Impact: correct implementation of dyn ftrace filter

The old decisions made by the filter algorithm was complex and incorrect.
This lead to inconsistent enabling or disabling of functions when
the filter was used.

This patch simplifies that code and in doing so, corrects the usage
of the filters.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 20e5227e 14-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: allow NULL pointers in mcount_loc

Impact: make ftrace_convert_nops() more permissive

Due to the way different architecture linkers combine the data sections
of the mcount_loc (the section that lists all the locations that
call mcount), there may be zeros added in that section. This is usually
due to strange alignments that the linker performs, that pads in zeros.

This patch makes the conversion code to nops skip any pointer in
the mcount_loc section that is NULL.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 31e88909 14-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: pass module struct to arch dynamic ftrace functions

Impact: allow archs more flexibility on dynamic ftrace implementations

Dynamic ftrace has largly been developed on x86. Since x86 does not
have the same limitations as other architectures, the ftrace interaction
between the generic code and the architecture specific code was not
flexible enough to handle some of the issues that other architectures
have.

Most notably, module trampolines. Due to the limited branch distance
that archs make in calling kernel core code from modules, the module
load code must create a trampoline to jump to what will make the
larger jump into core kernel code.

The problem arises when this happens to a call to mcount. Ftrace checks
all code before modifying it and makes sure the current code is what
it expects. Right now, there is not enough information to handle modifying
module trampolines.

This patch changes the API between generic dynamic ftrace code and
the arch dependent code. There is now two functions for modifying code:

ftrace_make_nop(mod, rec, addr) - convert the code at rec->ip into
a nop, where the original text is calling addr. (mod is the
module struct if called by module init)

ftrace_make_caller(rec, addr) - convert the code rec->ip that should
be a nop into a caller to addr.

The record "rec" now has a new field called "arch" where the architecture
can add any special attributes to each call site record.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 918c1154 14-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: do not process freed records

Impact: keep from converting freed records

When the tracer is started or stopped, it converts all code pointed
to by the saved records into callers to ftrace or nops. When modules
are unloaded, their records are freed, but they still exist within
the record pages.

This patch changes the code to skip over freed records.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b17e8a37 14-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: disable ftrace on anomalies in trace start and stop

Impact: robust feature to disable ftrace on start or stop tracing on error

Currently only the initial conversion to nops will disable ftrace
on an anomaly. But if an anomaly happens on start or stopping of the
tracer, it will silently fail.

This patch adds a check there too, to disable ftrace and warn if the
conversion fails.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f3c7ac40 14-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: remove condition from ftrace_record_ip

Impact: let module functions be recorded when dyn ftrace not enabled

When dynamic ftrace had a daemon and a hash to record the locations
of mcount callers at run time, the recording needed to stop when
ftrace was disabled. But now that the recording is done at compile time
and the ftrace_record_ip is only called at boot up and when a module
is loaded, we no longer need to check if ftrace_enabled is set.
In fact, this breaks module load if it is not set because we skip
over module functions.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b3535c63 13-Nov-2008 walimis <walimisdev@gmail.com>

ftrace: remove unnecessary if condition of __unregister_ftrace_function

Because it has goto out before ftrace_list == &ftrace_list_end,
that's to say, we never meet this condition.

Signed-off-by: walimis <walimisdev@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# a3583244 11-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ring-buffer: buffer record on/off switch

Impact: enable/disable ring buffer recording API added

Several kernel developers have requested that there be a way to stop
recording into the ring buffers with a simple switch that can also
be enabled from userspace. This patch addes a new kernel API to the
ring buffers called:

tracing_on()
tracing_off()

When tracing_off() is called, all ring buffers will not be able to record
into their buffers.

tracing_on() will enable the ring buffers again.

These two act like an on/off switch. That is, there is no counting of the
number of times tracing_off or tracing_on has been called.

A new file is added to the debugfs/tracing directory called

tracing_on

This allows for userspace applications to also flip the switch.

echo 0 > debugfs/tracing/tracing_on

disables the tracing.

echo 1 > /debugfs/tracing/tracing_on

enables it.

Note, this does not disable or enable any tracers. It only sets or clears
a flag that needs to be set in order for the ring buffers to write to
their buffers. It is a global flag, and affects all ring buffers.

The buffers start out with tracing_on enabled.

There are now three flags that control recording into the buffers:

tracing_on: which affects all ring buffer tracers.

buffer->record_disabled: which affects an allocated buffer, which may be set
if an anomaly is detected, and tracing is disabled.

cpu_buffer->record_disabled: which is set by tracing_stop() or if an
anomaly is detected. tracing_start can not reenable this if
an anomaly occurred.

The userspace debugfs/tracing/tracing_enabled is implemented with
tracing_stop() but the user space code can not enable it if the kernel
called tracing_stop().

Userspace can enable the tracing_on even if the kernel disabled it.
It is just a switch used to stop tracing if a condition was hit.
tracing_on is not for protecting critical areas in the kernel nor is
it for stopping tracing if an anomaly occurred. This is because userspace
can reenable it at any time.

Side effect: With this patch, I discovered a dead variable in ftrace.c
called tracing_on. This patch removes it.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 15e6cb36 10-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com>

tracing: add a tracer to catch execution time of kernel functions

Impact: add new tracing plugin which can trace full (entry+exit) function calls

This tracer uses the low level function return ftrace plugin to
measure the execution time of the kernel functions.

The first field is the caller of the function, the second is the
measured function, and the last one is the execution time in
nanoseconds.

- v3:

- HAVE_FUNCTION_RET_TRACER have been added. Each arch that support ftrace return
should enable it.
- ftrace_return_stub becomes ftrace_stub.
- CONFIG_FUNCTION_RET_TRACER depends now on CONFIG_FUNCTION_TRACER
- Return traces printing can be used for other tracers on trace.c
- Adapt to the new tracing API (no more ctrl_update callback)
- Correct the check of "disabled" during insertion.
- Minor changes...

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 0183fb1c 07-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: fix set_ftrace_filter

Impact: fix of output of set_ftrace_filter

Commit ftrace: do not show freed records in available_filter_functions

Removed a bit too much from the set_ftrace_filter code, where we now see
all functions in the set_ftrace_filter file even when we set a filter.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 60a7ecf4 05-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: add quick function trace stop

Impact: quick start and stop of function tracer

This patch adds a way to disable the function tracer quickly without
the need to run kstop_machine. It adds a new variable called
function_trace_stop which will stop the calls to functions from mcount
when set. This is just an on/off switch and does not handle recursion
like preempt_disable().

It's main purpose is to help other tracers/debuggers start and stop tracing
fuctions without the need to call kstop_machine.

The config option HAVE_FUNCTION_TRACE_MCOUNT_TEST is added for archs
that implement the testing of the function_trace_stop in the mcount
arch dependent code. Otherwise, the test is done in the C code.

x86 is the only arch at the moment that supports this.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 0b6e4d56 28-Oct-2008 Frederic Weisbecker <fweisbec@gmail.com>

ftrace: perform an initialization for ftrace to enable it

Impact: corrects a bug which made the non-dyn function tracer not functional

With latest git, the non-dynamic function tracer didn't get any trace.

The problem was the fact that ftrace_enabled wasn't initialized to 1
because ftrace hasn't any init function when DYNAMIC_FTRACE is disabled.

So when a tracer tries to register an ftrace_ops struct,
__register_ftrace_function failed to set the hook.

This patch corrects it by setting an init function to initialize
ftrace during the boot.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f17845e5 23-Oct-2008 Ingo Molnar <mingo@elte.hu>

ftrace: warning in kernel/trace/ftrace.c

this warning:

kernel/trace/ftrace.c:189: warning: ‘frozen_record_count’ defined but not used

triggers because frozen_record_count is only used in the KCONFIG_MARKERS
case. Move the variable it there.

Alas, this frozen-record facility seems to have little use. The
frozen_record_count variable is not used by anything, nor the flags.

So this section might need a bit of dead-code-removal care as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 66b0de35 23-Oct-2008 Ingo Molnar <mingo@elte.hu>

ftrace: fix build failure

fix:

kernel/trace/ftrace.c: In function 'ftrace_release':
kernel/trace/ftrace.c:271: error: implicit declaration of function 'ftrace_release_hash'

release_hash is not needed without dftraced.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 08f5ac90 23-Oct-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: remove ftrace hash

The ftrace hash was used by the ftrace_daemon code. The record ip function
would place the calling address (ip) into the hash. The daemon would later
read the hash and modify that code.

The hash complicates the code. This patch removes it.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 4d296c24 23-Oct-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: remove mcount set

The arch dependent function ftrace_mcount_set was only used by the daemon
start up code. This patch removes it.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# cb7be3b2 23-Oct-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: remove daemon

The ftrace daemon is complex and error prone. This patch strips it out
of the code.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 6912896e 23-Oct-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: add ftrace warn on to disable ftrace

Add ftrace warn on to disable ftrace as well as report a warning.

[ Thanks to Andrew Morton for suggesting using the WARN_ON return value ]

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 81adbdc0 23-Oct-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: only have ftrace_kill atomic

When an anomaly is detected, we need a way to completely disable
ftrace. Right now we have two functions: ftrace_kill and ftrace_kill_atomic.
The ftrace_kill tries to do it in a "nice" way by converting everything
back to a nop.

The "nice" way is dangerous itself, so this patch removes it and only
has the "atomic" version, which is all that is needed.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 593eb8a2 23-Oct-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: return error on failed modified text.

Have the ftrace_modify_code return error values:

-EFAULT on error of reading the address

-EINVAL if what is read does not match what it expected

-EPERM if the write fails to update after a successful match.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# bd95b88d 16-Oct-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: release functions from hash

The x86 architecture uses a static recording of mcount caller locations
and is not affected by this patch.

For architectures still using the dynamic ftrace daemon, this patch is
critical. It removes the race between the recording of a function that
calls mcount, the unloading of a module, and the ftrace daemon updating
the call sites.

This patch adds the releasing of the hash functions that the daemon uses
to update the mcount call sites. When a module is unloaded, not only
are the replaced call site table update, but now so is the hash recorded
functions that the ftrace daemon will use.

Again, architectures that implement MCOUNT_RECORD are not affected by
this (which currently only x86 has).

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 05736a42 22-Sep-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: warn on failure to disable mcount callers

With the recent updates to ftrace, there should not be any failures when
modifying the code. If there is, then we need to warn about it.

This patch has a cleaned up version of the code that I used to discover
that the weak symbols were causing failures.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 71c67d58 20-Sep-2008 Steven Noonan <steven@uplinklabs.net>

ftrace: mcount_addr defined but not used

When CONFIG_DYNAMIC_FTRACE isn't used, neither is mcount_addr. This
patch eliminates that warning.

Signed-off-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 644f991d 05-Sep-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: fix unlocking of hash

This must be brown paper bag week for Steven Rostedt!

While working on ftrace for PPC, I discovered that the hash locking done
when CONFIG_FTRACE_MCOUNT_RECORD is not set, is totally incorrect.

With a cut and paste error, I had the hash lock macro to lock for both
hash_lock _and_ hash_unlock!

This bug did not affect x86 since this bug was introduced when
CONFIG_FTRACE_MCOUNT_RECORD was added to x86.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 3b47bfc1 27-Aug-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: remove direct reference to mcount in trace code

The mcount record method of ftrace scans objdump for references to mcount.
Using mcount as the reference to test if the calls to mcount being replaced
are indeed calls to mcount, this use of mcount was also caught as a
location to change. Using a variable that points to the mcount address
moves this reference into the data section that is not scanned, and
we do not use a false location to try and modify.

The warn on code was what was used to detect this bug.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ac8825ec 25-Aug-2008 Ingo Molnar <mingo@elte.hu>

ftrace: clean up macro usage

enclose the argument in parenthesis. (especially since we cast it,
which is a high prio operation)

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2d7da80f 24-Aug-2008 Stephen Rothwell <sfr@canb.auug.org.au>

ftrace: fix build failure

After disabling FTRACE_MCOUNT_RECORD via a patch, a dormant build
failure surfaced:

kernel/trace/ftrace.c: In function 'ftrace_record_ip':
kernel/trace/ftrace.c:416: error: incompatible type for argument 1 of '_spin_lock_irqsave'
kernel/trace/ftrace.c:433: error: incompatible type for argument 1 of '_spin_lock_irqsave'

Introduced by commit 6dad8e07f4c10b17b038e84d29f3ca41c2e55cd0 ("ftrace:
add necessary locking for ftrace records").

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 99ecdc43 15-Aug-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: add necessary locking for ftrace records

The new design of pre-recorded mcounts and updating the code outside of
kstop_machine has changed the way the records themselves are protected.

This patch uses the ftrace_lock to protect the records. Note, the lock
still does not need to be taken within calls that are only called via
kstop_machine, since the that code can not run while the spin lock is held.

Also removed the hash_lock needed for the daemon when MCOUNT_RECORD is
configured. Also did a slight cleanup of an unused variable.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 00fd61ae 15-Aug-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: do not init module on ftrace disabled

If one of the self tests of ftrace has disabled the function tracer,
do not run the code to convert the mcount calls in modules.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# fed1939c 14-Aug-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: remove old pointers to mcount

When a mcount pointer is recorded into a table, it is used to add or
remove calls to mcount (replacing them with nops). If the code is removed
via removing a module, the pointers still exist. At modifying the code
a check is always made to make sure the code being replaced is the code
expected. In-other-words, the code being replaced is compared to what
it is expected to be before being replaced.

There is a very small chance that the code being replaced just happens
to look like code that calls mcount (very small since the call to mcount
is relative). To remove this chance, this patch adds ftrace_release to
allow module unloading to remove the pointers to mcount within the module.

Another change for init calls is made to not trace calls marked with
__init. The tracing can not be started until after init is done anyway.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# a9fdda33 14-Aug-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: do not show freed records in available_filter_functions

Seems that freed records can appear in the available_filter_functions list.
This patch fixes that.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 90d595fe 14-Aug-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: enable mcount recording for modules

This patch enables the loading of the __mcount_section of modules and
changing all the callers of mcount into nops.

The modification is done before the init_module function is called, so
again, we do not need to use kstop_machine to make these changes.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 68bf21aa 14-Aug-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: mcount call site on boot nops core

This is the infrastructure to the converting the mcount call sites
recorded by the __mcount_loc section into nops on boot. It also allows
for using these sites to enable tracing as normal. When the __mcount_loc
section is used, the "ftraced" kernel thread is disabled.

This uses the current infrastructure to record the mcount call sites
as well as convert them to nops. The mcount function is kept as a stub
on boot up and not converted to the ftrace_record_ip function. We use the
ftrace_record_ip to only record from the table.

This patch does not handle modules. That comes with a later patch.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 784e2d76 27-Jul-2008 Rusty Russell <rusty@rustcorp.com.au>

stop_machine: fix up ftrace.c

Simple conversion.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>


# b2613e37 11-Jul-2008 Ingo Molnar <mingo@elte.hu>

ftrace: build fix for ftraced_suspend

fix:

kernel/trace/ftrace.c:1615: error: 'ftraced_suspend' undeclared (first use in this function)
kernel/trace/ftrace.c:1615: error: (Each undeclared identifier is reported only once
kernel/trace/ftrace.c:1615: error: for each function it appears in.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# a2bb6a3d 10-Jul-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: add ftrace_kill_atomic

It has been suggested that I add a way to disable the function tracer
on an oops. This code adds a ftrace_kill_atomic. It is not meant to be
used in normal situations. It will disable the ftrace tracer, but will
not perform the nice shutdown that requires scheduling.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 98a05ed4 26-Jun-2008 Abhishek Sagar <sagar.abhishek@gmail.com>

ftrace: prevent ftrace modifications while being kprobe'd, v2

add two missing chunks for ftrace+kprobe.

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f22f9a89 21-Jun-2008 Abhishek Sagar <sagar.abhishek@gmail.com>

ftrace: avoid modifying kprobe'd records

Avoid modifying the mcount call-site if there is a kprobe installed on it.
These records are not marked as failed however. This allowed the filter
rules on them to remain up-to-date. Whenever the kprobe on the corresponding
record is removed, the record gets updated as normal.

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ecea656d 21-Jun-2008 Abhishek Sagar <sagar.abhishek@gmail.com>

ftrace: freeze kprobe'd records

Let records identified as being kprobe'd be marked as "frozen". The trouble
with records which have a kprobe installed on their mcount call-site is
that they don't get updated. So if such a function which is currently being
traced gets its tracing disabled due to a new filter rule (or because it
was added to the notrace list) then it won't be updated and continue being
traced. This patch allows scanning of all frozen records during tracing to
check if they should be traced.

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 395a59d0 21-Jun-2008 Abhishek Sagar <sagar.abhishek@gmail.com>

ftrace: store mcount address in rec->ip

Record the address of the mcount call-site. Currently all archs except sparc64
record the address of the instruction following the mcount call-site. Some
general cleanups are entailed. Storing mcount addresses in rec->ip enables
looking them up in the kprobe hash table later on to check if they're kprobe'd.

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: davem@davemloft.net
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f2252935 22-May-2008 Ingo Molnar <mingo@elte.hu>

namespacecheck: fixes

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# a4500b84 14-Jun-2008 Abhishek Sagar <sagar.abhishek@gmail.com>

ftrace: fix "notrace" filtering priority

This is a fix to give notrace filter rules priority over "set_ftrace_filter"
rules.

This fix ensures that functions which are set to be filtered and are
concurrently marked as "notrace" don't get recorded. As of now, if
a record is marked as FTRACE_FL_FILTER and is enabled, then the notrace
flag is not checked. Tested on x86-32.

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 34078a5e 02-Jun-2008 Abhishek Sagar <sagar.abhishek@gmail.com>

ftrace: prevent freeing of all failed updates

Steven Rostedt wrote:
> If we unload a module and reload it, will it ever get converted again?

The intent was always to filter core kernel functions to prevent their freeing.
Here's a fix which should allow re-recording of module call-sites.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# eb9a7bf0 01-Jun-2008 Abhishek Sagar <sagar.abhishek@gmail.com>

ftrace: add debugfs entry 'failures'

Identify functions which had their mcount call-site updates failed. This can
help us track functions which ftrace shouldn't fiddle with, and are thus not
being traced. If there is no race with any external agent which is modifying
the mcount call-site, then this file displays no entries (normal case).

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 1d74f2a0 01-Jun-2008 Abhishek Sagar <sagar.abhishek@gmail.com>

ftrace: remove ftrace_ip_converted()

Remove the unneeded function ftrace_ip_converted().

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 0eb96701 01-Jun-2008 Abhishek Sagar <sagar.abhishek@gmail.com>

ftrace: prevent freeing of all failed updates

Prevent freeing of records which cause problems and correspond to function from
core kernel text. A new flag, FTRACE_FL_CONVERTED is used to mark a record
as "converted". All other records are patched lazily to NOPs. Failed records
now also remain on frace_hash table. Each invocation of ftrace_record_ip now
checks whether the traced function has ever been recorded (including past
failures) and doesn't re-record it again.

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ad90c0e3 27-May-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: user update and disable dynamic ftrace daemon

In dynamic ftrace, the mcount function starts off pointing to a stub
function that just returns.

On start up, the call to the stub is modified to point to a "record_ip"
function. The job of the record_ip function is to add the function to
a pre-allocated hash list. If the function is already there, it simply is
ignored, otherwise it is added to the list.

Later, a ftraced daemon wakes up and calls kstop_machine if any functions
have been recorded, and changes the calls to the recorded functions to
a simple nop. If no functions were recorded, the daemon goes back to sleep.

The daemon wakes up once a second to see if it needs to update any newly
recorded functions into nops. Usually it does not, but if a lot of code
has been executed for the first time in the kernel, the ftraced daemon
will call kstop_machine to update those into nops.

The problem currently is that there's no way to stop the daemon from doing
this, and it can cause unneeded latencies (800us which for some is bothersome).

This patch adds a new file /debugfs/tracing/ftraced_enabled. If the daemon
is active, reading this will return "enabled\n" and "disabled\n" when the
daemon is not running. To disable the daemon, the user can echo "0" or
"disable" into this file, and "1" or "enable" to re-enable the daemon.

Since the daemon is used to convert the functions into nops to increase
the performance of the system, I also added that anytime something is
written into the ftraced_enabled file, kstop_machine will run if there
are new functions that have been detected that need to be converted.

This way the user can disable the daemon but still be able to control the
conversion of the mcount calls to nops by simply,

"echo 0 > /debugfs/tracing/ftraced_enabled"

when they need to do more conversions.

To see the number of converted functions:

"cat /debugfs/tracing/dyn_ftrace_total_info"

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 492a7ea5 24-May-2008 Abhishek Sagar <sagar.abhishek@gmail.com>

ftrace: fix updating of ftrace_update_cnt

Hi Ingo/Steven,

Ftrace currently maintains an update count which includes false updates,
i.e, updates which failed. If anything, such failures should be tracked
by some separate variable, but this patch provides a minimal fix.

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: rostedt@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# ffdaa358 24-May-2008 Abhishek Sagar <sagar.abhishek@gmail.com>

ftrace: safe traversal of ftrace_hash hlist

Hi Steven,

I noticed that concurrent instances of ftrace_record_ip()
have a race between ftrace_hash list traversal during
ftrace_ip_in_hash() (before acquiring ftrace_shutdown_lock)
and ftrace_add_hash(). If it's so then this should fix it.

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: rostedt@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 41c52c0d 22-May-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: set_ftrace_notrace feature

While debugging latencies in the RT kernel, I found that it would be nice
to be able to filter away functions from the trace than just to filter
on functions.

I added a new interface to the debugfs tracing directory called

set_ftrace_notrace

When dynamic frace is enabled, this lets you filter away functions that will
not be recorded in the trace. It is similar to adding 'notrace' to those
functions but by doing it without recompiling the kernel.

Here's how set_ftrace_filter and set_ftrace_notrace interact. Remember, if
set_ftrace_filter is set, it removes all functions from the trace execpt for
those listed in the set_ftrace_filter. set_ftrace_notrace will prevent those
functions from being traced.

If you were to set one function in both set_ftrace_filter and
set_ftrace_notrace and that function was the same, then you would end up
with an empty trace.

the set of functions to trace is:

set_ftrace_filter == empty then

all functions not in set_ftrace_notrace

else

set of the set_ftrace_filter and not in set of set_ftrace_notrace.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 2d8b820b 23-Feb-2008 Ingo Molnar <mingo@elte.hu>

ftrace: cleanups

factor out code and clean it up.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 37135677 14-May-2008 Ingo Molnar <mingo@elte.hu>

ftrace: fix mcount export bug

David S. Miller noticed the following bug: the -pg instrumentation
function callback is named differently on each platform. On x86 it
is mcount, on sparc it is _mcount. So the export does not make sense
in kernel/trace/ftrace.c - move it to x86.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# aa5e5cea 13-May-2008 David Miller <davem@davemloft.net>

ftrace: remove packed attribute on ftrace_page.

It causes unaligned access traps on platforms like sparc
(ftrace_page may be marked packed, but once we return
a dyn_ftrace sub-object from this array to another piece
of code, the "packed" part of the typing information doesn't
propagate).

But also, it didn't serve any purpose either. Even if packed,
on 64-bit or 32-bit, it didn't give us any more dyn_ftrace
entries per-page.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 2bb6f8d6 12-May-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: use raw_smp_processor_id for mcount functions

Due to debug hooks in the kernel that can change the way smp_processor_id
works, use raw_smp_processor_id in mcount called functions (namely
ftrace_record_ip). Currently we annotate most debug functions from calling
mcount, but we should not rely on that to prevent kernel lockups.

This patch uses the raw_smp_processor_id to prevent a recusive crash
that can happen if a debug hook in smp_processor_id calls mcount.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 07a267cd 12-May-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: add UNINTERRUPTIBLE state for kftraced on disable

When dynamic ftrace fails and sets itself disabled, the ftraced daemon
will go back to sleep everytime it wakes up. The setting of the
ftraced state to UNINTERRUPTIBLE is skipped in this process, and the
daemon takes up 100% of the CPU. This patch makes sure the ftraced daemon
sets itself to UNINTERRUPTIBLE in that loop.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# e309b41d 12-May-2008 Ingo Molnar <mingo@elte.hu>

ftrace: remove notrace

now that we have a kbuild method for notrace, no need to pollute the
C code with the annotations.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# caf8cdeb 12-May-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: remove address of function names

PowerPC is very fragile when it comes to use of function names
and function addresses. ftrace needs to either use all function
addresses or function names (i.e. my_func as suppose to &my_func).

This patch chooses to use the names and not the addresses, and
makes ftrace consistent.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 9ff9cdb2 12-May-2008 Ingo Molnar <mingo@elte.hu>

ftrace: cleanups

clean up recent code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 4eebcc81 12-May-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: disable tracing on failure

Since ftrace touches practically every function. If we detect any
anomaly, we want to fully disable ftrace. This patch adds code
to try shutdown ftrace as much as possible without doing any more
harm is something is detected not quite correct.

This only kills ftrace, this patch does have checks for other parts of
the tracer (irqsoff, wakeup, etc.).

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 37ad5084 12-May-2008 Steven Rostedt <srostedt@redhat.com>

ftrace - fix dynamic ftrace memory leak

The ftrace dynamic function update allocates a record to store the
instruction pointers that are being modified. If the modified
instruction pointer fails to update, then the record is marked as
failed and nothing more is done.

Worse, if the modification fails, but the record ip function is still
called, it will allocate a new record and try again. In just a matter
of time, will this cause a serious memory leak and crash the system.

This patch plugs this memory leak. When a record fails, it is
included back into the pool of records to be used. Now a record may
fail over and over again, but the number of allocated records will
not increase.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 750ed1a4 12-May-2008 Ingo Molnar <mingo@elte.hu>

ftrace: timestamp syncing, prepare

rename and uninline now() to ftrace_now().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 4bf39a94 12-May-2008 Ingo Molnar <mingo@elte.hu>

ftrace: cleanups

no code changed.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 77a2b37d 12-May-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: startup tester on dynamic tracing.

This patch adds a startup self test on dynamic code modification
and filters. The test filters on a specific function, makes sure that
no other function is traced, exectutes the function, then makes sure that
the function is traced.

This patch also fixes a slight bug with the ftrace selftest, where
tracer_enabled was not being set.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# c7aafc54 12-May-2008 Ingo Molnar <mingo@elte.hu>

ftrace: cleanups

factor out code and clean it up.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# e1c08bdd 12-May-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: force recording

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 5072c59f 12-May-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: add filter select functions to trace

This patch adds two files to the debugfs system:

/debugfs/tracing/available_filter_functions

and

/debugfs/tracing/set_ftrace_filter

The available_filter_functions lists all functions that has been
recorded by the ftraced that has called the ftrace_record_ip function.
This is to allow users to see what functions have been converted
to nops and can be enabled for tracing.

To enable functions, simply echo the names (whitespace delimited)
into set_ftrace_filter. Simple wildcards are also allowed.

echo 'scheduler' > /debugfs/tracing/set_ftrace_filter

Will have only the scheduler be activated when tracing is enabled.

echo 'sched_*' > /debugfs/tracing/set_ftrace_filter

Will have only the functions starting with 'sched_' be activated.

echo '*lock' > /debugfs/tracing/set_ftrace_filter

Will have only functions ending with 'lock' be activated.

echo '*lock*' > /debugfs/tracing/set_ftrace_filter

Will have only functions with 'lock' in its name be activated.

Note: 'sched*lock' will not work. The only wildcards that are
allowed is an asterisk and the beginning and or end of the string
passed in.

Multiple names can be passed in with whitespace delimited:

echo 'scheduler *lock *acpi*' > /debugfs/tracing/set_ftrace_filter

is also the same as:

echo 'scheduler' > /debugfs/tracing/set_ftrace_filter
echo '*lock' >> /debugfs/tracing/set_ftrace_filter
echo '*acpi*' >> /debugfs/tracing/set_ftrace_filter

Appending does just that. It appends to the list.

To disable all filters simply echo an empty line in:

echo > /debugfs/tracing/set_ftrace_filter

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# d61f82d0 12-May-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: use dynamic patching for updating mcount calls

This patch replaces the indirect call to the mcount function
pointer with a direct call that will be patched by the
dynamic ftrace routines.

On boot up, the mcount function calls the ftace_stub function.
When the dynamic ftrace code is initialized, the ftrace_stub
is replaced with a call to the ftrace_record_ip, which records
the instruction pointers of the locations that call it.

Later, the ftraced daemon will call kstop_machine and patch all
the locations to nops.

When a ftrace is enabled, the original calls to mcount will now
be set top call ftrace_caller, which will do a direct call
to the registered ftrace function. This direct call is also patched
when the function that should be called is updated.

All patching is performed by a kstop_machine routine to prevent any
type of race conditions that is associated with modifying code
on the fly.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 3c1720f0 12-May-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: move memory management out of arch code

This patch moves the memory management of the ftrace
records out of the arch code and into the generic code
making the arch code simpler.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# b0fc494f 12-May-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: add ftrace_enabled sysctl to disable mcount function

This patch adds back the sysctl ftrace_enabled. This time it is
defaulted to on, if DYNAMIC_FTRACE is configured. When ftrace_enabled
is disabled, the ftrace function is set to the stub return.

If DYNAMIC_FTRACE is also configured, on ftrace_enabled = 0,
the registered ftrace functions will all be set to jmps, but no more
new calls to ftrace recording (used to find the ftrace calling sites)
will be called.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 3d083395 12-May-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: dynamic enabling/disabling of function calls

This patch adds a feature to dynamically replace the ftrace code
with the jmps to allow a kernel with ftrace configured to run
as fast as it can without it configured.

The way this works, is on bootup (if ftrace is enabled), a ftrace
function is registered to record the instruction pointer of all
places that call the function.

Later, if there's still any code to patch, a kthread is awoken
(rate limited to at most once a second) that performs a stop_machine,
and replaces all the code that was called with a jmp over the call
to ftrace. It only replaces what was found the previous time. Typically
the system reaches equilibrium quickly after bootup and there's no code
patching needed at all.

e.g.

call ftrace /* 5 bytes */

is replaced with

jmp 3f /* jmp is 2 bytes and we jump 3 forward */
3:

When we want to enable ftrace for function tracing, the IP recording
is removed, and stop_machine is called again to replace all the locations
of that were recorded back to the call of ftrace. When it is disabled,
we replace the code back to the jmp.

Allocation is done by the kthread. If the ftrace recording function is
called, and we don't have any record slots available, then we simply
skip that call. Once a second a new page (if needed) is allocated for
recording new ftrace function calls. A large batch is allocated at
boot up to get most of the calls there.

Because we do this via stop_machine, we don't have to worry about another
CPU executing a ftrace call as we modify it. But we do need to worry
about NMI's so all functions that might be called via nmi must be
annotated with notrace_nmi. When this code is configured in, the NMI code
will not call notrace.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 16444a8a 12-May-2008 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

ftrace: add basic support for gcc profiler instrumentation

If CONFIG_FTRACE is selected and /proc/sys/kernel/ftrace_enabled is
set to a non-zero value the ftrace routine will be called everytime
we enter a kernel function that is not marked with the "notrace"
attribute.

The ftrace routine will then call a registered function if a function
happens to be registered.

[ This code has been highly hacked by Steven Rostedt and Ingo Molnar,
so don't blame Arnaldo for all of this ;-) ]

Update:
It is now possible to register more than one ftrace function.
If only one ftrace function is registered, that will be the
function that ftrace calls directly. If more than one function
is registered, then ftrace will call a function that will loop
through the functions to call.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>