History log of /linux-master/kernel/trace/trace_events_hist.c
Revision Date Author Comments
# 4f1991a9 07-Jan-2024 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing histograms: Simplify parse_actions() function

The parse_actions() function uses 'len = str_has_prefix()' to test which
action is in the string being parsed. But then it goes and repeats the
logic for each different action. This logic can be simplified and
duplicate code can be removed as 'len' contains the length of the found
prefix which should be used for all actions.

Link: https://lore.kernel.org/all/20240107112044.6702cb66@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20240107203258.37e26d2b@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 1cc111b9 13-Dec-2023 Zheng Yejian <zhengyejian1@huawei.com>

tracing: Fix uaf issue when open the hist or hist_debug file

KASAN report following issue. The root cause is when opening 'hist'
file of an instance and accessing 'trace_event_file' in hist_show(),
but 'trace_event_file' has been freed due to the instance being removed.
'hist_debug' file has the same problem. To fix it, call
tracing_{open,release}_file_tr() in file_operations callback to have
the ref count and avoid 'trace_event_file' being freed.

BUG: KASAN: slab-use-after-free in hist_show+0x11e0/0x1278
Read of size 8 at addr ffff242541e336b8 by task head/190

CPU: 4 PID: 190 Comm: head Not tainted 6.7.0-rc5-g26aff849438c #133
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x98/0xf8
show_stack+0x1c/0x30
dump_stack_lvl+0x44/0x58
print_report+0xf0/0x5a0
kasan_report+0x80/0xc0
__asan_report_load8_noabort+0x1c/0x28
hist_show+0x11e0/0x1278
seq_read_iter+0x344/0xd78
seq_read+0x128/0x1c0
vfs_read+0x198/0x6c8
ksys_read+0xf4/0x1e0
__arm64_sys_read+0x70/0xa8
invoke_syscall+0x70/0x260
el0_svc_common.constprop.0+0xb0/0x280
do_el0_svc+0x44/0x60
el0_svc+0x34/0x68
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x168/0x170

Allocated by task 188:
kasan_save_stack+0x28/0x50
kasan_set_track+0x28/0x38
kasan_save_alloc_info+0x20/0x30
__kasan_slab_alloc+0x6c/0x80
kmem_cache_alloc+0x15c/0x4a8
trace_create_new_event+0x84/0x348
__trace_add_new_event+0x18/0x88
event_trace_add_tracer+0xc4/0x1a0
trace_array_create_dir+0x6c/0x100
trace_array_create+0x2e8/0x568
instance_mkdir+0x48/0x80
tracefs_syscall_mkdir+0x90/0xe8
vfs_mkdir+0x3c4/0x610
do_mkdirat+0x144/0x200
__arm64_sys_mkdirat+0x8c/0xc0
invoke_syscall+0x70/0x260
el0_svc_common.constprop.0+0xb0/0x280
do_el0_svc+0x44/0x60
el0_svc+0x34/0x68
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x168/0x170

Freed by task 191:
kasan_save_stack+0x28/0x50
kasan_set_track+0x28/0x38
kasan_save_free_info+0x34/0x58
__kasan_slab_free+0xe4/0x158
kmem_cache_free+0x19c/0x508
event_file_put+0xa0/0x120
remove_event_file_dir+0x180/0x320
event_trace_del_tracer+0xb0/0x180
__remove_instance+0x224/0x508
instance_rmdir+0x44/0x78
tracefs_syscall_rmdir+0xbc/0x140
vfs_rmdir+0x1cc/0x4c8
do_rmdir+0x220/0x2b8
__arm64_sys_unlinkat+0xc0/0x100
invoke_syscall+0x70/0x260
el0_svc_common.constprop.0+0xb0/0x280
do_el0_svc+0x44/0x60
el0_svc+0x34/0x68
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x168/0x170

Link: https://lore.kernel.org/linux-trace-kernel/20231214012153.676155-1-zhengyejian1@huawei.com

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 545db7e2 21-Oct-2023 Christophe JAILLET <christophe.jaillet@wanadoo.fr>

tracing/histograms: Simplify last_cmd_set()

Turn a kzalloc()+strcpy()+strncat() into an equivalent and less verbose
kasprintf().

Link: https://lore.kernel.org/linux-trace-kernel/30b6fb04dadc10a03cc1ad08f5d8a93ef623a167.1697899346.git.christophe.jaillet@wanadoo.fr

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Mukesh ojha <quic_mojha@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 4b8b3905 14-Jul-2023 Mohamed Khalfella <mkhalfella@purestorage.com>

tracing/histograms: Return an error if we fail to add histogram to hist_vars list

Commit 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if
they have referenced variables") added a check to fail histogram creation
if save_hist_vars() failed to add histogram to hist_vars list. But the
commit failed to set ret to failed return code before jumping to
unregister histogram, fix it.

Link: https://lore.kernel.org/linux-trace-kernel/20230714203341.51396-1-mkhalfella@purestorage.com

Cc: stable@vger.kernel.org
Fixes: 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if they have referenced variables")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 6018b585 12-Jul-2023 Mohamed Khalfella <mkhalfella@purestorage.com>

tracing/histograms: Add histograms to hist_vars if they have referenced variables

Hist triggers can have referenced variables without having direct
variables fields. This can be the case if referenced variables are added
for trigger actions. In this case the newly added references will not
have field variables. Not taking such referenced variables into
consideration can result in a bug where it would be possible to remove
hist trigger with variables being refenced. This will result in a bug
that is easily reproducable like so

$ cd /sys/kernel/tracing
$ echo 'synthetic_sys_enter char[] comm; long id' >> synthetic_events
$ echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger
$ echo 'hist:keys=common_pid.execname,id.syscall:onmatch(raw_syscalls.sys_enter).synthetic_sys_enter($comm, id)' >> events/raw_syscalls/sys_enter/trigger
$ echo '!hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger

[ 100.263533] ==================================================================
[ 100.264634] BUG: KASAN: slab-use-after-free in resolve_var_refs+0xc7/0x180
[ 100.265520] Read of size 8 at addr ffff88810375d0f0 by task bash/439
[ 100.266320]
[ 100.266533] CPU: 2 PID: 439 Comm: bash Not tainted 6.5.0-rc1 #4
[ 100.267277] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[ 100.268561] Call Trace:
[ 100.268902] <TASK>
[ 100.269189] dump_stack_lvl+0x4c/0x70
[ 100.269680] print_report+0xc5/0x600
[ 100.270165] ? resolve_var_refs+0xc7/0x180
[ 100.270697] ? kasan_complete_mode_report_info+0x80/0x1f0
[ 100.271389] ? resolve_var_refs+0xc7/0x180
[ 100.271913] kasan_report+0xbd/0x100
[ 100.272380] ? resolve_var_refs+0xc7/0x180
[ 100.272920] __asan_load8+0x71/0xa0
[ 100.273377] resolve_var_refs+0xc7/0x180
[ 100.273888] event_hist_trigger+0x749/0x860
[ 100.274505] ? kasan_save_stack+0x2a/0x50
[ 100.275024] ? kasan_set_track+0x29/0x40
[ 100.275536] ? __pfx_event_hist_trigger+0x10/0x10
[ 100.276138] ? ksys_write+0xd1/0x170
[ 100.276607] ? do_syscall_64+0x3c/0x90
[ 100.277099] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 100.277771] ? destroy_hist_data+0x446/0x470
[ 100.278324] ? event_hist_trigger_parse+0xa6c/0x3860
[ 100.278962] ? __pfx_event_hist_trigger_parse+0x10/0x10
[ 100.279627] ? __kasan_check_write+0x18/0x20
[ 100.280177] ? mutex_unlock+0x85/0xd0
[ 100.280660] ? __pfx_mutex_unlock+0x10/0x10
[ 100.281200] ? kfree+0x7b/0x120
[ 100.281619] ? ____kasan_slab_free+0x15d/0x1d0
[ 100.282197] ? event_trigger_write+0xac/0x100
[ 100.282764] ? __kasan_slab_free+0x16/0x20
[ 100.283293] ? __kmem_cache_free+0x153/0x2f0
[ 100.283844] ? sched_mm_cid_remote_clear+0xb1/0x250
[ 100.284550] ? __pfx_sched_mm_cid_remote_clear+0x10/0x10
[ 100.285221] ? event_trigger_write+0xbc/0x100
[ 100.285781] ? __kasan_check_read+0x15/0x20
[ 100.286321] ? __bitmap_weight+0x66/0xa0
[ 100.286833] ? _find_next_bit+0x46/0xe0
[ 100.287334] ? task_mm_cid_work+0x37f/0x450
[ 100.287872] event_triggers_call+0x84/0x150
[ 100.288408] trace_event_buffer_commit+0x339/0x430
[ 100.289073] ? ring_buffer_event_data+0x3f/0x60
[ 100.292189] trace_event_raw_event_sys_enter+0x8b/0xe0
[ 100.295434] syscall_trace_enter.constprop.0+0x18f/0x1b0
[ 100.298653] syscall_enter_from_user_mode+0x32/0x40
[ 100.301808] do_syscall_64+0x1a/0x90
[ 100.304748] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 100.307775] RIP: 0033:0x7f686c75c1cb
[ 100.310617] Code: 73 01 c3 48 8b 0d 65 3c 10 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 21 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 3c 10 00 f7 d8 64 89 01 48
[ 100.317847] RSP: 002b:00007ffc60137a38 EFLAGS: 00000246 ORIG_RAX: 0000000000000021
[ 100.321200] RAX: ffffffffffffffda RBX: 000055f566469ea0 RCX: 00007f686c75c1cb
[ 100.324631] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000000000000000a
[ 100.328104] RBP: 00007ffc60137ac0 R08: 00007f686c818460 R09: 000000000000000a
[ 100.331509] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000009
[ 100.334992] R13: 0000000000000007 R14: 000000000000000a R15: 0000000000000007
[ 100.338381] </TASK>

We hit the bug because when second hist trigger has was created
has_hist_vars() returned false because hist trigger did not have
variables. As a result of that save_hist_vars() was not called to add
the trigger to trace_array->hist_vars. Later on when we attempted to
remove the first histogram find_any_var_ref() failed to detect it is
being used because it did not find the second trigger in hist_vars list.

With this change we wait until trigger actions are created so we can take
into consideration if hist trigger has variable references. Also, now we
check the return value of save_hist_vars() and fail trigger creation if
save_hist_vars() fails.

Link: https://lore.kernel.org/linux-trace-kernel/20230712223021.636335-1-mkhalfella@purestorage.com

Cc: stable@vger.kernel.org
Fixes: 067fe038e70f6 ("tracing: Add variable reference handling to hist triggers")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 4b512860 23-May-2023 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Rename stacktrace field to common_stacktrace

The histogram and synthetic events can use a pseudo event called
"stacktrace" that will create a stacktrace at the time of the event and
use it just like it was a normal field. We have other pseudo events such
as "common_cpu" and "common_timestamp". To stay consistent with that,
convert "stacktrace" to "common_stacktrace". As this was used in older
kernels, to keep backward compatibility, this will act just like
"common_cpu" did with "cpu". That is, "cpu" will be the same as
"common_cpu" unless the event has a "cpu" field. In which case, the
event's field is used. The same is true with "stacktrace".

Also update the documentation to reflect this change.

Link: https://lore.kernel.org/linux-trace-kernel/20230523230913.6860e28d@rorschach.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# e30fbc61 23-May-2023 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing/histograms: Allow variables to have some modifiers

Modifiers are used to change the behavior of keys. For instance, they
can grouped into buckets, converted to syscall names (from the syscall
identifier), show task->comm of the current pid, be an array of longs
that represent a stacktrace, and more.

It was found that nothing stopped a value from taking a modifier. As
values are simple counters. If this happened, it would call code that
was not expecting a modifier and crash the kernel. This was fixed by
having the ___create_val_field() function test if a modifier was present
and fail if one was. This fixed the crash.

Now there's a problem with variables. Variables are used to pass fields
from one event to another. Variables are allowed to have some modifiers,
as the processing may need to happen at the time of the event (like
stacktraces and comm names of the current pid). The issue is that it too
uses __create_val_field(). Now that fails on modifiers, variables can no
longer use them (this is a regression).

As not all modifiers are for variables, have them use a separate check.

Link: https://lore.kernel.org/linux-trace-kernel/20230523221108.064a5d82@rorschach.local.home

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: e0213434fe3e4 ("tracing: Do not let histogram values have some modifiers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 9f116f76 01-Mar-2023 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Check field value in hist_field_name()

The function hist_field_name() cannot handle being passed a NULL field
parameter. It should never be NULL, but due to a previous bug, NULL was
passed to the function and the kernel crashed due to a NULL dereference.
Mark Rutland reported this to me on IRC.

The bug was fixed, but to prevent future bugs from crashing the kernel,
check the field and add a WARN_ON() if it is NULL.

Link: https://lkml.kernel.org/r/20230302020810.762384440@goodmis.org

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Fixes: c6afad49d127f ("tracing: Add hist trigger 'sym' and 'sym-offset' modifiers")
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# e0213434 01-Mar-2023 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Do not let histogram values have some modifiers

Histogram values can not be strings, stacktraces, graphs, symbols,
syscalls, or grouped in buckets or log. Give an error if a value is set to
do so.

Note, the histogram code was not prepared to handle these modifiers for
histograms and caused a bug.

Mark Rutland reported:

# echo 'p:copy_to_user __arch_copy_to_user n=$arg2' >> /sys/kernel/tracing/kprobe_events
# echo 'hist:keys=n:vals=hitcount.buckets=8:sort=hitcount' > /sys/kernel/tracing/events/kprobes/copy_to_user/trigger
# cat /sys/kernel/tracing/events/kprobes/copy_to_user/hist
[ 143.694628] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 143.695190] Mem abort info:
[ 143.695362] ESR = 0x0000000096000004
[ 143.695604] EC = 0x25: DABT (current EL), IL = 32 bits
[ 143.695889] SET = 0, FnV = 0
[ 143.696077] EA = 0, S1PTW = 0
[ 143.696302] FSC = 0x04: level 0 translation fault
[ 143.702381] Data abort info:
[ 143.702614] ISV = 0, ISS = 0x00000004
[ 143.702832] CM = 0, WnR = 0
[ 143.703087] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000448f9000
[ 143.703407] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[ 143.704137] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 143.704714] Modules linked in:
[ 143.705273] CPU: 0 PID: 133 Comm: cat Not tainted 6.2.0-00003-g6fc512c10a7c #3
[ 143.706138] Hardware name: linux,dummy-virt (DT)
[ 143.706723] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 143.707120] pc : hist_field_name.part.0+0x14/0x140
[ 143.707504] lr : hist_field_name.part.0+0x104/0x140
[ 143.707774] sp : ffff800008333a30
[ 143.707952] x29: ffff800008333a30 x28: 0000000000000001 x27: 0000000000400cc0
[ 143.708429] x26: ffffd7a653b20260 x25: 0000000000000000 x24: ffff10d303ee5800
[ 143.708776] x23: ffffd7a6539b27b0 x22: ffff10d303fb8c00 x21: 0000000000000001
[ 143.709127] x20: ffff10d303ec2000 x19: 0000000000000000 x18: 0000000000000000
[ 143.709478] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 143.709824] x14: 0000000000000000 x13: 203a6f666e692072 x12: 6567676972742023
[ 143.710179] x11: 0a230a6d6172676f x10: 000000000000002c x9 : ffffd7a6521e018c
[ 143.710584] x8 : 000000000000002c x7 : 7f7f7f7f7f7f7f7f x6 : 000000000000002c
[ 143.710915] x5 : ffff10d303b0103e x4 : ffffd7a653b20261 x3 : 000000000000003d
[ 143.711239] x2 : 0000000000020001 x1 : 0000000000000001 x0 : 0000000000000000
[ 143.711746] Call trace:
[ 143.712115] hist_field_name.part.0+0x14/0x140
[ 143.712642] hist_field_name.part.0+0x104/0x140
[ 143.712925] hist_field_print+0x28/0x140
[ 143.713125] event_hist_trigger_print+0x174/0x4d0
[ 143.713348] hist_show+0xf8/0x980
[ 143.713521] seq_read_iter+0x1bc/0x4b0
[ 143.713711] seq_read+0x8c/0xc4
[ 143.713876] vfs_read+0xc8/0x2a4
[ 143.714043] ksys_read+0x70/0xfc
[ 143.714218] __arm64_sys_read+0x24/0x30
[ 143.714400] invoke_syscall+0x50/0x120
[ 143.714587] el0_svc_common.constprop.0+0x4c/0x100
[ 143.714807] do_el0_svc+0x44/0xd0
[ 143.714970] el0_svc+0x2c/0x84
[ 143.715134] el0t_64_sync_handler+0xbc/0x140
[ 143.715334] el0t_64_sync+0x190/0x194
[ 143.715742] Code: a9bd7bfd 910003fd a90153f3 aa0003f3 (f9400000)
[ 143.716510] ---[ end trace 0000000000000000 ]---
Segmentation fault

Link: https://lkml.kernel.org/r/20230302020810.559462599@goodmis.org

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: c6afad49d127f ("tracing: Add hist trigger 'sym' and 'sym-offset' modifiers")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# f5914b30 10-Feb-2023 Tom Zanussi <zanussi@kernel.org>

tracing/histogram: Fix stacktrace key

The current code will always use the current stacktrace as a key even
if a stacktrace contained in a specific event field was specified.

For example, we expect to use the 'unsigned long[] stack' field in the
below event in the histogram:

# echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > /sys/kernel/debug/tracing/dynamic_events
# echo 'hist:keys=delta.buckets=100,stack.stacktrace:sort=delta' > /sys/kernel/debug/tracing/events/synthetic/block_lat/trigger

But in fact, when we type out the trigger, we see that it's using the
plain old global 'stacktrace' as the key, which is just the stacktrace
when the event was hit and not the stacktrace contained in the event,
which is what we want:

# cat /sys/kernel/debug/tracing/events/synthetic/block_lat/trigger
hist:keys=delta.buckets=100,stacktrace:vals=hitcount:sort=delta.buckets=100:size=2048 [active]

And in fact, there's no code to actually retrieve it from the event,
so we need to add HIST_FIELD_FN_STACK and hist_field_stack() to get it
and hook it into the trigger code. For now, since the stack is just
using dynamic strings, this could just use the dynamic string
function, but it seems cleaner to have a dedicated function an be able
to tweak independently as necessary.

Link: https://lkml.kernel.org/r/11aa614c82976adbfa4ea763dbe885b5fb01d59c.1676063532.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
[ Fixed 32bit build warning reported by kernel test robot <lkp@intel.com> ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 2bacfd9f 10-Feb-2023 Tom Zanussi <zanussi@kernel.org>

tracing/histogram: Fix a few problems with stacktrace variable printing

Currently, there are a few problems when printing hist triggers and
trace output when using stacktrace variables. This fixes the problems
seen below:

# echo 'hist:keys=delta.buckets=100,stack.stacktrace:sort=delta' > /sys/kernel/debug/tracing/events/synthetic/block_lat/trigger
# cat /sys/kernel/debug/tracing/events/synthetic/block_lat/trigger
hist:keys=delta.buckets=100,stacktrace:vals=hitcount:sort=delta.buckets=100:size=2048 [active]

# echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 2' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
# cat /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
hist:keys=next_pid:vals=hitcount:ts=common_timestamp.usecs,st=stacktrace.stacktrace:sort=hitcount:size=2048:clock=global if prev_state == 2 [active]

and also in the trace output (should be stack.stacktrace):

{ delta: ~ 100-199, stacktrace __schedule+0xa19/0x1520

Link: https://lkml.kernel.org/r/60bebd4e546728e012a7a2bcbf58716d48ba6edb.1676063532.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 8261ef2e 15-Feb-2023 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Add BUILD_BUG() to make sure stacktrace fits in strings

The max string length for a histogram variable is 256 bytes. The max depth
of a stacktrace is 16. With 8byte words, that's 16 * 8 = 128. Which can
easily fit in the string variable. The histogram stacktrace is being
stored in the string value (with the given max length), with the
assumption it will fit. To make sure that this is always the case (in the
case that the stack trace depth increases), add a BUILD_BUG_ON() to test
this.

Link: https://lore.kernel.org/linux-trace-kernel/20230214002418.0103b9e765d3e5c374d2aa7d@kernel.org/

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# fc1a9dc1 10-Feb-2023 Tom Zanussi <zanussi@kernel.org>

tracing/histogram: Don't use strlen to find length of stacktrace variables

Because stacktraces are saved in dynamic strings,
trace_event_raw_event_synth() uses strlen to determine the length of
the stack. Stacktraces may contain 0-bytes, though, in the saved
addresses, so the length found and passed to reserve() will be too
small.

Fix this by using the first unsigned long in the stack variables to
store the actual number of elements in the stack and have
trace_event_raw_event_synth() use that to determine the length of the
stack.

Link: https://lkml.kernel.org/r/1ed6906cd9d6477ef2bd8e63c61de20a9ffe64d7.1676063532.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# cc5fc8bf 17-Jan-2023 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing/histogram: Add stacktrace type

Now that stacktraces can be part of synthetic events, allow a key to be
typed as a stacktrace.

# cd /sys/kernel/tracing
# echo 's:block_lat u64 delta; unsigned long stack[];' >> dynamic_events
# echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 2' >> events/sched/sched_switch/trigger
# echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,st2=$st:onmatch(sched.sched_switch).trace(block_lat,$delta,$st2)' >> events/sched/sched_switch/trigger
# echo 'hist:keys=delta.buckets=100,stack.stacktrace:sort=delta' > events/synthetic/block_lat/trigger

# cat events/synthetic/block_lat/hist

# event histogram
#
# trigger info: hist:keys=delta.buckets=100,stacktrace:vals=hitcount:sort=delta.buckets=100:size=2048 [active]
#

{ delta: ~ 0-99, stacktrace:
event_hist_trigger+0x464/0x480
event_triggers_call+0x52/0xe0
trace_event_buffer_commit+0x193/0x250
trace_event_raw_event_sched_switch+0xfc/0x150
__traceiter_sched_switch+0x41/0x60
__schedule+0x448/0x7b0
schedule_idle+0x26/0x40
cpu_startup_entry+0x19/0x20
start_secondary+0xed/0xf0
secondary_startup_64_no_verify+0xe0/0xeb
} hitcount: 6
{ delta: ~ 0-99, stacktrace:
event_hist_trigger+0x464/0x480
event_triggers_call+0x52/0xe0
trace_event_buffer_commit+0x193/0x250
trace_event_raw_event_sched_switch+0xfc/0x150
__traceiter_sched_switch+0x41/0x60
__schedule+0x448/0x7b0
schedule_idle+0x26/0x40
cpu_startup_entry+0x19/0x20
__pfx_kernel_init+0x0/0x10
arch_call_rest_init+0xa/0x24
start_kernel+0x964/0x98d
secondary_startup_64_no_verify+0xe0/0xeb
} hitcount: 3
{ delta: ~ 0-99, stacktrace:
event_hist_trigger+0x464/0x480
event_triggers_call+0x52/0xe0
trace_event_buffer_commit+0x193/0x250
trace_event_raw_event_sched_switch+0xfc/0x150
__traceiter_sched_switch+0x41/0x60
__schedule+0x448/0x7b0
schedule+0x5a/0xb0
worker_thread+0xaf/0x380
kthread+0xe9/0x110
ret_from_fork+0x2c/0x50
} hitcount: 1
{ delta: ~ 100-199, stacktrace:
event_hist_trigger+0x464/0x480
event_triggers_call+0x52/0xe0
trace_event_buffer_commit+0x193/0x250
trace_event_raw_event_sched_switch+0xfc/0x150
__traceiter_sched_switch+0x41/0x60
__schedule+0x448/0x7b0
schedule_idle+0x26/0x40
cpu_startup_entry+0x19/0x20
start_secondary+0xed/0xf0
secondary_startup_64_no_verify+0xe0/0xeb
} hitcount: 15
[..]
{ delta: ~ 8500-8599, stacktrace:
event_hist_trigger+0x464/0x480
event_triggers_call+0x52/0xe0
trace_event_buffer_commit+0x193/0x250
trace_event_raw_event_sched_switch+0xfc/0x150
__traceiter_sched_switch+0x41/0x60
__schedule+0x448/0x7b0
schedule_idle+0x26/0x40
cpu_startup_entry+0x19/0x20
start_secondary+0xed/0xf0
secondary_startup_64_no_verify+0xe0/0xeb
} hitcount: 1

Totals:
Hits: 89
Entries: 11
Dropped: 0

Link: https://lkml.kernel.org/r/20230117152236.167046397@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Ching-lin Yu <chinglinyu@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 00cf3d67 17-Jan-2023 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Allow synthetic events to pass around stacktraces

Allow a stacktrace from one event to be displayed by the end event of a
synthetic event. This is very useful when looking for the longest latency
of a sleep or something blocked on I/O.

# cd /sys/kernel/tracing/
# echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events
# echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 1||prev_state == 2' > events/sched/sched_switch/trigger
# echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(block_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger

The above creates a "block_lat" synthetic event that take the stacktrace of
when a task schedules out in either the interruptible or uninterruptible
states, and on a new per process max $delta (the time it was scheduled
out), will print the process id and the stacktrace.

# echo 1 > events/synthetic/block_lat/enable
# cat trace
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
kworker/u16:0-767 [006] d..4. 560.645045: block_lat: pid=767 delta=66 stack=STACK:
=> __schedule
=> schedule
=> pipe_read
=> vfs_read
=> ksys_read
=> do_syscall_64
=> 0x966000aa

<idle>-0 [003] d..4. 561.132117: block_lat: pid=0 delta=413787 stack=STACK:
=> __schedule
=> schedule
=> schedule_hrtimeout_range_clock
=> do_sys_poll
=> __x64_sys_poll
=> do_syscall_64
=> 0x966000aa

<...>-153 [006] d..4. 562.068407: block_lat: pid=153 delta=54 stack=STACK:
=> __schedule
=> schedule
=> io_schedule
=> rq_qos_wait
=> wbt_wait
=> __rq_qos_throttle
=> blk_mq_submit_bio
=> submit_bio_noacct_nocheck
=> ext4_bio_write_page
=> mpage_submit_page
=> mpage_process_page_bufs
=> mpage_prepare_extent_to_map
=> ext4_do_writepages
=> ext4_writepages
=> do_writepages
=> __writeback_single_inode

Link: https://lkml.kernel.org/r/20230117152236.010941267@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Ching-lin Yu <chinglinyu@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 288709c9 17-Jan-2023 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Allow stacktraces to be saved as histogram variables

Allow to save stacktraces into a histogram variable. This will be used by
synthetic events to allow a stacktrace from one event to be passed and
displayed by another event.

The special keyword "stacktrace" is to be used to trigger a stack
trace for the event that the histogram trigger is attached to.

echo 'hist:keys=pid:st=stacktrace" > events/sched/sched_waking/trigger

Currently nothing can get access to the "$st" variable above that contains
the stack trace, but that will soon change.

Link: https://lkml.kernel.org/r/20230117152235.856323729@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Ching-lin Yu <chinglinyu@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 8b152e91 11-Jan-2023 Natalia Petrova <n.petrova@fintech.ru>

trace_events_hist: add check for return value of 'create_hist_field'

Function 'create_hist_field' is called recursively at
trace_events_hist.c:1954 and can return NULL-value that's why we have
to check it to avoid null pointer dereference.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Link: https://lkml.kernel.org/r/20230111120409.4111-1-n.petrova@fintech.ru

Cc: stable@vger.kernel.org
Fixes: 30350d65ac56 ("tracing: Add variable support to hist triggers")
Signed-off-by: Natalia Petrova <n.petrova@fintech.ru>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 608c6ed3 07-Dec-2022 Zheng Yejian <zhengyejian1@huawei.com>

tracing/hist: Fix issue of losting command info in error_log

When input some constructed invalid 'trigger' command, command info
in 'error_log' are lost [1].

The root cause is that there is a path that event_hist_trigger_parse()
is recursely called once and 'last_cmd' which save origin command is
cleared, then later calling of hist_err() will no longer record origin
command info:

event_hist_trigger_parse() {
last_cmd_set() // <1> 'last_cmd' save origin command here at first
create_actions() {
onmatch_create() {
action_create() {
trace_action_create() {
trace_action_create_field_var() {
create_field_var_hist() {
event_hist_trigger_parse() { // <2> recursely called once
hist_err_clear() // <3> 'last_cmd' is cleared here
}
hist_err() // <4> No longer find origin command!!!

Since 'glob' is empty string while running into the recurse call, we
can trickly check it and bypass the call of hist_err_clear() to solve it.

[1]
# cd /sys/kernel/tracing
# echo "my_synth_event int v1; int v2; int v3;" >> synthetic_events
# echo 'hist:keys=pid' >> events/sched/sched_waking/trigger
# echo "hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(\
pid,pid1)" >> events/sched/sched_switch/trigger
# cat error_log
[ 8.405018] hist:sched:sched_switch: error: Couldn't find synthetic event
Command:
hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(pid,pid1)
^
[ 8.816902] hist:sched:sched_switch: error: Couldn't find field
Command:
hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(pid,pid1)
^
[ 8.816902] hist:sched:sched_switch: error: Couldn't parse field variable
Command:
hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(pid,pid1)
^
[ 8.999880] : error: Couldn't find field
Command:
^
[ 8.999880] : error: Couldn't parse field variable
Command:
^
[ 8.999880] : error: Couldn't find field
Command:
^
[ 8.999880] : error: Couldn't create histogram for field
Command:
^

Link: https://lore.kernel.org/linux-trace-kernel/20221207135326.3483216-1-zhengyejian1@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <zanussi@kernel.org>
Fixes: f404da6e1d46 ("tracing: Add 'last error' error facility for hist triggers")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 82470f7d 06-Dec-2022 Zheng Yejian <zhengyejian1@huawei.com>

tracing/hist: Fix out-of-bound write on 'action_data.var_ref_idx'

When generate a synthetic event with many params and then create a trace
action for it [1], kernel panic happened [2].

It is because that in trace_action_create() 'data->n_params' is up to
SYNTH_FIELDS_MAX (current value is 64), and array 'data->var_ref_idx'
keeps indices into array 'hist_data->var_refs' for each synthetic event
param, but the length of 'data->var_ref_idx' is TRACING_MAP_VARS_MAX
(current value is 16), so out-of-bound write happened when 'data->n_params'
more than 16. In this case, 'data->match_data.event' is overwritten and
eventually cause the panic.

To solve the issue, adjust the length of 'data->var_ref_idx' to be
SYNTH_FIELDS_MAX and add sanity checks to avoid out-of-bound write.

[1]
# cd /sys/kernel/tracing/
# echo "my_synth_event int v1; int v2; int v3; int v4; int v5; int v6;\
int v7; int v8; int v9; int v10; int v11; int v12; int v13; int v14;\
int v15; int v16; int v17; int v18; int v19; int v20; int v21; int v22;\
int v23; int v24; int v25; int v26; int v27; int v28; int v29; int v30;\
int v31; int v32; int v33; int v34; int v35; int v36; int v37; int v38;\
int v39; int v40; int v41; int v42; int v43; int v44; int v45; int v46;\
int v47; int v48; int v49; int v50; int v51; int v52; int v53; int v54;\
int v55; int v56; int v57; int v58; int v59; int v60; int v61; int v62;\
int v63" >> synthetic_events
# echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="bash"' >> \
events/sched/sched_waking/trigger
# echo "hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(\
pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\
pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\
pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\
pid,pid,pid,pid,pid,pid,pid,pid,pid)" >> events/sched/sched_switch/trigger

[2]
BUG: unable to handle page fault for address: ffff91c900000000
PGD 61001067 P4D 61001067 PUD 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 2 PID: 322 Comm: bash Tainted: G W 6.1.0-rc8+ #229
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:strcmp+0xc/0x30
Code: 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee
c3 cc cc cc cc 0f 1f 00 31 c0 eb 08 48 83 c0 01 84 d2 74 13 <0f> b6 14
07 3a 14 06 74 ef 19 c0 83 c8 01 c3 cc cc cc cc 31 c3
RSP: 0018:ffff9b3b00f53c48 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffffffba958a68 RCX: 0000000000000000
RDX: 0000000000000010 RSI: ffff91c943d33a90 RDI: ffff91c900000000
RBP: ffff91c900000000 R08: 00000018d604b529 R09: 0000000000000000
R10: ffff91c9483eddb1 R11: ffff91ca483eddab R12: ffff91c946171580
R13: ffff91c9479f0538 R14: ffff91c9457c2848 R15: ffff91c9479f0538
FS: 00007f1d1cfbe740(0000) GS:ffff91c9bdc80000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff91c900000000 CR3: 0000000006316000 CR4: 00000000000006e0
Call Trace:
<TASK>
__find_event_file+0x55/0x90
action_create+0x76c/0x1060
event_hist_trigger_parse+0x146d/0x2060
? event_trigger_write+0x31/0xd0
trigger_process_regex+0xbb/0x110
event_trigger_write+0x6b/0xd0
vfs_write+0xc8/0x3e0
? alloc_fd+0xc0/0x160
? preempt_count_add+0x4d/0xa0
? preempt_count_add+0x70/0xa0
ksys_write+0x5f/0xe0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f1d1d0cf077
Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e
fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74
RSP: 002b:00007ffcebb0e568 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000143 RCX: 00007f1d1d0cf077
RDX: 0000000000000143 RSI: 00005639265aa7e0 RDI: 0000000000000001
RBP: 00005639265aa7e0 R08: 000000000000000a R09: 0000000000000142
R10: 000056392639c017 R11: 0000000000000246 R12: 0000000000000143
R13: 00007f1d1d1ae6a0 R14: 00007f1d1d1aa4a0 R15: 00007f1d1d1a98a0
</TASK>
Modules linked in:
CR2: ffff91c900000000
---[ end trace 0000000000000000 ]---
RIP: 0010:strcmp+0xc/0x30
Code: 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee
c3 cc cc cc cc 0f 1f 00 31 c0 eb 08 48 83 c0 01 84 d2 74 13 <0f> b6 14
07 3a 14 06 74 ef 19 c0 83 c8 01 c3 cc cc cc cc 31 c3
RSP: 0018:ffff9b3b00f53c48 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffffffba958a68 RCX: 0000000000000000
RDX: 0000000000000010 RSI: ffff91c943d33a90 RDI: ffff91c900000000
RBP: ffff91c900000000 R08: 00000018d604b529 R09: 0000000000000000
R10: ffff91c9483eddb1 R11: ffff91ca483eddab R12: ffff91c946171580
R13: ffff91c9479f0538 R14: ffff91c9457c2848 R15: ffff91c9479f0538
FS: 00007f1d1cfbe740(0000) GS:ffff91c9bdc80000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff91c900000000 CR3: 0000000006316000 CR4: 00000000000006e0

Link: https://lore.kernel.org/linux-trace-kernel/20221207035143.2278781-1-zhengyejian1@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <zanussi@kernel.org>
Cc: stable@vger.kernel.org
Fixes: d380dcde9a07 ("tracing: Fix now invalid var_ref_vals assumption in trace action")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 2cc6a528 06-Dec-2022 Zheng Yejian <zhengyejian1@huawei.com>

tracing/hist: Fix wrong return value in parse_action_params()

When number of synth fields is more than SYNTH_FIELDS_MAX,
parse_action_params() should return -EINVAL.

Link: https://lore.kernel.org/linux-trace-kernel/20221207034635.2253990-1-zhengyejian1@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <zanussi@kernel.org>
Cc: stable@vger.kernel.org
Fixes: c282a386a397 ("tracing: Add 'onmatch' hist trigger action support")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# ccf47f5c 19-Oct-2022 Masami Hiramatsu (Google) <mhiramat@kernel.org>

tracing: Add nohitcount option for suppressing display of raw hitcount

Add 'nohitcount' ('NOHC' for short) option for suppressing display of
the raw hitcount column in the histogram.
Note that you must specify at least one value except raw 'hitcount'
when you specify this nohitcount option.

# cd /sys/kernel/debug/tracing/
# echo hist:keys=pid:vals=runtime.percent,runtime.graph:sort=pid:NOHC > \
events/sched/sched_stat_runtime/trigger
# sleep 10
# cat events/sched/sched_stat_runtime/hist
# event histogram
#
# trigger info: hist:keys=pid:vals=runtime.percent,runtime.graph:sort=pid:size=2048:nohitcount [active]
#

{ pid: 8 } runtime (%): 3.02 runtime: #
{ pid: 14 } runtime (%): 2.25 runtime:
{ pid: 16 } runtime (%): 2.25 runtime:
{ pid: 26 } runtime (%): 0.17 runtime:
{ pid: 61 } runtime (%): 11.52 runtime: ####
{ pid: 67 } runtime (%): 1.56 runtime:
{ pid: 68 } runtime (%): 0.84 runtime:
{ pid: 76 } runtime (%): 0.92 runtime:
{ pid: 117 } runtime (%): 2.50 runtime: #
{ pid: 146 } runtime (%): 49.88 runtime: ####################
{ pid: 157 } runtime (%): 16.63 runtime: ######
{ pid: 158 } runtime (%): 8.38 runtime: ###

Link: https://lore.kernel.org/linux-trace-kernel/166610814787.56030.4980636083486339906.stgit@devnote2

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Tested-by: Tom Zanussi <zanussi@kernel.org>


# a2c54256 19-Oct-2022 Masami Hiramatsu (Google) <mhiramat@kernel.org>

tracing: Add .graph suffix option to histogram value

Add the .graph suffix which shows the bar graph of the histogram value.

For example, the below example shows that the bar graph
of the histogram of the runtime for each tasks.

------
# cd /sys/kernel/debug/tracing/
# echo hist:keys=pid:vals=runtime.graph:sort=pid > \
events/sched/sched_stat_runtime/trigger
# sleep 10
# cat events/sched/sched_stat_runtime/hist
# event histogram
#
# trigger info: hist:keys=pid:vals=hitcount,runtime.graph:sort=pid:size=2048 [active]
#

{ pid: 14 } hitcount: 2 runtime:
{ pid: 16 } hitcount: 8 runtime:
{ pid: 26 } hitcount: 1 runtime:
{ pid: 57 } hitcount: 3 runtime:
{ pid: 61 } hitcount: 20 runtime: ###
{ pid: 66 } hitcount: 2 runtime:
{ pid: 70 } hitcount: 3 runtime:
{ pid: 72 } hitcount: 2 runtime:
{ pid: 145 } hitcount: 14 runtime: ####################
{ pid: 152 } hitcount: 5 runtime: #######
{ pid: 153 } hitcount: 2 runtime: ####

Totals:
Hits: 62
Entries: 11
Dropped: 0
-------

Link: https://lore.kernel.org/linux-trace-kernel/166610813953.56030.10944148382315789485.stgit@devnote2

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Tested-by: Tom Zanussi <zanussi@kernel.org>


# abaa5258 19-Oct-2022 Masami Hiramatsu (Google) <mhiramat@kernel.org>

tracing: Add .percent suffix option to histogram values

Add .percent suffix option to show the histogram values in percentage.
This feature is useful when we need yo undersntand the overall trend
for the histograms of large values.
E.g. this shows the runtime percentage for each tasks.

------
# cd /sys/kernel/debug/tracing/
# echo hist:keys=pid:vals=hitcount,runtime.percent:sort=pid > \
events/sched/sched_stat_runtime/trigger
# sleep 10
# cat events/sched/sched_stat_runtime/hist
# event histogram
#
# trigger info: hist:keys=pid:vals=hitcount,runtime.percent:sort=pid:size=2048 [active]
#

{ pid: 8 } hitcount: 7 runtime (%): 4.14
{ pid: 14 } hitcount: 5 runtime (%): 3.69
{ pid: 16 } hitcount: 11 runtime (%): 3.41
{ pid: 61 } hitcount: 41 runtime (%): 19.75
{ pid: 65 } hitcount: 4 runtime (%): 1.48
{ pid: 70 } hitcount: 6 runtime (%): 3.60
{ pid: 72 } hitcount: 2 runtime (%): 1.10
{ pid: 144 } hitcount: 10 runtime (%): 32.01
{ pid: 151 } hitcount: 8 runtime (%): 22.66
{ pid: 152 } hitcount: 2 runtime (%): 8.10

Totals:
Hits: 96
Entries: 10
Dropped: 0
-----

Link: https://lore.kernel.org/linux-trace-kernel/166610813077.56030.4238090506973562347.stgit@devnote2

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Tested-by: Tom Zanussi <zanussi@kernel.org>


# 5f2e094e 19-Oct-2022 Tom Zanussi <zanussi@kernel.org>

tracing: Allow multiple hitcount values in histograms

The hitcount is treated specially in the histograms - since it's
always expected to be there regardless of whether the user specified
anything or not, it's always added as the first histogram value.

Currently the code doesn't allow it to be added more than once as a
value, which is inconsistent with all the other possible values. It
would seem to be a pointless thing to want to do, but other features
being added such as percent and graph modifiers don't work properly
with the current hitcount restrictions.

Fix this by allowing multiple hitcounts to be added.

Link: https://lore.kernel.org/linux-trace-kernel/166610812248.56030.16754785928712505251.stgit@devnote2

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Tested-by: Tom Zanussi <zanussi@kernel.org>


# ef38c79a 23-Nov-2022 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Fix race where histograms can be called before the event

commit 94eedf3dded5 ("tracing: Fix race where eprobes can be called before
the event") fixed an issue where if an event is soft disabled, and the
trigger is being added, there's a small window where the event sees that
there's a trigger but does not see that it requires reading the event yet,
and then calls the trigger with the record == NULL.

This could be solved with adding memory barriers in the hot path, or to
make sure that all the triggers requiring a record check for NULL. The
latter was chosen.

Commit 94eedf3dded5 set the eprobe trigger handle to check for NULL, but
the same needs to be done with histograms.

Link: https://lore.kernel.org/linux-trace-kernel/20221118211809.701d40c0f8a757b0df3c025a@kernel.org/
Link: https://lore.kernel.org/linux-trace-kernel/20221123164323.03450c3a@gandalf.local.home

Cc: Tom Zanussi <zanussi@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events")
Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 0a068f4a 20-Oct-2022 Colin Ian King <colin.i.king@gmail.com>

tracing/hist: add in missing * in comment blocks

There are a couple of missing * in comment blocks. Fix these.
Cleans up two clang warnings:

kernel/trace/trace_events_hist.c:986: warning: bad line:
kernel/trace/trace_events_hist.c:3229: warning: bad line:

Link: https://lkml.kernel.org/r/20221020133019.1547587-1-colin.i.king@gmail.com

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 86087383 06-Sep-2022 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing/hist: Call hist functions directly via a switch statement

Due to retpolines, indirect calls are much more expensive than direct
calls. The histograms have a select set of functions it uses for the
histograms, instead of using function pointers to call them, create a
hist_fn_call() function that uses a switch statement to call the histogram
functions directly. This gives a 13% speedup to the histogram logic.

Using the histogram benchmark:

Before:

# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#

{ delta: 129 } hitcount: 2213
{ delta: 130 } hitcount: 285965
{ delta: 131 } hitcount: 1146545
{ delta: 132 } hitcount: 5185432
{ delta: 133 } hitcount: 19896215
{ delta: 134 } hitcount: 53118616
{ delta: 135 } hitcount: 83816709
{ delta: 136 } hitcount: 68329562
{ delta: 137 } hitcount: 41859349
{ delta: 138 } hitcount: 46257797
{ delta: 139 } hitcount: 54400831
{ delta: 140 } hitcount: 72875007
{ delta: 141 } hitcount: 76193272
{ delta: 142 } hitcount: 49504263
{ delta: 143 } hitcount: 38821072
{ delta: 144 } hitcount: 47702679
{ delta: 145 } hitcount: 41357297
{ delta: 146 } hitcount: 22058238
{ delta: 147 } hitcount: 9720002
{ delta: 148 } hitcount: 3193542
{ delta: 149 } hitcount: 927030
{ delta: 150 } hitcount: 850772
{ delta: 151 } hitcount: 1477380
{ delta: 152 } hitcount: 2687977
{ delta: 153 } hitcount: 2865985
{ delta: 154 } hitcount: 1977492
{ delta: 155 } hitcount: 2475607
{ delta: 156 } hitcount: 3403612

After:

# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#

{ delta: 113 } hitcount: 272
{ delta: 114 } hitcount: 840
{ delta: 118 } hitcount: 344
{ delta: 119 } hitcount: 25428
{ delta: 120 } hitcount: 350590
{ delta: 121 } hitcount: 1892484
{ delta: 122 } hitcount: 6205004
{ delta: 123 } hitcount: 11583521
{ delta: 124 } hitcount: 37590979
{ delta: 125 } hitcount: 108308504
{ delta: 126 } hitcount: 131672461
{ delta: 127 } hitcount: 88700598
{ delta: 128 } hitcount: 65939870
{ delta: 129 } hitcount: 45055004
{ delta: 130 } hitcount: 33174464
{ delta: 131 } hitcount: 31813493
{ delta: 132 } hitcount: 29011676
{ delta: 133 } hitcount: 22798782
{ delta: 134 } hitcount: 22072486
{ delta: 135 } hitcount: 17034113
{ delta: 136 } hitcount: 8982490
{ delta: 137 } hitcount: 2865908
{ delta: 138 } hitcount: 980382
{ delta: 139 } hitcount: 1651944
{ delta: 140 } hitcount: 4112073
{ delta: 141 } hitcount: 3963269
{ delta: 142 } hitcount: 1712508
{ delta: 143 } hitcount: 575941
{ delta: 144 } hitcount: 351427
{ delta: 145 } hitcount: 218077
{ delta: 146 } hitcount: 167297
{ delta: 147 } hitcount: 146198
{ delta: 148 } hitcount: 116122
{ delta: 149 } hitcount: 58993
{ delta: 150 } hitcount: 40228

The delta above is in nanoseconds. It brings the fastest time down from
129ns to 113ns, and the peak from 141ns to 126ns.

Link: https://lkml.kernel.org/r/20220906225529.411545333@goodmis.org

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# fb991f19 29-Jun-2022 Zheng Yejian <zhengyejian1@huawei.com>

tracing/histograms: Simplify create_hist_fields()

When I look into implements of create_hist_fields(), I think there can be
following two simplifications:
1. If something wrong happened in parse_var_defs(), free_var_defs() would
have been called in it, so no need goto free again after calling it;
2. After calling create_key_fields(), regardless of the value of 'ret', it
then always runs into 'out: ', so the judge of 'ret' is redundant.

Link: https://lkml.kernel.org/r/20220630013152.164871-1-zhengyejian1@huawei.com

Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Tom Rix <trix@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 7edc3945 10-Jul-2022 Zheng Yejian <zhengyejian1@huawei.com>

tracing/histograms: Fix memory leak problem

This reverts commit 46bbe5c671e06f070428b9be142cc4ee5cedebac.

As commit 46bbe5c671e0 ("tracing: fix double free") said, the
"double free" problem reported by clang static analyzer is:
> In parse_var_defs() if there is a problem allocating
> var_defs.expr, the earlier var_defs.name is freed.
> This free is duplicated by free_var_defs() which frees
> the rest of the list.

However, if there is a problem allocating N-th var_defs.expr:
+ in parse_var_defs(), the freed 'earlier var_defs.name' is
actually the N-th var_defs.name;
+ then in free_var_defs(), the names from 0th to (N-1)-th are freed;

IF ALLOCATING PROBLEM HAPPENED HERE!!! -+
\
|
0th 1th (N-1)-th N-th V
+-------------+-------------+-----+-------------+-----------
var_defs: | name | expr | name | expr | ... | name | expr | name | ///
+-------------+-------------+-----+-------------+-----------

These two frees don't act on same name, so there was no "double free"
problem before. Conversely, after that commit, we get a "memory leak"
problem because the above "N-th var_defs.name" is not freed.

If enable CONFIG_DEBUG_KMEMLEAK and inject a fault at where the N-th
var_defs.expr allocated, then execute on shell like:
$ echo 'hist:key=call_site:val=$v1,$v2:v1=bytes_req,v2=bytes_alloc' > \
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

Then kmemleak reports:
unreferenced object 0xffff8fb100ef3518 (size 8):
comm "bash", pid 196, jiffies 4295681690 (age 28.538s)
hex dump (first 8 bytes):
76 31 00 00 b1 8f ff ff v1......
backtrace:
[<0000000038fe4895>] kstrdup+0x2d/0x60
[<00000000c99c049a>] event_hist_trigger_parse+0x206f/0x20e0
[<00000000ae70d2cc>] trigger_process_regex+0xc0/0x110
[<0000000066737a4c>] event_trigger_write+0x75/0xd0
[<000000007341e40c>] vfs_write+0xbb/0x2a0
[<0000000087fde4c2>] ksys_write+0x59/0xd0
[<00000000581e9cdf>] do_syscall_64+0x3a/0x80
[<00000000cf3b065c>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

Link: https://lkml.kernel.org/r/20220711014731.69520-1-zhengyejian1@huawei.com

Cc: stable@vger.kernel.org
Fixes: 46bbe5c671e0 ("tracing: fix double free")
Reported-by: Hulk Robot <hulkci@huawei.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 2d601b98 12-May-2022 liqiong <liqiong@nfschina.com>

tracing: Change "char *" string form to "char []"

The "char []" string form declares a single variable. It is better
than "char *" which creates two variables in the final assembly.

Link: https://lkml.kernel.org/r/20220512143230.28796-1-liqiong@nfschina.com

Signed-off-by: liqiong <liqiong@nfschina.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 99696a25 25-Apr-2022 Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp>

tracing: Fix potential double free in create_var_ref()

In create_var_ref(), init_var_ref() is called to initialize the fields
of variable ref_field, which is allocated in the previous function call
to create_hist_field(). Function init_var_ref() allocates the
corresponding fields such as ref_field->system, but frees these fields
when the function encounters an error. The caller later calls
destroy_hist_field() to conduct error handling, which frees the fields
and the variable itself. This results in double free of the fields which
are already freed in the previous function.

Fix this by storing NULL to the corresponding fields when they are freed
in init_var_ref().

Link: https://lkml.kernel.org/r/20220425063739.3859998-1-keitasuzuki.park@sslab.ics.keio.ac.jp

Fixes: 067fe038e70f ("tracing: Add variable reference handling to hist triggers")
CC: stable@vger.kernel.org
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 45e333ce 27-Apr-2022 Jakob Koschel <jakobkoschel@gmail.com>

tracing: Replace usage of found with dedicated list iterator variable

To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean [1].

This removes the need to use a found variable and simply checking if
the variable was set, can determine if the break/goto was hit.

Link: https://lkml.kernel.org/r/20220427170734.819891-4-jakobkoschel@gmail.com

Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1]
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 69686fcb 17-Apr-2022 Ammar Faizi <ammarfaizi2@gnuweeb.org>

tracing: Change `if (strlen(glob))` to `if (glob[0])`

No need to traverse to the end of string. If the first byte is not a NUL
char, it's guaranteed `if (strlen(glob))` is true.

Link: https://lkml.kernel.org/r/20220417185630.199062-3-ammarfaizi2@gnuweeb.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: GNU/Weeb Mailing List <gwml@vger.gnuweeb.org>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 97a5d2e5 17-Apr-2022 Ammar Faizi <ammarfaizi2@gnuweeb.org>

tracing: Return -EINVAL if WARN_ON(!glob) triggered in event_hist_trigger_parse()

If `WARN_ON(!glob)` is ever triggered, we will still continue executing
the next lines. This will trigger the more serious problem, a NULL
pointer dereference bug.

Just return -EINVAL if @glob is NULL.

Link: https://lkml.kernel.org/r/20220417185630.199062-2-ammarfaizi2@gnuweeb.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: GNU/Weeb Mailing List <gwml@vger.gnuweeb.org>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# a7e6b7dc 04-Feb-2022 Tom Zanussi <zanussi@kernel.org>

tracing: Separate hist state updates from hist registration

hist_register_trigger() handles both new hist registration as well as
existing hist registration through event_command.reg().

Adding a new function, existing_hist_update_only(), that checks and
updates existing histograms and exits after doing so allows the
confusing logic in event_hist_trigger_parse() to be simplified.

Link: https://lkml.kernel.org/r/211b2cd3e3d7e00f4f8ad45ef8b33063da6a7e05.1644010576.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# e1f187d0 04-Feb-2022 Tom Zanussi <zanussi@kernel.org>

tracing: Have existing event_command.parse() implementations use helpers

Simplify the existing event_command.parse() implementations by having
them make use of the helper functions previously introduced.

Link: https://lkml.kernel.org/r/b353e3427a81f9d3adafd98fd7d73e78a8209f43.1644010576.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 47670541 04-Feb-2022 Tom Zanussi <zanussi@kernel.org>

tracing: Remove redundant trigger_ops params

Since event_trigger_data contains the .ops trigger_ops field, there's
no reason to pass the trigger_ops separately. Remove it as a param
from functions whenever event_trigger_data is passed.

Link: https://lkml.kernel.org/r/9856c9bc81bde57077f5b8d6f8faa47156c6354a.1644010575.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# b8cc44a4 04-Feb-2022 Tom Zanussi <zanussi@kernel.org>

tracing: Remove logic for registering multiple event triggers at a time

Code for registering triggers assumes it's possible to register more
than one trigger at a time. In fact, it's unimplemented and there
doesn't seem to be a reason to do that.

Remove the n_registered param from event_trigger_register() and fix up
callers.

Doing so simplifies the logic in event_trigger_register to the point
that it just becomes a wrapper calling event_command.reg().

It also removes the problematic call to event_command.unreg() in case
of failure. A new function, event_trigger_unregister() is also added
for callers to call themselves.

The changes to trace_events_hist.c simply allow compilation; a
separate patch follows which updates the hist triggers to work
correctly with the new changes.

Link: https://lkml.kernel.org/r/6149fec7a139d93e84fa4535672fb5bef88006b0.1644010575.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 5677a3d7 08-Mar-2022 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Fix last_cmd_set() string management in histogram code

Using strnlen(dest, str, n) is confusing, as the size of dest must be
strlen(dest) + n + 1. Even more confusing, using sizeof(string constant)
gives you strlen(string constant) + 1 and not just strlen().

These two together made using strncat() with a constant string a bit off
in the calculations as we have:

len = sizeof(HIST_PREFIX) + strlen(str) + 1;
kfree(last_cmd);
last_cmd = kzalloc(len, GFP_KERNEL);
strcpy(last_cmd, HIST_PREFIX);
len -= sizeof(HIST_PREFIX) + 1;
strncat(last_cmd, str, len);

The above works if we s/sizeof/strlen/ with HIST_PREFIX (which is defined
as "hist:", but because sizeof(HIST_PREFIX) is equal to
strlen(HIST_PREFIX) + 1, we can drop the +1 in the code. But at least
comment that we are doing so.

Link: https://lore.kernel.org/all/202203082112.Iu7tvFl4-lkp@intel.com/

Fixes: 9f8e5aee93ed2 ("tracing: Fix allocation of last_cmd in last_cmd_set()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 9f8e5aee 13-Feb-2022 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Fix allocation of last_cmd in last_cmd_set()

The strncat() used in last_cmd_set() includes the nul byte of length of
the string being copied in, when it should only hold the size of the
string being copied (not the nul byte). Change it to subtract the length
of the allocated space and the nul byte to pass that into the strncat().

Also, assign "len" instead of initializing it to zero and its first update
is to do a "+=".

Link: https://lore.kernel.org/all/202202140628.fj6e4w4v-lkp@intel.com/

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# edfeed31 28-Jan-2022 Tom Zanussi <zanussi@kernel.org>

tracing: Remove size restriction on hist trigger cmd error logging

Currently, hist trigger command error strings are restricted to a
length of MAX_FILTER_STR_VAL (256), which is too short for some
commands already seen in the wild (with cmd strings longer than that
showing up truncated in err_log).

Remove the restriction so that no hist trigger command error string is
ever truncated.

Link: https://lkml.kernel.org/r/0f9d46407222eaf6632cd3b417bc50a11f401b71.1643399022.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 1d1898f6 01-Mar-2022 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing/histogram: Fix sorting on old "cpu" value

When trying to add a histogram against an event with the "cpu" field, it
was impossible due to "cpu" being a keyword to key off of the running CPU.
So to fix this, it was changed to "common_cpu" to match the other generic
fields (like "common_pid"). But since some scripts used "cpu" for keying
off of the CPU (for events that did not have "cpu" as a field, which is
most of them), a backward compatibility trick was added such that if "cpu"
was used as a key, and the event did not have "cpu" as a field name, then
it would fallback and switch over to "common_cpu".

This fix has a couple of subtle bugs. One was that when switching over to
"common_cpu", it did not change the field name, it just set a flag. But
the code still found a "cpu" field. The "cpu" field is used for filtering
and is returned when the event does not have a "cpu" field.

This was found by:

# cd /sys/kernel/tracing
# echo hist:key=cpu,pid:sort=cpu > events/sched/sched_wakeup/trigger
# cat events/sched/sched_wakeup/hist

Which showed the histogram unsorted:

{ cpu: 19, pid: 1175 } hitcount: 1
{ cpu: 6, pid: 239 } hitcount: 2
{ cpu: 23, pid: 1186 } hitcount: 14
{ cpu: 12, pid: 249 } hitcount: 2
{ cpu: 3, pid: 994 } hitcount: 5

Instead of hard coding the "cpu" checks, take advantage of the fact that
trace_event_field_field() returns a special field for "cpu" and "CPU" if
the event does not have "cpu" as a field. This special field has the
"filter_type" of "FILTER_CPU". Check that to test if the returned field is
of the CPU type instead of doing the string compare.

Also, fix the sorting bug by testing for the hist_field flag of
HIST_FIELD_FL_CPU when setting up the sort routine. Otherwise it will use
the special CPU field to know what compare routine to use, and since that
special field does not have a size, it returns tracing_map_cmp_none.

Cc: stable@vger.kernel.org
Fixes: 1e3bac71c505 ("tracing/histogram: Rename "cpu" to "common_cpu"")
Reported-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 097f1eef 27-Jan-2022 Tom Zanussi <zanussi@kernel.org>

tracing: Propagate is_signed to expression

During expression parsing, a new expression field is created which
should inherit the properties of the operands, such as size and
is_signed.

is_signed propagation was missing, causing spurious errors with signed
operands. Add it in parse_expr() and parse_unary() to fix the problem.

Link: https://lkml.kernel.org/r/f4dac08742fd7a0920bf80a73c6c44042f5eaa40.1643319703.git.zanussi@kernel.org

Cc: stable@vger.kernel.org
Fixes: 100719dcef447 ("tracing: Add simple expression support to hist triggers")
Reported-by: Yordan Karadzhov <ykaradzhov@vmware.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215513
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# b59f2f2b 27-Jan-2022 Tom Zanussi <zanussi@kernel.org>

tracing: Fix smatch warning for do while check in event_hist_trigger_parse()

The patch ec5ce0987541: "tracing: Allow whitespace to surround hist
trigger filter" from Jan 15, 2018, leads to the following Smatch
static checker warning:

kernel/trace/trace_events_hist.c:6199 event_hist_trigger_parse()
warn: 'p' can't be NULL.

Since p is always checked for a NULL value at the top of loop and
nothing in the rest of the loop will set it to NULL, the warning
is correct and might as well be 1 to silence the warning.

Link: https://lkml.kernel.org/r/a1d4c79766c0cf61e20438dc35244d216633fef6.1643319703.git.zanussi@kernel.org

Fixes: ec5ce09875410 ("tracing: Allow whitespace to surround hist trigger filter")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 798a5b6c 27-Jan-2022 Tom Zanussi <zanussi@kernel.org>

tracing: Fix smatch warning for null glob in event_hist_trigger_parse()

The recent rename of event_hist_trigger_parse() caused smatch
re-evaluation of trace_events_hist.c and as a result an old warning
was found:

kernel/trace/trace_events_hist.c:6174 event_hist_trigger_parse()
error: we previously assumed 'glob' could be null (see line 6166)

glob should never be null (and apparently smatch can also figure that
out and skip the warning when using the cross-function DB (but which
can't be used with a 0day build as it takes too much time to
generate)).

Nonetheless for clarity, remove the test but add a WARN_ON() in case
the code ever changes.

Link: https://lkml.kernel.org/r/96925e5c1f116654ada7ea0613d930b1266b5e1c.1643319703.git.zanussi@kernel.org

Fixes: f404da6e1d46c ("tracing: Add 'last error' error facility for hist triggers")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# e629e7b5 24-Jan-2022 Xiaoke Wang <xkernel.wang@foxmail.com>

tracing/histogram: Fix a potential memory leak for kstrdup()

kfree() is missing on an error path to free the memory allocated by
kstrdup():

p = param = kstrdup(data->params[i], GFP_KERNEL);

So it is better to free it via kfree(p).

Link: https://lkml.kernel.org/r/tencent_C52895FD37802832A3E5B272D05008866F0A@qq.com

Cc: stable@vger.kernel.org
Fixes: d380dcde9a07c ("tracing: Fix now invalid var_ref_vals assumption in trace action")
Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 2378a2d6 10-Jan-2022 Tom Zanussi <zanussi@kernel.org>

tracing: Remove ops param from event_command reg()/unreg() callbacks

The event_trigger_ops for an event_command are already accessible via
event_trigger_data.ops so remove the redundant ops from the callback.

Link: https://lkml.kernel.org/r/4c6f2a41820452f9cacddc7634ad442928aa2aa6.1641823001.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# fb339e53 10-Jan-2022 Tom Zanussi <zanussi@kernel.org>

tracing: Change event_trigger_ops func() to trigger()

The name of the func() callback on event_trigger_ops is too generic
and is easily confused with other callbacks with that name, so change
it to something that reflects its actual purpose.

In this case, the main purpose of the callback is to implement an
event trigger, so call it trigger() instead.

Also add some more documentation to event_trigger_ops describing the
callbacks a bit better.

Link: https://lkml.kernel.org/r/36ab812e3ee74ee03ae0043fda41a858ee728c00.1641823001.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 9ec5a7d1 10-Jan-2022 Tom Zanussi <zanussi@kernel.org>

tracing: Change event_command func() to parse()

The name of the func() callback on event_command is too generic and is
easily confused with other callbacks with that name, so change it to
something that reflects its actual purpose.

In this case, the main purpose of the callback is to parse an event
command, so call it parse() instead.

Link: https://lkml.kernel.org/r/7784e321840752ed88aac0b349c0c685fc9247b1.1641823001.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 05770dd0 22-Nov-2021 Masami Hiramatsu <mhiramat@kernel.org>

tracing: Support __rel_loc relative dynamic data location attribute

Add '__rel_loc' new dynamic data location attribute which encodes
the data location from the next to the field itself.

The '__data_loc' is used for encoding the dynamic data location on
the trace event record. But '__data_loc' is not useful if the writer
doesn't know the event header (e.g. user event), because it records
the dynamic data offset from the entry of the record, not the field
itself.

This new '__rel_loc' attribute encodes the data location relatively
from the next of the field. For example, when there is a record like
below (the number in the parentheses is the size of fields)

|header(N)|common(M)|fields(K)|__data_loc(4)|fields(L)|data(G)|

In this case, '__data_loc' field will be

__data_loc = (G << 16) | (N+M+K+4+L)

If '__rel_loc' is used, this will be

|header(N)|common(M)|fields(K)|__rel_loc(4)|fields(L)|data(G)|

where

__rel_loc = (G << 16) | (L)

This case shows L bytes after the '__rel_loc' attribute field,
if there is no fields after the __rel_loc field, L must be 0.

This is relatively easy (and no need to consider the kernel header
change) when the event data fields are composed by user who doesn't
know header and common fields.

Link: https://lkml.kernel.org/r/163757341258.510314.4214431827833229956.stgit@devnote2

Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 450fec13 29-Nov-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing/histograms: String compares should not care about signed values

When comparing two strings for the "onmatch" histogram trigger, fields
that are strings use string comparisons, which do not care about being
signed or not.

Do not fail to match two string fields if one is unsigned char array and
the other is a signed char array.

Link: https://lore.kernel.org/all/20211129123043.5cfd687a@gandalf.local.home/

Cc: stable@vgerk.kernel.org
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Yafang Shao <laoar.shao@gmail.com>
Fixes: b05e89ae7cf3b ("tracing: Accept different type for synthetic event fields")
Reviewed-by: Masami Hiramatsu <mhiramatsu@kernel.org>
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# f86b0aaa 17-Nov-2021 Kalesh Singh <kaleshsingh@google.com>

tracing/histogram: Fix UAF in destroy_hist_field()

Calling destroy_hist_field() on an expression will recursively free
any operands associated with the expression. If during expression
parsing the operands of the expression are already set when an error
is encountered, there is no need to explicity free the operands. Doing
so will result in destroy_hist_field() being called twice for the
operands and lead to a use-after-free (UAF) error.

If the operands are associated with the expression, only call
destroy_hist_field() on the expression since the operands will be
recursively freed.

Link: https://lore.kernel.org/all/CAHk-=wgcrEbFgkw9720H3tW-AhHOoEKhYwZinYJw4FpzSaJ6_Q@mail.gmail.com/
Link: https://lkml.kernel.org/r/20211118011542.1420131-1-kaleshsingh@google.com

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Fixes: 8b5d46fd7a38 ("tracing/histogram: Optimize division by constants")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 938aa33f 14-Nov-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Add length protection to histogram string copies

The string copies to the histogram storage has a max size of 256 bytes
(defined by MAX_FILTER_STR_VAL). Only the string size of the event field
needs to be copied to the event storage, but no more than what is in the
event storage. Although nothing should be bigger than 256 bytes, there's
no protection against overwriting of the storage if one day there is.

Copy no more than the destination size, and enforce it.

Also had to turn MAX_FILTER_STR_VAL into an unsigned int, to keep the
min() comparison of the string sizes of comparable types.

Link: https://lore.kernel.org/all/CAHk-=wjREUihCGrtRBwfX47y_KrLCGjiq3t6QtoNJpmVrAEb1w@mail.gmail.com/
Link: https://lkml.kernel.org/r/20211114132834.183429a4@rorschach.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 63f84ae6b82b ("tracing/histogram: Do not copy the fixed-size char array field over the field size")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 1cab6bce 12-Nov-2021 Kalesh Singh <kaleshsingh@google.com>

tracing/histogram: Fix check for missing operands in an expression

If a binary operation is detected while parsing an expression string,
the operand strings are deduced by splitting the experssion string at
the position of the detected binary operator. Both operand strings are
sub-strings (can be empty string) of the expression string but will
never be NULL.

Currently a NULL check is used for missing operands, fix this by
checking for empty strings instead.

Link: https://lkml.kernel.org/r/20211112191324.1302505-1-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Fixes: 9710b2f341a0 ("tracing: Fix operator precedence for hist triggers expression")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 63f84ae6 12-Nov-2021 Masami Hiramatsu <mhiramat@kernel.org>

tracing/histogram: Do not copy the fixed-size char array field over the field size

Do not copy the fixed-size char array field of the events over
the field size. The histogram treats char array as a string and
there are 2 types of char array in the event, fixed-size and
dynamic string. The dynamic string (__data_loc) field must be
null terminated, but the fixed-size char array field may not
be null terminated (not a string, but just a data).
In that case, histogram can copy the data after the field.
This uses the original field size for fixed-size char array
field to restrict the histogram not to access over the original
field size.

Link: https://lkml.kernel.org/r/163673292822.195747.3696966210526410250.stgit@devnote2

Fixes: 02205a6752f2 (tracing: Add support for 'field variables')
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 8b5d46fd 29-Oct-2021 Kalesh Singh <kaleshsingh@google.com>

tracing/histogram: Optimize division by constants

If the divisor is a constant use specific division functions to
avoid extra branches when the trigger is hit.

If the divisor constant but not a power of 2, the division can be
replaced with a multiplication and shift in the following case:

Let X = dividend and Y = divisor.

Choose Z = some power of 2. If Y <= Z, then:
X / Y = (X * (Z / Y)) / Z

(Z / Y) is a constant (mult) which is calculated at parse time, so:
X / Y = (X * mult) / Z

The division by Z can be replaced by a shift since Z is a power of 2:
X / Y = (X * mult) >> shift

As long, as X < Z the results will not be off by more than 1.

Link: https://lkml.kernel.org/r/20211029232410.3494196-1-kaleshsingh@google.com

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# feea69ec 29-Oct-2021 kernel test robot <lkp@intel.com>

tracing/histogram: Fix semicolon.cocci warnings

kernel/trace/trace_events_hist.c:6039:2-3: Unneeded semicolon

Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Link: https://lkml.kernel.org/r/20211030005615.GA41257@3074f0d39c61

Fixes: c5eac6ee8bc5 ("tracing/histogram: Simplify handling of .sym-offset in expressions")
CC: Kalesh Singh <kaleshsingh@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 722eddaa 25-Oct-2021 Kalesh Singh <kaleshsingh@google.com>

tracing/histogram: Optimize division by a power of 2

The division is a slow operation. If the divisor is a power of 2, use a
shift instead.

Results were obtained using Android's version of perf (simpleperf[1]) as
described below:

1. hist_field_div() is modified to call 2 test functions:
test_hist_field_div_[not]_optimized(); passing them the
same args. Use noinline and volatile to ensure these are
not optimized out by the compiler.
2. Create a hist event trigger that uses division:
events/kmem/rss_stat$ echo 'hist:keys=common_pid:x=size/<divisor>'
>> trigger
events/kmem/rss_stat$ echo 'hist:keys=common_pid:vals=$x'
>> trigger
3. Run Android's lmkd_test[2] to generate rss_stat events, and
record CPU samples with Android's simpleperf:
simpleperf record -a --exclude-perf --post-unwind=yes -m 16384 -g
-f 2000 -o perf.data

== Results ==

Divisor is a power of 2 (divisor == 32):

test_hist_field_div_not_optimized | 8,717,091 cpu-cycles
test_hist_field_div_optimized | 1,643,137 cpu-cycles

If the divisor is a power of 2, the optimized version is ~5.3x faster.

Divisor is not a power of 2 (divisor == 33):

test_hist_field_div_not_optimized | 4,444,324 cpu-cycles
test_hist_field_div_optimized | 5,497,958 cpu-cycles

If the divisor is not a power of 2, as expected, the optimized version is
slightly slower (~24% slower).

[1] https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/README.md
[2] https://cs.android.com/android/platform/superproject/+/master:system/memory/lmkd/tests/lmkd_test.cpp

Link: https://lkml.kernel.org/r/20211025200852.3002369-7-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# f47716b7 25-Oct-2021 Kalesh Singh <kaleshsingh@google.com>

tracing/histogram: Covert expr to const if both operands are constants

If both operands of a hist trigger expression are constants, convert the
expression to a constant. This optimization avoids having to perform the
same calculation multiple times and also saves on memory since the
merged constants are represented by a single struct hist_field instead
or multiple.

Link: https://lkml.kernel.org/r/20211025200852.3002369-6-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# c5eac6ee 25-Oct-2021 Kalesh Singh <kaleshsingh@google.com>

tracing/histogram: Simplify handling of .sym-offset in expressions

The '-' in .sym-offset can confuse the hist trigger arithmetic
expression parsing. Simplify the handling of this by replacing the
'sym-offset' with 'symXoffset'. This allows us to correctly evaluate
expressions where the user may have inadvertently added a .sym-offset
modifier to one of the operands in an expression, instead of bailing
out. In this case the .sym-offset has no effect on the evaluation of the
expression. The only valid use of the .sym-offset is as a hist key
modifier.

Link: https://lkml.kernel.org/r/20211025200852.3002369-5-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 9710b2f3 25-Oct-2021 Kalesh Singh <kaleshsingh@google.com>

tracing: Fix operator precedence for hist triggers expression

The current histogram expression evaluation logic evaluates the
expression from right to left. This can lead to incorrect results
if the operations are not associative (as is the case for subtraction
and, the now added, division operators).
e.g. 16-8-4-2 should be 2 not 10 --> 16-8-4-2 = ((16-8)-4)-2
64/8/4/2 should be 1 not 16 --> 64/8/4/2 = ((64/8)/4)/2

Division and multiplication are currently limited to single operation
expression due to operator precedence support not yet implemented.

Rework the expression parsing to support the correct evaluation of
expressions containing operators of different precedences; and fix
the associativity error by evaluating expressions with operators of
the same precedence from left to right.

Examples:
(1) echo 'hist:keys=common_pid:a=8,b=4,c=2,d=1,w=$a-$b-$c-$d' \
>> event/trigger
(2) echo 'hist:keys=common_pid:x=$a/$b/3/2' >> event/trigger
(3) echo 'hist:keys=common_pid:y=$a+10/$c*1024' >> event/trigger
(4) echo 'hist:keys=common_pid:z=$a/$b+$c*$d' >> event/trigger

Link: https://lkml.kernel.org/r/20211025200852.3002369-4-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# bcef0441 25-Oct-2021 Kalesh Singh <kaleshsingh@google.com>

tracing: Add division and multiplication support for hist triggers

Adds basic support for division and multiplication operations for
hist trigger variable expressions.

For simplicity this patch only supports, division and multiplication
for a single operation expression (e.g. x=$a/$b), as currently
expressions are always evaluated right to left. This can lead to some
incorrect results:

e.g. echo 'hist:keys=common_pid:x=8-4-2' >> event/trigger

8-4-2 should evaluate to 2 i.e. (8-4)-2
but currently x evaluate to 6 i.e. 8-(4-2)

Multiplication and division in sub-expressions will work correctly, once
correct operator precedence support is added (See next patch in this
series).

For the undefined case of division by 0, the histogram expression
evaluates to (u64)(-1). Since this cannot be detected when the
expression is created, it is the responsibility of the user to be
aware and account for this possibility.

Examples:
echo 'hist:keys=common_pid:a=8,b=4,x=$a/$b' \
>> event/trigger

echo 'hist:keys=common_pid:y=5*$b' \
>> event/trigger

Link: https://lkml.kernel.org/r/20211025200852.3002369-3-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 52cfb373 25-Oct-2021 Kalesh Singh <kaleshsingh@google.com>

tracing: Add support for creating hist trigger variables from literal

Currently hist trigger expressions don't support the use of numeric
literals:
e.g. echo 'hist:keys=common_pid:x=$y-1234'
--> is not valid expression syntax

Having the ability to use numeric constants in hist triggers supports
a wider range of expressions for creating variables.

Add support for creating trace event histogram variables from numeric
literals.

e.g. echo 'hist:keys=common_pid:x=1234,y=size-1024' >> event/trigger

A negative numeric constant is created, using unary minus operator
(parentheses are required).

e.g. echo 'hist:keys=common_pid:z=-(2)' >> event/trigger

Constants can be used with division/multiplication (added in the
next patch in this series) to implement granularity filters for frequent
trace events. For instance we can limit emitting the rss_stat
trace event to when there is a 512KB cross over in the rss size:

# Create a synthetic event to monitor instead of the high frequency
# rss_stat event
echo 'rss_stat_throttled unsigned int mm_id; unsigned int curr;
int member; long size' >> tracing/synthetic_events

# Create a hist trigger that emits the synthetic rss_stat_throttled
# event only when the rss size crosses a 512KB boundary.
echo 'hist:keys=keys=mm_id,member:bucket=size/0x80000:onchange($bucket)
.rss_stat_throttled(mm_id,curr,member,size)'
>> events/kmem/rss_stat/trigger

A use case for using constants with addition/subtraction is not yet
known, but for completeness the use of constants are supported for all
operators.

Link: https://lkml.kernel.org/r/20211025200852.3002369-2-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 8720aeec 19-Oct-2021 Arnd Bergmann <arnd@arndb.de>

tracing: use %ps format string to print symbols

clang started warning about excessive stack usage in
hist_trigger_print_key()

kernel/trace/trace_events_hist.c:4723:13: error: stack frame size (1336) exceeds limit (1024) in function 'hist_trigger_print_key' [-Werror,-Wframe-larger-than]

The problem is that there are two 512-byte arrays on the stack if
hist_trigger_stacktrace_print() gets inlined. I don't think this has
changed in the past five years, but something probably changed the
inlining decisions made by the compiler, so the problem is now made
more obvious.

Rather than printing the symbol names into separate buffers, it
seems we can simply use the special %ps format string modifier
to print the pointers symbolically and get rid of both buffers.

Marking hist_trigger_stacktrace_print() would be a simpler
way of avoiding the warning, but that would not address the
excessive stack usage.

Link: https://lkml.kernel.org/r/20211019153337.294790-1-arnd@kernel.org

Fixes: 69a0200c2e25 ("tracing: Add hist trigger support for stacktraces as keys")
Link: https://lore.kernel.org/all/20211015095704.49a99859@gandalf.local.home/
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Tested-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b26503b1 06-Oct-2021 Colin Ian King <colin.king@canonical.com>

tracing: Fix missing * in comment block

There is a missing * in a comment block, add it in.

Link: https://lkml.kernel.org/r/20211006172830.1025336-1-colin.king@canonical.com

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# c910db94 02-Sep-2021 Tom Zanussi <zanussi@kernel.org>

tracing: Dynamically allocate the per-elt hist_elt_data array

Setting the hist_elt_data.field_var_str[] array unconditionally to a
size of SYNTH_FIELD_MAX elements wastes space unnecessarily. The
actual number of elements needed can be calculated at run-time
instead.

In most cases, this will save a lot of space since it's a per-elt
array which isn't normally close to being full. It also allows us to
increase SYNTH_FIELD_MAX without worrying about even more wastage when
we do that.

Link: https://lkml.kernel.org/r/d52ae0ad5e1b59af7c4f54faf3fc098461fd82b3.camel@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# ed2cf907 22-Jul-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Allow execnames to be passed as args for synthetic events

Allow common_pid.execname to be saved in a variable in one histogram to be
passed to another histogram that can pass it as a parameter to a synthetic
event.

># echo 'hist:keys=pid:__arg__1=common_timestamp.usecs:arg2=common_pid.execname' \
> events/sched/sched_waking/trigger
># echo 'wakeup_lat s32 pid; u64 delta; char wake_comm[]' > synthetic_events
># echo 'hist:keys=next_pid:pid=next_pid,delta=common_timestamp.usecs-$__arg__1,exec=$arg2'\
':onmatch(sched.sched_waking).trace(wakeup_lat,$pid,$delta,$exec)' \
> events/sched/sched_switch/trigger

The above is a wake up latency synthetic event setup that passes the execname
of the common_pid that woke the task to the scheduling of that task, which
triggers a synthetic event that passes the original execname as a
parameter to display it.

># echo 1 > events/synthetic/enable
># cat trace
<idle>-0 [006] d..4 186.863801: wakeup_lat: pid=1306 delta=65 wake_comm=kworker/u16:3
<idle>-0 [000] d..4 186.863858: wakeup_lat: pid=163 delta=27 wake_comm=<idle>
<idle>-0 [001] d..4 186.863903: wakeup_lat: pid=1307 delta=36 wake_comm=kworker/u16:4
<idle>-0 [000] d..4 186.863927: wakeup_lat: pid=163 delta=5 wake_comm=<idle>
<idle>-0 [006] d..4 186.863957: wakeup_lat: pid=1306 delta=24 wake_comm=kworker/u16:3
sshd-1306 [006] d..4 186.864051: wakeup_lat: pid=61 delta=62 wake_comm=<idle>
<idle>-0 [000] d..4 186.965030: wakeup_lat: pid=609 delta=18 wake_comm=<idle>
<idle>-0 [006] d..4 186.987582: wakeup_lat: pid=1306 delta=65 wake_comm=kworker/u16:3
<idle>-0 [000] d..4 186.987639: wakeup_lat: pid=163 delta=27 wake_comm=<idle>

Link: https://lkml.kernel.org/r/20210722142837.458596338@goodmis.org

Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 3347d80b 22-Jul-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Have histogram types be constant when possible

Instead of kstrdup("const", GFP_KERNEL), have the hist_field type simply
assign the constant hist_field->type = "const"; And when the value passed
to it is a variable, use "kstrdup_const(var, GFP_KERNEL);" which will just
copy the value if the variable is already a constant. This saves on having
to allocate when not needed.

All frees of the hist_field->type will need to use kfree_const().

Link: https://lkml.kernel.org/r/20210722142837.280718447@goodmis.org

Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# de9a48a3 07-Jul-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Add linear buckets to histogram logic

There's been several times I wished the histogram logic had a "grouping"
feature for the buckets. Currently, each bucket has a size of one. That
is, if you trace the amount of requested allocations, each allocation is
its own bucket, even if you are interested in what allocates 100 bytes or
less, 100 to 200, 200 to 300, etc.

Also, without grouping, it fills up the allocated histogram buckets
quickly. If you are tracking latency, and don't care if something is 200
microseconds off, or 201 microseconds off, but want to track them by say
10 microseconds each. This can not currently be done.

There is a log2 but that grouping get's too big too fast for a lot of
cases.

Introduce a "buckets=SIZE" command to each field where it will record in a
rounded number. For example:

># echo 'hist:keys=bytes_req.buckets=100:sort=bytes_req' > events/kmem/kmalloc/trigger
># cat events/kmem/kmalloc/hist
# event histogram
#
# trigger info:
hist:keys=bytes_req.buckets=100:vals=hitcount:sort=bytes_req.buckets=100:size=2048
[active]
#

{ bytes_req: ~ 0-99 } hitcount: 3149
{ bytes_req: ~ 100-199 } hitcount: 1468
{ bytes_req: ~ 200-299 } hitcount: 39
{ bytes_req: ~ 300-399 } hitcount: 306
{ bytes_req: ~ 400-499 } hitcount: 364
{ bytes_req: ~ 500-599 } hitcount: 32
{ bytes_req: ~ 600-699 } hitcount: 69
{ bytes_req: ~ 700-799 } hitcount: 37
{ bytes_req: ~ 1200-1299 } hitcount: 16
{ bytes_req: ~ 1400-1499 } hitcount: 30
{ bytes_req: ~ 2000-2099 } hitcount: 6
{ bytes_req: ~ 4000-4099 } hitcount: 2168
{ bytes_req: ~ 5000-5099 } hitcount: 6

Totals:
Hits: 7690
Entries: 13
Dropped: 0

Link: https://lkml.kernel.org/r/20210707213921.980359719@goodmis.org

Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 5acce0bf 07-Aug-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event name

The following commands:

# echo 'read_max u64 size;' > synthetic_events
# echo 'hist:keys=common_pid:count=count:onmax($count).trace(read_max,count)' > events/syscalls/sys_enter_read/trigger

Causes:

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 4 PID: 1763 Comm: bash Not tainted 5.14.0-rc2-test+ #155
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
v03.03 07/14/2016
RIP: 0010:strcmp+0xc/0x20
Code: 75 f7 31 c0 0f b6 0c 06 88 0c 02 48 83 c0 01 84 c9 75 f1 4c 89 c0
c3 0f 1f 80 00 00 00 00 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07
3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 89
RSP: 0018:ffffb5fdc0963ca8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffffffb3a4e040 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9714c0d0b640 RDI: 0000000000000000
RBP: 0000000000000000 R08: 00000022986b7cde R09: ffffffffb3a4dff8
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9714c50603c8
R13: 0000000000000000 R14: ffff97143fdf9e48 R15: ffff9714c01a2210
FS: 00007f1fa6785740(0000) GS:ffff9714da400000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000002d863004 CR4: 00000000001706e0
Call Trace:
__find_event_file+0x4e/0x80
action_create+0x6b7/0xeb0
? kstrdup+0x44/0x60
event_hist_trigger_func+0x1a07/0x2130
trigger_process_regex+0xbd/0x110
event_trigger_write+0x71/0xd0
vfs_write+0xe9/0x310
ksys_write+0x68/0xe0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f1fa6879e87

The problem was the "trace(read_max,count)" where the "count" should be
"$count" as "onmax()" only handles variables (although it really should be
able to figure out that "count" is a field of sys_enter_read). But there's
a path that does not find the variable and ends up passing a NULL for the
event, which ends up getting passed to "strcmp()".

Add a check for NULL to return and error on the command with:

# cat error_log
hist:syscalls:sys_enter_read: error: Couldn't create or find variable
Command: hist:keys=common_pid:count=count:onmax($count).trace(read_max,count)
^
Link: https://lkml.kernel.org/r/20210808003011.4037f8d0@oasis.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 50450603ec9cb tracing: Add 'onmax' hist trigger action support
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# a9d10ca4 27-Jul-2021 Masami Hiramatsu <mhiramat@kernel.org>

tracing: Reject string operand in the histogram expression

Since the string type can not be the target of the addition / subtraction
operation, it must be rejected. Without this fix, the string type silently
converted to digits.

Link: https://lkml.kernel.org/r/162742654278.290973.1523000673366456634.stgit@devnote2

Cc: stable@vger.kernel.org
Fixes: 100719dcef447 ("tracing: Add simple expression support to hist triggers")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 2c05caa7 30-Jul-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing / histogram: Give calculation hist_fields a size

When working on my user space applications, I found a bug in the synthetic
event code where the automated synthetic event field was not matching the
event field calculation it was attached to. Looking deeper into it, it was
because the calculation hist_field was not given a size.

The synthetic event fields are matched to their hist_fields either by
having the field have an identical string type, or if that does not match,
then the size and signed values are used to match the fields.

The problem arose when I tried to match a calculation where the fields
were "unsigned int". My tool created a synthetic event of type "u32". But
it failed to match. The string was:

diff=field1-field2:onmatch(event).trace(synth,$diff)

Adding debugging into the kernel, I found that the size of "diff" was 0.
And since it was given "unsigned int" as a type, the histogram fallback
code used size and signed. The signed matched, but the size of u32 (4) did
not match zero, and the event failed to be created.

This can be worse if the field you want to match is not one of the
acceptable fields for a synthetic event. As event fields can have any type
that is supported in Linux, this can cause an issue. For example, if a
type is an enum. Then there's no way to use that with any calculations.

Have the calculation field simply take on the size of what it is
calculating.

Link: https://lkml.kernel.org/r/20210730171951.59c7743f@oasis.local.home

Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes: 100719dcef447 ("tracing: Add simple expression support to hist triggers")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 1e3bac71 21-Jul-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing/histogram: Rename "cpu" to "common_cpu"

Currently the histogram logic allows the user to write "cpu" in as an
event field, and it will record the CPU that the event happened on.

The problem with this is that there's a lot of events that have "cpu"
as a real field, and using "cpu" as the CPU it ran on, makes it
impossible to run histograms on the "cpu" field of events.

For example, if I want to have a histogram on the count of the
workqueue_queue_work event on its cpu field, running:

># echo 'hist:keys=cpu' > events/workqueue/workqueue_queue_work/trigger

Gives a misleading and wrong result.

Change the command to "common_cpu" as no event should have "common_*"
fields as that's a reserved name for fields used by all events. And
this makes sense here as common_cpu would be a field used by all events.

Now we can even do:

># echo 'hist:keys=common_cpu,cpu if cpu < 100' > events/workqueue/workqueue_queue_work/trigger
># cat events/workqueue/workqueue_queue_work/hist
# event histogram
#
# trigger info: hist:keys=common_cpu,cpu:vals=hitcount:sort=hitcount:size=2048 if cpu < 100 [active]
#

{ common_cpu: 0, cpu: 2 } hitcount: 1
{ common_cpu: 0, cpu: 4 } hitcount: 1
{ common_cpu: 7, cpu: 7 } hitcount: 1
{ common_cpu: 0, cpu: 7 } hitcount: 1
{ common_cpu: 0, cpu: 1 } hitcount: 1
{ common_cpu: 0, cpu: 6 } hitcount: 2
{ common_cpu: 0, cpu: 5 } hitcount: 2
{ common_cpu: 1, cpu: 1 } hitcount: 4
{ common_cpu: 6, cpu: 6 } hitcount: 4
{ common_cpu: 5, cpu: 5 } hitcount: 14
{ common_cpu: 4, cpu: 4 } hitcount: 26
{ common_cpu: 0, cpu: 0 } hitcount: 39
{ common_cpu: 2, cpu: 2 } hitcount: 184

Now for backward compatibility, I added a trick. If "cpu" is used, and
the field is not found, it will fall back to "common_cpu" and work as
it did before. This way, it will still work for old programs that use
"cpu" to get the actual CPU, but if the event has a "cpu" as a field, it
will get that event's "cpu" field, which is probably what it wants
anyway.

I updated the tracefs/README to include documentation about both the
common_timestamp and the common_cpu. This way, if that text is present in
the README, then an application can know that common_cpu is supported over
just plain "cpu".

Link: https://lkml.kernel.org/r/20210721110053.26b4f641@oasis.local.home

Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes: 8b7622bf94a44 ("tracing: Add cpu field for hist triggers")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 704adfb5 14-Jul-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Do not reference char * as a string in histograms

The histogram logic was allowing events with char * pointers to be used as
normal strings. But it was easy to crash the kernel with:

# echo 'hist:keys=filename' > events/syscalls/sys_enter_openat/trigger

And open some files, and boom!

BUG: unable to handle page fault for address: 00007f2ced0c3280
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1173fa067 P4D 1173fa067 PUD 1171b6067 PMD 1171dd067 PTE 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 6 PID: 1810 Comm: cat Not tainted 5.13.0-rc5-test+ #61
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
v03.03 07/14/2016
RIP: 0010:strlen+0x0/0x20
Code: f6 82 80 2a 0b a9 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 80 2a 0b
a9 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 74
10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3

RSP: 0018:ffffbdbf81567b50 EFLAGS: 00010246
RAX: 0000000000000003 RBX: ffff93815cdb3800 RCX: ffff9382401a22d0
RDX: 0000000000000100 RSI: 0000000000000000 RDI: 00007f2ced0c3280
RBP: 0000000000000100 R08: ffff9382409ff074 R09: ffffbdbf81567c98
R10: ffff9382409ff074 R11: 0000000000000000 R12: ffff9382409ff074
R13: 0000000000000001 R14: ffff93815a744f00 R15: 00007f2ced0c3280
FS: 00007f2ced0f8580(0000) GS:ffff93825a800000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2ced0c3280 CR3: 0000000107069005 CR4: 00000000001706e0
Call Trace:
event_hist_trigger+0x463/0x5f0
? find_held_lock+0x32/0x90
? sched_clock_cpu+0xe/0xd0
? lock_release+0x155/0x440
? kernel_init_free_pages+0x6d/0x90
? preempt_count_sub+0x9b/0xd0
? kernel_init_free_pages+0x6d/0x90
? get_page_from_freelist+0x12c4/0x1680
? __rb_reserve_next+0xe5/0x460
? ring_buffer_lock_reserve+0x12a/0x3f0
event_triggers_call+0x52/0xe0
ftrace_syscall_enter+0x264/0x2c0
syscall_trace_enter.constprop.0+0x1ee/0x210
do_syscall_64+0x1c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae

Where it triggered a fault on strlen(key) where key was the filename.

The reason is that filename is a char * to user space, and the histogram
code just blindly dereferenced it, with obvious bad results.

I originally tried to use strncpy_from_user/kernel_nofault() but found
that there's other places that its dereferenced and not worth the effort.

Just do not allow "char *" to act like strings.

Link: https://lkml.kernel.org/r/20210715000206.025df9d2@rorschach.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: stable@vger.kernel.org
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Tom Zanussi <zanussi@kernel.org>
Fixes: 79e577cbce4c4 ("tracing: Support string type key properly")
Fixes: 5967bd5c4239 ("tracing: Let filter_assign_type() detect FILTER_PTR_STRING")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 26c56373 07-Jul-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing/histograms: Fix parsing of "sym-offset" modifier

With the addition of simple mathematical operations (plus and minus), the
parsing of the "sym-offset" modifier broke, as it took the '-' part of the
"sym-offset" as a minus, and tried to break it up into a mathematical
operation of "field.sym - offset", in which case it failed to parse
(unless the event had a field called "offset").

Both .sym and .sym-offset modifiers should not be entered into
mathematical calculations anyway. If ".sym-offset" is found in the
modifier, then simply make it not an operation that can be calculated on.

Link: https://lkml.kernel.org/r/20210707110821.188ae255@oasis.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 100719dcef447 ("tracing: Add simple expression support to hist triggers")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 6c610dba 29-May-2021 Hyeonggon Yoo <42.hyeyoo@gmail.com>

tracing: Add WARN_ON_ONCE when returned value is negative

ret is assigned return value of event_hist_trigger_func, but the value
is unused. It is better to warn when returned value is negative,
rather than just ignoring it.

Link: https://lkml.kernel.org/r/20210529061423.GA103954@hyeyoo

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 614db49c 08-May-2021 Jiapeng Chong <jiapeng.chong@linux.alibaba.com>

tracing: Remove redundant assignment to event_var

Variable event_var is set to 'ERR_PTR(-EINVAL)', but this value
is never read as it is overwritten or not used later on, hence
it is a redundant assignment and can be removed.

Clean up the following clang-analyzer warning:

kernel/trace/trace_events_hist.c:2437:21: warning: Value stored to
'event_var' during its initialization is never read
[clang-analyzer-deadcode.DeadStores].

Link: https://lkml.kernel.org/r/1620470236-26562-1-git-send-email-jiapeng.chong@linux.alibaba.com

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b94bc80d 15-Mar-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Use a no_filter_buffering_ref to stop using the filter buffer

Currently, the trace histograms relies on it using absolute time stamps to
trigger the tracing to not use the temp buffer if filters are set. That's
because the histograms need the full timestamp that is saved in the ring
buffer. That is no longer the case, as the ring_buffer_event_time_stamp()
can now return the time stamp for all events without all triggering a full
absolute time stamp.

Now that the absolute time stamp is an unrelated dependency to not using
the filters. There's nothing about having absolute timestamps to keep from
using the filter buffer. Instead, change the interface to explicitly state
to disable filter buffering that the histogram logic can use.

Link: https://lkml.kernel.org/r/20210316164113.847886563@goodmis.org

Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# efe6196a 15-Mar-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

ring-buffer: Allow ring_buffer_event_time_stamp() to return time stamp of all events

Currently, ring_buffer_event_time_stamp() only returns an accurate time
stamp of the event if it has an absolute extended time stamp attached to
it. To make it more robust, use the event_stamp() in case the event does
not have an absolute value attached to it.

This will allow ring_buffer_event_time_stamp() to be used in more cases
than just histograms, and it will also allow histograms to not require
including absolute values all the time.

Link: https://lkml.kernel.org/r/20210316164113.704830885@goodmis.org

Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b47e3302 15-Mar-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Pass buffer of event to trigger operations

The ring_buffer_event_time_stamp() is going to be updated to extract the
time stamp for the event without needing it to be set to have absolute
values for all events. But to do so, it needs the buffer that the event is
on as the buffer saves information for the event before it is committed to
the buffer.

If the trace buffer is disabled, a temporary buffer is used, and there's
no access to this buffer from the current histogram triggers, even though
it is passed to the trace event code.

Pass the buffer that the event is on all the way down to the histogram
triggers.

Link: https://lkml.kernel.org/r/20210316164113.542448131@goodmis.org

Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 2b5894cc 29-Oct-2020 Qiujun Huang <hqjagain@gmail.com>

tracing: Fix some typos in comments

s/detetector/detector/
s/enfoced/enforced/
s/writen/written/
s/actualy/actually/
s/bascially/basically/
s/Regarldess/Regardless/
s/zeroes/zeros/
s/followd/followed/
s/incrememented/incremented/
s/separatelly/separately/
s/accesible/accessible/
s/sythetic/synthetic/
s/enabed/enabled/
s/heurisitc/heuristic/
s/assocated/associated/
s/otherwides/otherwise/
s/specfied/specified/
s/seaching/searching/
s/hierachry/hierarchy/
s/internel/internal/
s/Thise/This/

Link: https://lkml.kernel.org/r/20201029150554.3354-1-hqjagain@gmail.com

Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 6d9bd139 13-Oct-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Check return value of __create_val_fields() before using its result

After having a typo for writing a histogram trigger.

Wrote:
echo 'hist:key=pid:ts=common_timestamp.usec' > events/sched/sched_waking/trigger

Instead of:
echo 'hist:key=pid:ts=common_timestamp.usecs' > events/sched/sched_waking/trigger

and the following crash happened:

BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 4 PID: 1641 Comm: sh Not tainted 5.9.0-rc5-test+ #549
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
RIP: 0010:event_hist_trigger_func+0x70b/0x1ee0
Code: 24 08 89 d5 49 89 cc e9 8c 00 00 00 4c 89 f2 41 b9 00 10 00 00 4c 89 e1 44 89 ee 4c 89 ff e8 dc d3 ff ff 45 89 ea 4b 8b 14 d7 <f6> 42 08 04 74 17 41 8b 8f c0 00 00 00 8d 71 01 41 89 b7 c0 00 00
RSP: 0018:ffff959213d53db0 EFLAGS: 00010202
RAX: ffffffffffffffea RBX: 0000000000000000 RCX: 0000000000084c04
RDX: 0000000000000000 RSI: df7326aefebd174c RDI: 0000000000031080
RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000046 R12: ffff959211dcf690
R13: 0000000000000001 R14: ffff95925a36e370 R15: ffff959251c89800
FS: 00007fb9ea934740(0000) GS:ffff95925ab00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000000c976c005 CR4: 00000000001706e0
Call Trace:
? trigger_process_regex+0x78/0x110
trigger_process_regex+0xc5/0x110
event_trigger_write+0x71/0xd0
vfs_write+0xca/0x210
ksys_write+0x70/0xf0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fb9eaa29487
Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24

This was caused by accessing the hlist_data fields after the call to
__create_val_fields() without checking if the creation succeed.

Link: https://lkml.kernel.org/r/20201013154852.3abd8702@gandalf.local.home

Fixes: 63a1e5de3006 ("tracing: Save normal string variables")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# bd82631d 04-Oct-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Add support for dynamic strings to synthetic events

Currently, sythetic events only support static string fields such as:

# echo 'test_latency u64 lat; char somename[32]' > /sys/kernel/debug/tracing/synthetic_events

Which is fine, but wastes a lot of space in the event.

It also prevents the most commonly-defined strings in the existing
trace events e.g. those defined using __string(), from being passed to
synthetic events via the trace() action.

With this change, synthetic events with dynamic fields can be defined:

# echo 'test_latency u64 lat; char somename[]' > /sys/kernel/debug/tracing/synthetic_events

And the trace() action can be used to generate events using either
dynamic or static strings:

# echo 'hist:keys=name:lat=common_timestamp.usecs-$ts0:onmatch(sys.event).test_latency($lat,name)' > /sys/kernel/debug/tracing/events

The synthetic event dynamic strings are implemented in the same way as
the existing __data_loc strings and appear as such in the format file.

[ <rostedt@goodmis.org>: added __set_synth_event_print_fmt() changes:

I added the following to make it work with trace-cmd. Dynamic strings
must have __get_str() for events in the print_fmt otherwise it can't be
parsed correctly. ]

Link: https://lore.kernel.org/r/cover.1601588066.git.zanussi@kernel.org
Link: https://lkml.kernel.org/r/3ed35b6d0e390f5b94cb4a9ba1cc18f5982ab277.1601848695.git.zanussi@kernel.org

Tested-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 63a1e5de 04-Oct-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Save normal string variables

String variables created as field variables and save variables are
already handled properly by having their values copied when set. The
same isn't done for normal variables, but needs to be - simply saving
a pointer to a string contained in an old event isn't sufficient,
since that event's data may quickly become overwritten and therefore a
string pointer to it could yield garbage.

This change uses the same mechanism as field variables and simply
appends the new strings to the existing per-element field_var_str[]
array allocated for that purpose.

Link: https://lkml.kernel.org/r/1c1a03798b02e67307412a0c719d1bfb69b13007.1601848695.git.zanussi@kernel.org

Fixes: 02205a6752f2 (tracing: Add support for 'field variables')
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 4a4a56b4 04-Oct-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Change STR_VAR_MAX_LEN

32 is too small for this value, and anyway it makes more sense to use
MAX_FILTER_STR_VAL, as this is also the value used for variable-length
__strings.

Link: https://lkml.kernel.org/r/6adfd1668ac1fd8670bd58206944a762061a5559.1601848695.git.zanussi@kernel.org

Tested-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 46bbe5c6 07-Sep-2020 Tom Rix <trix@redhat.com>

tracing: fix double free

clang static analyzer reports this problem

trace_events_hist.c:3824:3: warning: Attempt to free
released memory
kfree(hist_data->attrs->var_defs.name[i]);

In parse_var_defs() if there is a problem allocating
var_defs.expr, the earlier var_defs.name is freed.
This free is duplicated by free_var_defs() which frees
the rest of the list.

Because free_var_defs() has to run anyway, remove the
second free fom parse_var_defs().

Link: https://lkml.kernel.org/r/20200907135845.15804-1-trix@redhat.com

Cc: stable@vger.kernel.org
Fixes: 30350d65ac56 ("tracing: Add variable support to hist triggers")
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 726721a5 28-May-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Move synthetic events to a separate file

With the addition of the in-kernel synthetic event API, synthetic
events are no longer specifically tied to the histogram triggers.

The synthetic event code is also making trace_event_hist.c very
bloated, so for those reasons, move it to a separate file,
trace_events_synth.c, along with a new trace_synth.h header file.

Because synthetic events are now independent from hist triggers, add a
new CONFIG_SYNTH_EVENTS config option, and have CONFIG_HIST_TRIGGERS
select it, and have CONFIG_SYNTH_EVENT_GEN_TEST depend on it.

Link: http://lkml.kernel.org/r/4d1fa1f85ed5982706ac44844ac92451dcb04715.1590693308.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 2d19bd79 03-Apr-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Add hist_debug trace event files for histogram debugging

Add a new "hist_debug" file for each trace event, which when read will
dump out a bunch of internal details about the hist triggers defined
on that event.

This is normally off but can be enabled by saying 'y' to the new
CONFIG_HIST_TRIGGERS_DEBUG config option.

This is in support of the new Documentation file describing histogram
internals, Documentation/trace/histogram-design.rst, which was
requested by developers trying to understand the internals when
extending or making use of the hist triggers for higher-level tools.

The histogram-design.rst documentation refers to the hist_debug files
and demonstrates their use with output in the test examples.

Link: http://lkml.kernel.org/r/77914c22b0ba493d9783c53bbfbc6087d6a7e1b1.1585941485.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 1b94b3ae 03-Apr-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Check state.disabled in synth event trace functions

Since trace_state.disabled is set in __synth_event_trace_start() at
the same time -ENOENT is returned, don't bother returning -ENOENT -
just have callers check trace_state.disabled instead, and avoid the
extra return val munging.

Link: http://lkml.kernel.org/r/87315c3889af870e8370e82b76cf48b426d70130.1585941485.git.zanussi@kernel.org

Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@godmis.org>


# 9da73974 22-Apr-2020 Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>

tracing: Fix memory leaks in trace_events_hist.c

kmemleak report 1:
[<9092c50b>] kmem_cache_alloc_trace+0x138/0x270
[<05a2c9ed>] create_field_var+0xcf/0x180
[<528a2d68>] action_create+0xe2/0xc80
[<63f50b61>] event_hist_trigger_func+0x15b5/0x1920
[<28ea5d3d>] trigger_process_regex+0x7b/0xc0
[<3138e86f>] event_trigger_write+0x4d/0xb0
[<ffd66c19>] __vfs_write+0x30/0x200
[<4f424a0d>] vfs_write+0x96/0x1b0
[<da59a290>] ksys_write+0x53/0xc0
[<3717101a>] __ia32_sys_write+0x15/0x20
[<c5f23497>] do_fast_syscall_32+0x70/0x250
[<46e2629c>] entry_SYSENTER_32+0xaf/0x102

This is because save_vars[] of struct hist_trigger_data are
not destroyed

kmemleak report 2:
[<9092c50b>] kmem_cache_alloc_trace+0x138/0x270
[<6e5e97c5>] create_var+0x3c/0x110
[<de82f1b9>] create_field_var+0xaf/0x180
[<528a2d68>] action_create+0xe2/0xc80
[<63f50b61>] event_hist_trigger_func+0x15b5/0x1920
[<28ea5d3d>] trigger_process_regex+0x7b/0xc0
[<3138e86f>] event_trigger_write+0x4d/0xb0
[<ffd66c19>] __vfs_write+0x30/0x200
[<4f424a0d>] vfs_write+0x96/0x1b0
[<da59a290>] ksys_write+0x53/0xc0
[<3717101a>] __ia32_sys_write+0x15/0x20
[<c5f23497>] do_fast_syscall_32+0x70/0x250
[<46e2629c>] entry_SYSENTER_32+0xaf/0x102

struct hist_field allocated through create_var() do not initialize
"ref" field to 1. The code in __destroy_hist_field() does not destroy
object if "ref" is initialized to zero, the condition
if (--hist_field->ref > 1) always passes since unsigned int wraps.

kmemleak report 3:
[<f8666fcc>] __kmalloc_track_caller+0x139/0x2b0
[<bb7f80a5>] kstrdup+0x27/0x50
[<39d70006>] init_var_ref+0x58/0xd0
[<8ca76370>] create_var_ref+0x89/0xe0
[<f045fc39>] action_create+0x38f/0xc80
[<7c146821>] event_hist_trigger_func+0x15b5/0x1920
[<07de3f61>] trigger_process_regex+0x7b/0xc0
[<e87daf8f>] event_trigger_write+0x4d/0xb0
[<19bf1512>] __vfs_write+0x30/0x200
[<64ce4d27>] vfs_write+0x96/0x1b0
[<a6f34170>] ksys_write+0x53/0xc0
[<7d4230cd>] __ia32_sys_write+0x15/0x20
[<8eadca00>] do_fast_syscall_32+0x70/0x250
[<235cf985>] entry_SYSENTER_32+0xaf/0x102

hist_fields (system & event_name) are not freed

Link: http://lkml.kernel.org/r/20200422061503.GA5151@cosmos

Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 7ab215f2 17-Feb-2020 Masami Hiramatsu <mhiramat@kernel.org>

tracing: Clear trace_state when starting trace

Clear trace_state data structure when starting trace
in __synth_event_trace_start() internal function.

Currently trace_state is initialized only in the
synth_event_trace_start() API, but the trace_state
in synth_event_trace() and synth_event_trace_array()
are on the stack without initialization.
This means those APIs will see wrong parameters and
wil skip closing process in __synth_event_trace_end()
because trace_state->disabled may be !0.

Link: http://lkml.kernel.org/r/158193315899.8868.1781259176894639952.stgit@devnote2

Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 784bd084 14-Feb-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Fix number printing bug in print_synth_event()

Fix a varargs-related bug in print_synth_event() which resulted in
strange output and oopses on 32-bit x86 systems. The problem is that
trace_seq_printf() expects the varargs to match the format string, but
print_synth_event() was always passing u64 values regardless. This
results in unspecified behavior when unpacking with va_arg() in
trace_seq_printf().

Add a function that takes the size into account when calling
trace_seq_printf().

Before:

modprobe-1731 [003] .... 919.039758: gen_synth_test: next_pid_field=777(null)next_comm_field=hula hoops ts_ns=1000000 ts_ms=1000 cpu=3(null)my_string_field=thneed my_int_field=598(null)

After:

insmod-1136 [001] .... 36.634590: gen_synth_test: next_pid_field=777 next_comm_field=hula hoops ts_ns=1000000 ts_ms=1000 cpu=1 my_string_field=thneed my_int_field=598

Link: http://lkml.kernel.org/r/a9b59eb515dbbd7d4abe53b347dccf7a8e285657.1581720155.git.zanussi@kernel.org

Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 38430837 14-Feb-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Check that number of vals matches number of synth event fields

Commit 7276531d4036('tracing: Consolidate trace() functions')
inadvertently dropped the synth_event_trace() and
synth_event_trace_array() checks that verify the number of values
passed in matches the number of fields in the synthetic event being
traced, so add them back.

Link: http://lkml.kernel.org/r/32819cac708714693669e0dfe10fe9d935e94a16.1581720155.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 1d9d4c90 14-Feb-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Make synth_event trace functions endian-correct

synth_event_trace(), synth_event_trace_array() and
__synth_event_add_val() write directly into the trace buffer and need
to take endianness into account, like trace_event_raw_event_synth()
does.

Link: http://lkml.kernel.org/r/2011354355e405af9c9d28abba430d1f5ff7771a.1581720155.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 7276531d 10-Feb-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Consolidate trace() functions

Move the checking, buffer reserve and buffer commit code in
synth_event_trace_start/end() into inline functions
__synth_event_trace_start/end() so they can also be used by
synth_event_trace() and synth_event_trace_array(), and then have all
those functions use them.

Also, change synth_event_trace_state.enabled to disabled so it only
needs to be set if the event is disabled, which is not normally the
case.

Link: http://lkml.kernel.org/r/b1f3108d0f450e58192955a300e31d0405ab4149.1581374549.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 0c62f6cd 10-Feb-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Don't return -EINVAL when tracing soft disabled synth events

There's no reason to return -EINVAL when tracing a synthetic event if
it's soft disabled - treat it the same as if it were hard disabled and
return normally.

Have synth_event_trace() and synth_event_trace_array() just return
normally, and have synth_event_trace_start set the trace state to
disabled and return.

Link: http://lkml.kernel.org/r/df5d02a1625aff97c9866506c5bada6a069982ba.1581374549.git.zanussi@kernel.org

Fixes: 8dcc53ad956d2 ("tracing: Add synth_event_trace() and related functions")
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# d090409a 10-Feb-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Add missing nest end to synth_event_trace_start() error case

If the ring_buffer reserve in synth_event_trace_start() fails, the
matching ring_buffer_nest_end() should be called in the error code,
since nothing else will ever call it in this case.

Link: http://lkml.kernel.org/r/20abc444b3eeff76425f895815380abe7aa53ff8.1581374549.git.zanussi@kernel.org

Fixes: 8dcc53ad956d2 ("tracing: Add synth_event_trace() and related functions")
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 2b90927c 31-Jan-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Use seq_buf for building dynevent_cmd string

The dynevent_cmd commands that build up the command string don't need
to do that themselves - there's a seq_buf facility that does pretty
much the same thing those command are doing manually, so use it
instead.

Link: http://lkml.kernel.org/r/eb8a6e835c964d0ab8a38cbf5ffa60746b54a465.1580506712.git.zanussi@kernel.org

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 74403b6c 31-Jan-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Remove check_arg() callbacks from dynevent args

It's kind of strange to have check_arg() callbacks as part of the arg
objects themselves; it makes more sense to just pass these in when the
args are added instead.

Remove the check_arg() callbacks from those objects which also means
removing the check_arg() args from the init functions, adding them to
the add functions and fixing up existing callers.

Link: http://lkml.kernel.org/r/c7708d6f177fcbe1a36b6e4e8e150907df0fa5d2.1580506712.git.zanussi@kernel.org

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 249d7b2e 31-Jan-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Consolidate some synth_event_trace code

The synth_event trace code contains some almost identical functions
and some small functions that are called only once - consolidate the
common code into single functions and fold in the small functions to
simplify the code overall.

Link: http://lkml.kernel.org/r/d1c8d8ad124a653b7543afe801d38c199ca5c20e.1580506712.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# d380dcde 29-Jan-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Fix now invalid var_ref_vals assumption in trace action

The patch 'tracing: Fix histogram code when expression has same var as
value' added code to return an existing variable reference when
creating a new variable reference, which resulted in var_ref_vals
slots being reused instead of being duplicated.

The implementation of the trace action assumes that the end of the
var_ref_vals array starting at action_data.var_ref_idx corresponds to
the values that will be assigned to the trace params. The patch
mentioned above invalidates that assumption, which means that each
param needs to explicitly specify its index into var_ref_vals.

This fix changes action_data.var_ref_idx to an array of var ref
indexes to account for that.

Link: https://lore.kernel.org/r/1580335695.6220.8.camel@kernel.org

Fixes: 8bcebc77e85f ("tracing: Fix histogram code when expression has same var as value")
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# fdeb1aca 28-Jan-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Change trace_boot to use synth_event interface

Have trace_boot_add_synth_event() use the synth_event interface.

Also, rename synth_event_run_cmd() to synth_event_run_command() now
that trace_boot's version is gone.

Link: http://lkml.kernel.org/r/94f1fa0e31846d0bddca916b8663404b20559e34.1580323897.git.zanussi@kernel.org

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 8dcc53ad 28-Jan-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Add synth_event_trace() and related functions

Add an exported function named synth_event_trace(), allowing modules
or other kernel code to trace synthetic events.

Also added are several functions that allow the same functionality to
be broken out in a piecewise fashion, which are useful in situations
where tracing an event from a full array of values would be
cumbersome. Those functions are synth_event_trace_start/end() and
synth_event_add_(next)_val().

Link: http://lkml.kernel.org/r/7a84de5f1854acf4144b57efe835ca645afa764f.1580323897.git.zanussi@kernel.org

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 35ca5207 28-Jan-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Add synthetic event command generation functions

Add functions used to generate synthetic event commands, built on top
of the dynevent_cmd interface.

synth_event_gen_cmd_start() is used to create a synthetic event
command using a variable arg list and
synth_event_gen_cmd_array_start() does the same thing but using an
array of field descriptors. synth_event_add_field(),
synth_event_add_field_str() and synth_event_add_fields() can be used
to add single fields one by one or as a group. Once all desired
fields are added, synth_event_gen_cmd_end() is used to actually
execute the command and create the event.

synth_event_create() does everything, including creating the event, in
a single call.

Link: http://lkml.kernel.org/r/38fef702fad5ef208009f459552f34a94befd860.1580323897.git.zanussi@kernel.org

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# f5f6b255 28-Jan-2020 Tom Zanussi <zanussi@kernel.org>

tracing: Add synth_event_delete()

create_or_delete_synth_event() contains code to delete a synthetic
event, which would be useful on its own - specifically, it would be
useful to allow event-creating modules to call it separately.

Separate out the delete code from that function and create an exported
function named synth_event_delete().

Link: http://lkml.kernel.org/r/050db3b06df7f0a4b8a2922da602d1d879c7c1c2.1580323897.git.zanussi@kernel.org

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# d0a49706 27-Jun-2019 Tom Zanussi <zanussi@kernel.org>

tracing: Add 'hist:' to hist trigger error log error string

The 'hist:' prefix gets stripped from the command text during command
processing, but should be added back when displaying the command
during error processing.

Not only because it's what should be displayed but also because not
having it means the test cases fail because the caret is miscalculated
by the length of the prefix string.

Link: http://lkml.kernel.org/r/449df721f560042e22382f67574bcc5b4d830d3d.1561743018.git.zanussi@kernel.org

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 4de26c8c 27-Jun-2019 Tom Zanussi <zanussi@kernel.org>

tracing: Add hist trigger error messages for sort specification

Add error codes and messages for all the error paths leading to sort
specification parsing errors.

Link: http://lkml.kernel.org/r/237830dc05e583fbb53664d817a784297bf961be.1561743018.git.zanussi@kernel.org

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b527b638 27-Jun-2019 Tom Zanussi <zanussi@kernel.org>

tracing: Simplify assignment parsing for hist triggers

In the process of adding better error messages for sorting, I realized
that strsep was being used incorrectly and some of the error paths I
was expecting to be hit weren't and just fell through to the common
invalid key error case.

It also became obvious that for keyword assignments, it wasn't
necessary to save the full assignment and reparse it later, and having
a common empty-assignment check would also make more sense in terms of
error processing.

Change the code to fix these problems and simplify it for new error
message changes in a subsequent patch.

Link: http://lkml.kernel.org/r/1c3ef0b6655deaf345f6faee2584a0298ac2d743.1561743018.git.zanussi@kernel.org

Fixes: e62347d24534 ("tracing: Add hist trigger support for user-defined sorting ('sort=' param)")
Fixes: 7ef224d1d0e3 ("tracing: Add 'hist' event trigger command")
Fixes: a4072fe85ba3 ("tracing: Add a clock attribute for hist triggers")
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 8bcebc77 20-Jan-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Fix histogram code when expression has same var as value

While working on a tool to convert SQL syntex into the histogram language of
the kernel, I discovered the following bug:

# echo 'first u64 start_time u64 end_time pid_t pid u64 delta' >> synthetic_events
# echo 'hist:keys=pid:start=common_timestamp' > events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:delta=common_timestamp-$start,start2=$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger

Would not display any histograms in the sched_switch histogram side.

But if I were to swap the location of

"delta=common_timestamp-$start" with "start2=$start"

Such that the last line had:

# echo 'hist:keys=next_pid:start2=$start,delta=common_timestamp-$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger

The histogram works as expected.

What I found out is that the expressions clear out the value once it is
resolved. As the variables are resolved in the order listed, when
processing:

delta=common_timestamp-$start

The $start is cleared. When it gets to "start2=$start", it errors out with
"unresolved symbol" (which is silent as this happens at the location of the
trace), and the histogram is dropped.

When processing the histogram for variable references, instead of adding a
new reference for a variable used twice, use the same reference. That way,
not only is it more efficient, but the order will no longer matter in
processing of the variables.

From Tom Zanussi:

"Just to clarify some more about what the problem was is that without
your patch, we would have two separate references to the same variable,
and during resolve_var_refs(), they'd both want to be resolved
separately, so in this case, since the first reference to start wasn't
part of an expression, it wouldn't get the read-once flag set, so would
be read normally, and then the second reference would do the read-once
read and also be read but using read-once. So everything worked and
you didn't see a problem:

from: start2=$start,delta=common_timestamp-$start

In the second case, when you switched them around, the first reference
would be resolved by doing the read-once, and following that the second
reference would try to resolve and see that the variable had already
been read, so failed as unset, which caused it to short-circuit out and
not do the trigger action to generate the synthetic event:

to: delta=common_timestamp-$start,start2=$start

With your patch, we only have the single resolution which happens
correctly the one time it's resolved, so this can't happen."

Link: https://lore.kernel.org/r/20200116154216.58ca08eb@gandalf.local.home

Cc: stable@vger.kernel.org
Fixes: 067fe038e70f6 ("tracing: Add variable reference handling to hist triggers")
Reviewed-by: Tom Zanuss <zanussi@kernel.org>
Tested-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# aeed8aa3 19-Dec-2019 Masami Hiramatsu <mhiramat@kernel.org>

tracing: trigger: Replace unneeded RCU-list traversals

With CONFIG_PROVE_RCU_LIST, I had many suspicious RCU warnings
when I ran ftracetest trigger testcases.

-----
# dmesg -c > /dev/null
# ./ftracetest test.d/trigger
...
# dmesg | grep "RCU-list traversed" | cut -f 2 -d ] | cut -f 2 -d " "
kernel/trace/trace_events_hist.c:6070
kernel/trace/trace_events_hist.c:1760
kernel/trace/trace_events_hist.c:5911
kernel/trace/trace_events_trigger.c:504
kernel/trace/trace_events_hist.c:1810
kernel/trace/trace_events_hist.c:3158
kernel/trace/trace_events_hist.c:3105
kernel/trace/trace_events_hist.c:5518
kernel/trace/trace_events_hist.c:5998
kernel/trace/trace_events_hist.c:6019
kernel/trace/trace_events_hist.c:6044
kernel/trace/trace_events_trigger.c:1500
kernel/trace/trace_events_trigger.c:1540
kernel/trace/trace_events_trigger.c:539
kernel/trace/trace_events_trigger.c:584
-----

I investigated those warnings and found that the RCU-list
traversals in event trigger and hist didn't need to use
RCU version because those were called only under event_mutex.

I also checked other RCU-list traversals related to event
trigger list, and found that most of them were called from
event_hist_trigger_func() or hist_unregister_trigger() or
register/unregister functions except for a few cases.

Replace these unneeded RCU-list traversals with normal list
traversal macro and lockdep_assert_held() to check the
event_mutex is held.

Link: http://lkml.kernel.org/r/157680910305.11685.15110237954275915782.stgit@devnote2

Cc: stable@vger.kernel.org
Fixes: 30350d65ac567 ("tracing: Add variable support to hist triggers")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 3b42a4c8 19-Dec-2019 Masami Hiramatsu <mhiramat@kernel.org>

tracing: trigger: Replace unneeded RCU-list traversals

With CONFIG_PROVE_RCU_LIST, I had many suspicious RCU warnings
when I ran ftracetest trigger testcases.

-----
# dmesg -c > /dev/null
# ./ftracetest test.d/trigger
...
# dmesg | grep "RCU-list traversed" | cut -f 2 -d ] | cut -f 2 -d " "
kernel/trace/trace_events_hist.c:6070
kernel/trace/trace_events_hist.c:1760
kernel/trace/trace_events_hist.c:5911
kernel/trace/trace_events_trigger.c:504
kernel/trace/trace_events_hist.c:1810
kernel/trace/trace_events_hist.c:3158
kernel/trace/trace_events_hist.c:3105
kernel/trace/trace_events_hist.c:5518
kernel/trace/trace_events_hist.c:5998
kernel/trace/trace_events_hist.c:6019
kernel/trace/trace_events_hist.c:6044
kernel/trace/trace_events_trigger.c:1500
kernel/trace/trace_events_trigger.c:1540
kernel/trace/trace_events_trigger.c:539
kernel/trace/trace_events_trigger.c:584
-----

I investigated those warnings and found that the RCU-list
traversals in event trigger and hist didn't need to use
RCU version because those were called only under event_mutex.

I also checked other RCU-list traversals related to event
trigger list, and found that most of them were called from
event_hist_trigger_func() or hist_unregister_trigger() or
register/unregister functions except for a few cases.

Replace these unneeded RCU-list traversals with normal list
traversal macro and lockdep_assert_held() to check the
event_mutex is held.

Link: http://lkml.kernel.org/r/157680910305.11685.15110237954275915782.stgit@devnote2

Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 3fbe2d6e 10-Jan-2020 Masami Hiramatsu <mhiramat@kernel.org>

tracing/boot: Add synthetic event support

Add synthetic event node support to boot time tracing.
The synthetic event is a kind of event node, but the group
name is "synthetic".

- ftrace.event.synthetic.EVENT.fields = FIELD[, FIELD2...]
Defines new synthetic event with FIELDs. Each field should be
"type varname".

The synthetic node requires "fields" string arraies, which defines
the fields as same as tracing/synth_events interface.

Link: http://lkml.kernel.org/r/157867241236.17873.12411615143321557709.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 48ac9488 10-Jan-2020 Masami Hiramatsu <mhiramat@kernel.org>

tracing: Add NULL trace-array check in print_synth_event()

Add NULL trace-array check in print_synth_event(), because
if we enable tp_printk option, iter->tr can be NULL.

Link: http://lkml.kernel.org/r/157867236536.17873.12529350542460184019.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b05e89ae 10-Jan-2020 Masami Hiramatsu <mhiramat@kernel.org>

tracing: Accept different type for synthetic event fields

Make the synthetic event accepts a different type field to record.
However, the size and signed flag must be same.

Link: http://lkml.kernel.org/r/157867235358.17873.61732996461602171.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 13292494 13-Dec-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Make struct ring_buffer less ambiguous

As there's two struct ring_buffers in the kernel, it causes some confusion.
The other one being the perf ring buffer. It was agreed upon that as neither
of the ring buffers are generic enough to be used globally, they should be
renamed as:

perf's ring_buffer -> perf_buffer
ftrace's ring_buffer -> trace_buffer

This implements the changes to the ring buffer that ftrace uses.

Link: https://lore.kernel.org/r/20191213140531.116b3200@gandalf.local.home

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 1c5eb448 09-Jan-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Rename trace_buffer to array_buffer

As we are working to remove the generic "ring_buffer" name that is used by
both tracing and perf, the ring_buffer name for tracing will be renamed to
trace_buffer, and perf's ring buffer will be renamed to perf_buffer.

As there already exists a trace_buffer that is used by the trace_arrays, it
needs to be first renamed to array_buffer.

Link: https://lore.kernel.org/r/20191213153553.GE20583@krava

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# fe6e096a 18-Dec-2019 Sven Schnelle <svens@linux.ibm.com>

tracing: Fix endianness bug in histogram trigger

At least on PA-RISC and s390 synthetic histogram triggers are failing
selftests because trace_event_raw_event_synth() always writes a 64 bit
values, but the reader expects a field->size sized value. On little endian
machines this doesn't hurt, but on big endian this makes the reader always
read zero values.

Link: http://lore.kernel.org/linux-trace-devel/20191218074427.96184-4-svens@linux.ibm.com

Cc: stable@vger.kernel.org
Fixes: 4b147936fa509 ("tracing: Add support for 'synthetic' events")
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 04ae87a5 24-Oct-2019 Peter Zijlstra <peterz@infradead.org>

ftrace: Rework event_create_dir()

Rework event_create_dir() to use an array of static data instead of
function pointers where possible.

The problem is that it would call the function pointer on module load
before parse_args(), possibly even before jump_labels were initialized.
Luckily the generated functions don't use jump_labels but it still seems
fragile. It also gets in the way of changing when we make the module map
executable.

The generated function are basically calling trace_define_field() with a
bunch of static arguments. So instead of a function, capture these
arguments in a static array, avoiding the function call.

Now there are a number of cases where the fields are dynamic (syscall
arguments, kprobes and uprobes), in which case a static array does not
work, for these we preserve the function call. Luckily all these cases
are not related to modules and so we can retain the function call for
them.

Also fix up all broken tracepoint definitions that now generate a
compile error.

Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191111132458.342979914@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 58a74a29 15-Nov-2019 Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

tracing: Increase SYNTH_FIELDS_MAX for synthetic_events

Increase the maximum allowed count of synthetic event fields from 16 to 32
in order to allow for larger-than-usual events.

Link: http://lkml.kernel.org/r/20191115091730.9192-1-dedekind1@gmail.com

Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 9fa8c9c6 17-Oct-2019 Zhengjun Xing <zhengjun.xing@linux.intel.com>

tracing: Fix "gfp_t" format for synthetic events

In the format of synthetic events, the "gfp_t" is shown as "signed:1",
but in fact the "gfp_t" is "unsigned", should be shown as "signed:0".

The issue can be reproduced by the following commands:

echo 'memlatency u64 lat; unsigned int order; gfp_t gfp_flags; int migratetype' > /sys/kernel/debug/tracing/synthetic_events
cat /sys/kernel/debug/tracing/events/synthetic/memlatency/format

name: memlatency
ID: 2233
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;

field:u64 lat; offset:8; size:8; signed:0;
field:unsigned int order; offset:16; size:4; signed:0;
field:gfp_t gfp_flags; offset:24; size:4; signed:1;
field:int migratetype; offset:32; size:4; signed:1;

print fmt: "lat=%llu, order=%u, gfp_flags=%x, migratetype=%d", REC->lat, REC->order, REC->gfp_flags, REC->migratetype

Link: http://lkml.kernel.org/r/20191018012034.6404-1-zhengjun.xing@linux.intel.com

Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 17911ff3 11-Oct-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Add locked_down checks to the open calls of files created for tracefs

Added various checks on open tracefs calls to see if tracefs is in lockdown
mode, and if so, to return -EPERM.

Note, the event format files (which are basically standard on all machines)
as well as the enabled_functions file (which shows what is currently being
traced) are not lockde down. Perhaps they should be, but it seems counter
intuitive to lockdown information to help you know if the system has been
modified.

Link: http://lkml.kernel.org/r/CAHk-=wj7fGPKUspr579Cii-w_y60PtRaiDgKuxVtBAMK0VNNkA@mail.gmail.com

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 8530dec6 11-Oct-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Add tracing_check_open_get_tr()

Currently, most files in the tracefs directory test if tracing_disabled is
set. If so, it should return -ENODEV. The tracing_disabled is called when
tracing is found to be broken. Originally it was done in case the ring
buffer was found to be corrupted, and we wanted to prevent reading it from
crashing the kernel. But it's also called if a tracing selftest fails on
boot. It's a one way switch. That is, once it is triggered, tracing is
disabled until reboot.

As most tracefs files can also be used by instances in the tracefs
directory, they need to be carefully done. Each instance has a trace_array
associated to it, and when the instance is removed, the trace_array is
freed. But if an instance is opened with a reference to the trace_array,
then it requires looking up the trace_array to get its ref counter (as there
could be a race with it being deleted and the open itself). Once it is
found, a reference is added to prevent the instance from being removed (and
the trace_array associated with it freed).

Combine the two checks (tracing_disabled and trace_array_get()) into a
single helper function. This will also make it easier to add lockdown to
tracefs later.

Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 17f8607a 01-Sep-2019 Tom Zanussi <zanussi@kernel.org>

tracing: Make sure variable reference alias has correct var_ref_idx

Original changelog from Steve Rostedt (except last sentence which
explains the problem, and the Fixes: tag):

I performed a three way histogram with the following commands:

echo 'irq_lat u64 lat pid_t pid' > synthetic_events
echo 'wake_lat u64 lat u64 irqlat pid_t pid' >> synthetic_events
echo 'hist:keys=common_pid:irqts=common_timestamp.usecs if function == 0xffffffff81200580' > events/timer/hrtimer_start/trigger
echo 'hist:keys=common_pid:lat=common_timestamp.usecs-$irqts:onmatch(timer.hrtimer_start).irq_lat($lat,pid) if common_flags & 1' > events/sched/sched_waking/trigger
echo 'hist:keys=pid:wakets=common_timestamp.usecs,irqlat=lat' > events/synthetic/irq_lat/trigger
echo 'hist:keys=next_pid:lat=common_timestamp.usecs-$wakets,irqlat=$irqlat:onmatch(synthetic.irq_lat).wake_lat($lat,$irqlat,next_pid)' > events/sched/sched_switch/trigger
echo 1 > events/synthetic/wake_lat/enable

Basically I wanted to see:

hrtimer_start (calling function tick_sched_timer)

Note:

# grep tick_sched_timer /proc/kallsyms
ffffffff81200580 t tick_sched_timer

And save the time of that, and then record sched_waking if it is called
in interrupt context and with the same pid as the hrtimer_start, it
will record the latency between that and the waking event.

I then look at when the task that is woken is scheduled in, and record
the latency between the wakeup and the task running.

At the end, the wake_lat synthetic event will show the wakeup to
scheduled latency, as well as the irq latency in from hritmer_start to
the wakeup. The problem is that I found this:

<idle>-0 [007] d... 190.485261: wake_lat: lat=27 irqlat=190485230 pid=698
<idle>-0 [005] d... 190.485283: wake_lat: lat=40 irqlat=190485239 pid=10
<idle>-0 [002] d... 190.488327: wake_lat: lat=56 irqlat=190488266 pid=335
<idle>-0 [005] d... 190.489330: wake_lat: lat=64 irqlat=190489262 pid=10
<idle>-0 [003] d... 190.490312: wake_lat: lat=43 irqlat=190490265 pid=77
<idle>-0 [005] d... 190.493322: wake_lat: lat=54 irqlat=190493262 pid=10
<idle>-0 [005] d... 190.497305: wake_lat: lat=35 irqlat=190497267 pid=10
<idle>-0 [005] d... 190.501319: wake_lat: lat=50 irqlat=190501264 pid=10

The irqlat seemed quite large! Investigating this further, if I had
enabled the irq_lat synthetic event, I noticed this:

<idle>-0 [002] d.s. 249.429308: irq_lat: lat=164968 pid=335
<idle>-0 [002] d... 249.429369: wake_lat: lat=55 irqlat=249429308 pid=335

Notice that the timestamp of the irq_lat "249.429308" is awfully
similar to the reported irqlat variable. In fact, all instances were
like this. It appeared that:

irqlat=$irqlat

Wasn't assigning the old $irqlat to the new irqlat variable, but
instead was assigning the $irqts to it.

The issue is that assigning the old $irqlat to the new irqlat variable
creates a variable reference alias, but the alias creation code
forgets to make sure the alias uses the same var_ref_idx to access the
reference.

Link: http://lkml.kernel.org/r/1567375321.5282.12.camel@kernel.org

Cc: Linux Trace Devel <linux-trace-devel@vger.kernel.org>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Cc: stable@vger.kernel.org
Fixes: 7e8b88a30b085 ("tracing: Add hist trigger support for variable reference aliases")
Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# ac681546 11-Jul-2019 Zhengjun Xing <zhengjun.xing@linux.intel.com>

tracing: Add "gfp_t" support in synthetic_events

Add "gfp_t" support in synthetic_events, then the "gfp_t" type
parameter in some functions can be traced.

Prints the gfp flags as hex in addition to the human-readable flag
string. Example output:

whoopsie-630 [000] ...1 78.969452: testevent: bar=b20 (GFP_ATOMIC|__GFP_ZERO)
rcuc/0-11 [000] ...1 81.097555: testevent: bar=a20 (GFP_ATOMIC)
rcuc/0-11 [000] ...1 81.583123: testevent: bar=a20 (GFP_ATOMIC)

Link: http://lkml.kernel.org/r/20190712015308.9908-1-zhengjun.xing@linux.intel.com

Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
[ Added printing of flag names ]
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 30199137 19-Jun-2019 Masami Hiramatsu <mhiramat@kernel.org>

tracing/dynevent: Pass extra arguments to match operation

Pass extra arguments to match operation for checking
exact match. If the event doesn't support exact match,
it will be ignored.

Link: http://lkml.kernel.org/r/156095685930.28024.10405547027475590975.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 9b2ca371 18-Apr-2019 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add a check_val() check before updating cond_snapshot() track_val

Without this check a snapshot is taken whenever a bucket's max is hit,
rather than only when the global max is hit, as it should be.

Before:

In this example, we do a first run of the workload (cyclictest),
examine the output, note the max ('triggering value') (347), then do
a second run and note the max again.

In this case, the max in the second run (39) is below the max in the
first run, but since we haven't cleared the histogram, the first max
is still in the histogram and is higher than any other max, so it
should still be the max for the snapshot. It isn't however - the
value should still be 347 after the second run.

# echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_prio,next_comm,prev_pid,prev_prio,prev_comm):onmax($wakeup_lat).snapshot() if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger

# cyclictest -p 80 -n -s -t 2 -D 2

# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist

{ next_pid: 2143 } hitcount: 199
max: 44 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/4

{ next_pid: 2145 } hitcount: 1325
max: 38 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/2

{ next_pid: 2144 } hitcount: 1982
max: 347 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6

Snapshot taken (see tracing/snapshot). Details:
triggering value { onmax($wakeup_lat) }: 347
triggered by event with key: { next_pid: 2144 }

# cyclictest -p 80 -n -s -t 2 -D 2

# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist

{ next_pid: 2143 } hitcount: 199
max: 44 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/4

{ next_pid: 2148 } hitcount: 199
max: 16 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/1

{ next_pid: 2145 } hitcount: 1325
max: 38 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/2

{ next_pid: 2150 } hitcount: 1326
max: 39 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/4

{ next_pid: 2144 } hitcount: 1982
max: 347 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6

{ next_pid: 2149 } hitcount: 1983
max: 130 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/0

Snapshot taken (see tracing/snapshot). Details:
triggering value { onmax($wakeup_lat) }: 39
triggered by event with key: { next_pid: 2150 }

After:

In this example, we do a first run of the workload (cyclictest),
examine the output, note the max ('triggering value') (375), then do
a second run and note the max again.

In this case, the max in the second run is still 375, the highest in
any bucket, as it should be.

# cyclictest -p 80 -n -s -t 2 -D 2

# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist

{ next_pid: 2072 } hitcount: 200
max: 28 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/5

{ next_pid: 2074 } hitcount: 1323
max: 375 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/2

{ next_pid: 2073 } hitcount: 1980
max: 153 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6

Snapshot taken (see tracing/snapshot). Details:
triggering value { onmax($wakeup_lat) }: 375
triggered by event with key: { next_pid: 2074 }

# cyclictest -p 80 -n -s -t 2 -D 2

# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist

{ next_pid: 2101 } hitcount: 199
max: 49 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6

{ next_pid: 2072 } hitcount: 200
max: 28 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/5

{ next_pid: 2074 } hitcount: 1323
max: 375 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/2

{ next_pid: 2103 } hitcount: 1325
max: 74 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/4

{ next_pid: 2073 } hitcount: 1980
max: 153 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6

{ next_pid: 2102 } hitcount: 1981
max: 84 next_prio: 19 next_comm: cyclictest
prev_pid: 12 prev_prio: 120 prev_comm: kworker/0:1

Snapshot taken (see tracing/snapshot). Details:
triggering value { onmax($wakeup_lat) }: 375
triggered by event with key: { next_pid: 2074 }

Link: http://lkml.kernel.org/r/95958351329f129c07504b4d1769c47a97b70d65.1555597045.git.tom.zanussi@linux.intel.com

Cc: stable@vger.kernel.org
Fixes: a3785b7eca8fd ("tracing: Add hist trigger snapshot() action")
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# c8d94a18 18-Apr-2019 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Check keys for variable references in expressions too

There's an existing check for variable references in keys, but it
doesn't go far enough. It checks whether a key field is a variable
reference but doesn't check whether it's an expression containing
variable references, which can cause the same problems for callers.

Use the existing field_has_hist_vars() function rather than a direct
top-level flag check to catch all possible variable references.

Link: http://lkml.kernel.org/r/e8c3d3d53db5ca90ceea5a46e5413103a6902fc7.1555597045.git.tom.zanussi@linux.intel.com

Cc: stable@vger.kernel.org
Fixes: 067fe038e70f6 ("tracing: Add variable reference handling to hist triggers")
Reported-by: Vincent Bernat <vincent@bernat.ch>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 55267c88 18-Apr-2019 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Prevent hist_field_var_ref() from accessing NULL tracing_map_elts

hist_field_var_ref() is an implementation of hist_field_fn_t(), which
can be called with a null tracing_map_elt elt param when assembling a
key in event_hist_trigger().

In the case of hist_field_var_ref() this doesn't make sense, because a
variable can only be resolved by looking it up using an already
assembled key i.e. a variable can't be used to assemble a key since
the key is required in order to access the variable.

Upper layers should prevent the user from constructing a key using a
variable in the first place, but in case one slips through, it
shouldn't cause a NULL pointer dereference. Also if one does slip
through, we want to know about it, so emit a one-time warning in that
case.

Link: http://lkml.kernel.org/r/64ec8dc15c14d305295b64cdfcc6b2b9dd14753f.1555597045.git.tom.zanussi@linux.intel.com

Reported-by: Vincent Bernat <vincent@bernat.ch>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# e7d91663 25-Apr-2019 Thomas Gleixner <tglx@linutronix.de>

tracing: Simplify stacktrace retrieval in histograms

The indirection through struct stack_trace is not necessary at all. Use the
storage array based interface.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: linux-mm@kvack.org
Cc: David Rientjes <rientjes@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: kasan-dev@googlegroups.com
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: iommu@lists.linux-foundation.org
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org
Cc: dm-devel@redhat.com
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: linux-arch@vger.kernel.org
Link: https://lkml.kernel.org/r/20190425094802.979089273@linutronix.de


# 4285f2fc 09-Apr-2019 Thomas Gleixner <tglx@linutronix.de>

tracing: Remove the ULONG_MAX stack trace hackery

No architecture terminates the stack trace with ULONG_MAX anymore. As the
code checks the number of entries stored anyway there is no point in
keeping all that ULONG_MAX magic around.

The histogram code zeroes the storage before saving the stack, so if the
trace is shorter than the maximum number of entries it can terminate the
print loop if a zero entry is detected.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexander Potapenko <glider@google.com>
Link: https://lkml.kernel.org/r/20190410103645.048761764@linutronix.de


# 2f754e77 01-Apr-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Have the error logs show up in the proper instances

As each instance has their own error_log file, it makes more sense that the
instances show the errors of their own instead of all error_logs having the
same data. Make it that the errors show up in the instance error_log file
that the error happens in. If no instance trace_array is available, then
NULL can be passed in which will create the error in the top level instance
(the one at the top of the tracefs directory).

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# d0cd871b 01-Apr-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Have histogram code pass around trace_array for error handling

Have the trace_array that associates the trace instance of the histogram
passed around to functions so that error handling can display the error
message in the proper instance.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# d566c5e9 31-Mar-2019 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Use tracing error_log with hist triggers

Replace hist_err() and hist_err_event() with tracing_log_err() from
the new tracing error_log mechanism.

Also add a couple related helper functions and remove most of the old
hist_err()-related code.

With this change, users no longer read the hist files for hist trigger
error information, but instead look at tracing/error_log for the same
information.

Link: http://lkml.kernel.org/r/c98f77a97c9715d18b623eeb5741057b330d5ac0.1554072478.git.tom.zanussi@linux.intel.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# a1a05bb4 31-Mar-2019 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Save the last hist command's associated event name

In preparation for making use of the new trace error log, save the
subsystem and event name associated with the last hist command - it
will be passed as the location param in the event_log_err() calls.

Link: http://lkml.kernel.org/r/eb0fd1362be8f39facb86c83eecf441b7a5876f8.1554072478.git.tom.zanussi@linux.intel.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# ff9d31d0 19-Mar-2019 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Remove unnecessary var_ref destroy in track_data_destroy()

Commit 656fe2ba85e8 (tracing: Use hist trigger's var_ref array to
destroy var_refs) centralized the destruction of all the var_refs
in one place so that other code didn't have to do it.

The track_data_destroy() added later ignored that and also destroyed
the track_data var_ref, causing a double-free error flagged by KASAN.

==================================================================
BUG: KASAN: use-after-free in destroy_hist_field+0x30/0x70
Read of size 8 at addr ffff888086df2210 by task bash/1694

CPU: 6 PID: 1694 Comm: bash Not tainted 5.1.0-rc1-test+ #15
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03
07/14/2016
Call Trace:
dump_stack+0x71/0xa0
? destroy_hist_field+0x30/0x70
print_address_description.cold.3+0x9/0x1fb
? destroy_hist_field+0x30/0x70
? destroy_hist_field+0x30/0x70
kasan_report.cold.4+0x1a/0x33
? __kasan_slab_free+0x100/0x150
? destroy_hist_field+0x30/0x70
destroy_hist_field+0x30/0x70
track_data_destroy+0x55/0xe0
destroy_hist_data+0x1f0/0x350
hist_unreg_all+0x203/0x220
event_trigger_open+0xbb/0x130
do_dentry_open+0x296/0x700
? stacktrace_count_trigger+0x30/0x30
? generic_permission+0x56/0x200
? __x64_sys_fchdir+0xd0/0xd0
? inode_permission+0x55/0x200
? security_inode_permission+0x18/0x60
path_openat+0x633/0x22b0
? path_lookupat.isra.50+0x420/0x420
? __kasan_kmalloc.constprop.12+0xc1/0xd0
? kmem_cache_alloc+0xe5/0x260
? getname_flags+0x6c/0x2a0
? do_sys_open+0x149/0x2b0
? do_syscall_64+0x73/0x1b0
? entry_SYSCALL_64_after_hwframe+0x44/0xa9
? _raw_write_lock_bh+0xe0/0xe0
? __kernel_text_address+0xe/0x30
? unwind_get_return_address+0x2f/0x50
? __list_add_valid+0x2d/0x70
? deactivate_slab.isra.62+0x1f4/0x5a0
? getname_flags+0x6c/0x2a0
? set_track+0x76/0x120
do_filp_open+0x11a/0x1a0
? may_open_dev+0x50/0x50
? _raw_spin_lock+0x7a/0xd0
? _raw_write_lock_bh+0xe0/0xe0
? __alloc_fd+0x10f/0x200
do_sys_open+0x1db/0x2b0
? filp_open+0x50/0x50
do_syscall_64+0x73/0x1b0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fa7b24a4ca2
Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48 8d 05 85 7a 0d 00 8b 00 85 c0
75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff
0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
RSP: 002b:00007fffbafb3af0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 000055d3648ade30 RCX: 00007fa7b24a4ca2
RDX: 0000000000000241 RSI: 000055d364a55240 RDI: 00000000ffffff9c
RBP: 00007fffbafb3bf0 R08: 0000000000000020 R09: 0000000000000002
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000003 R14: 0000000000000001 R15: 000055d364a55240
==================================================================

So remove the track_data_destroy() destroy_hist_field() call for that
var_ref.

Link: http://lkml.kernel.org/r/1deffec420f6a16d11dd8647318d34a66d1989a9.camel@linux.intel.com

Fixes: 466f4528fbc69 ("tracing: Generalize hist trigger onmax and save action")
Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 27242c62 05-Mar-2019 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Use strncpy instead of memcpy when copying comm for hist triggers

Because there may be random garbage beyond a string's null terminator,
code that might use the entire comm array e.g. histogram keys, can
give unexpected results if that garbage is copied in too, so avoid
that possibility by using strncpy instead of memcpy.

Link: http://lkml.kernel.org/r/1eb9f096a8086c3c82c7fc087c900005143cec54.1551802084.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 9f0bbf31 04-Feb-2019 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Use strncpy instead of memcpy for string keys in hist triggers

Because there may be random garbage beyond a string's null terminator,
it's not correct to copy the the complete character array for use as a
hist trigger key. This results in multiple histogram entries for the
'same' string key.

So, in the case of a string key, use strncpy instead of memcpy to
avoid copying in the extra bytes.

Before, using the gdbus entries in the following hist trigger as an
example:

# echo 'hist:key=comm' > /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
# cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist

...

{ comm: ImgDecoder #4 } hitcount: 203
{ comm: gmain } hitcount: 213
{ comm: gmain } hitcount: 216
{ comm: StreamTrans #73 } hitcount: 221
{ comm: mozStorage #3 } hitcount: 230
{ comm: gdbus } hitcount: 233
{ comm: StyleThread#5 } hitcount: 253
{ comm: gdbus } hitcount: 256
{ comm: gdbus } hitcount: 260
{ comm: StyleThread#4 } hitcount: 271

...

# cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist | egrep gdbus | wc -l
51

After:

# cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist | egrep gdbus | wc -l
1

Link: http://lkml.kernel.org/r/50c35ae1267d64eee975b8125e151e600071d4dc.1549309756.git.tom.zanussi@linux.intel.com

Cc: Namhyung Kim <namhyung@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 79e577cbce4c4 ("tracing: Support string type key properly")
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# ed581aaf 04-Feb-2019 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Use str_has_prefix() in synth_event_create()

Since we now have a str_has_prefix() that returns the length, we can
use that instead of explicitly calculating it.

Link: http://lkml.kernel.org/r/03418373fd1e80030e7394b8e3e081c5de28a710.1549309756.git.tom.zanussi@linux.intel.com

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# e91eefd7 13-Feb-2019 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add alternative synthetic event trace action syntax

Add a 'trace(synthetic_event_name, params)' alternative to
synthetic_event_name(params).

Currently, the syntax used for generating synthetic events is to
invoke synthetic_event_name(params) i.e. use the synthetic event name
as a function call.

Users requested a new form that more explicitly shows that the
synthetic event is in effect being traced. In this version, a new
'trace()' keyword is used, and the synthetic event name is passed in
as the first argument.

In addition, for the sake of consistency with other actions, change
the documention to emphasize the trace() form over the function-call
form, which remains documented as equivalent.

Link: http://lkml.kernel.org/r/d082773e50232a001480cf837679a1e01c1a2eb7.1550100284.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# dff81f55 13-Feb-2019 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist trigger onchange() handler

Add support for a hist:onchange($var) handler, similar to the onmax()
handler but triggering whenever there's any change in $var, not just a
max.

Link: http://lkml.kernel.org/r/dfbc7e4ada242603e9ec3f049b5ad076a07dfd03.1550100284.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# a3785b7e 13-Feb-2019 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist trigger snapshot() action

Add support for hist:handlerXXX($var).snapshot(), which will take a
snapshot of the current trace buffer whenever handlerXXX is hit.

As a first user, this also adds snapshot() action support for the
onmax() handler i.e. hist:onmax($var).snapshot().

Also, the hist trigger key printing is moved into a separate function
so the snapshot() action can print a histogram key outside the
histogram display - add and use hist_trigger_print_key() for that
purpose.

Link: http://lkml.kernel.org/r/2f1a952c0dcd8aca8702ce81269581a692396d45.1550100284.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 466f4528 13-Feb-2019 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Generalize hist trigger onmax and save action

The action refactor code allowed actions and handlers to be separated,
but the existing onmax handler and save action code is still not
flexible enough to handle arbitrary coupling. This change generalizes
them and in the process makes additional handlers and actions easier
to implement.

The onmax action can be broken up and thought of as two separate
components - a variable to be tracked (the parameter given to the
onmax($var_to_track) function) and an invisible variable created to
save the ongoing result of doing something with that variable, such as
saving the max value of that variable so far seen.

Separating it out like this and renaming it appropriately allows us to
use the same code for similar tracking functions such as
onchange($var_to_track), which would just track the last value seen
rather than the max seen so far, which is useful in some situations.

Additionally, because different handlers and actions may want to save
and access data differently e.g. save and retrieve tracking values as
local variables vs something more global, save_val() and get_val()
interface functions are introduced and max-specific implementations
are used instead.

The same goes for the code that checks whether a maximum has been hit
- a generic check_val() interface and max-checking implementation is
used instead, which allows future patches to make use of he same code
using their own implemetations of similar functionality.

Link: http://lkml.kernel.org/r/980ea73dd8e3f36db3d646f99652f8fed42b77d4.1550100284.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# c3e49506 13-Feb-2019 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Split up onmatch action data

Currently, the onmatch action data binds the onmatch action to data
related to synthetic event generation. Since we want to allow the
onmatch handler to potentially invoke a different action, and because
we expect other handlers to generate synthetic events, we need to
separate the data related to these two functions.

Also rename the onmatch data to something more descriptive, and create
and use common action data destroy function.

Link: http://lkml.kernel.org/r/b9abbf9aae69fe3920cdc8ddbcaad544dd258d78.1550100284.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 7d18a10c 13-Feb-2019 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Refactor hist trigger action code

The hist trigger action code currently implements two essentially
hard-coded pairs of 'actions' - onmax(), which tracks a variable and
saves some event fields when a max is hit, and onmatch(), which is
hard-coded to generate a synthetic event.

These hardcoded pairs (track max/save fields and detect match/generate
synthetic event) should really be decoupled into separate components
that can then be arbitrarily combined. The first component of each
pair (track max/detect match) is called a 'handler' in the new code,
while the second component (save fields/generate synthetic event) is
called an 'action' in this scheme.

This change refactors the action code to reflect this split by adding
two handlers, HANDLER_ONMATCH and HANDLER_ONMAX, along with two
actions, ACTION_SAVE and ACTION_TRACE.

The new code combines them to produce the existing ONMATCH/TRACE and
ONMAX/SAVE functionality, but doesn't implement the other combinations
now possible. Future patches will expand these to further useful
cases, such as ONMAX/TRACE, as well as add additional handlers and
actions such as ONCHANGE and SNAPSHOT.

Also, add abbreviated documentation for handlers and actions to
README.

Link: http://lkml.kernel.org/r/98bfdd48c1b4ff29fc5766442f99f5bc3c34b76b.1550100284.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 036876fa 21-Dec-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Have the historgram use the result of str_has_prefix() for len of prefix

As str_has_prefix() returns the length on match, we can use that for the
updating of the string pointer instead of recalculating the prefix size.

Cc: Tom Zanussi <zanussi@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b6b27355 20-Dec-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Use str_has_prefix() instead of using fixed sizes

There are several instances of strncmp(str, "const", 123), where 123 is the
strlen of the const string to check if "const" is the prefix of str. But
this can be error prone. Use str_has_prefix() instead.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 754481e6 19-Dec-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Use str_has_prefix() helper for histogram code

The tracing histogram code contains a lot of instances of the construct:

strncmp(str, "const", sizeof("const") - 1)

This can be prone to bugs due to typos or bad cut and paste. Use the
str_has_prefix() helper macro instead that removes the need for having two
copies of the constant string.

Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 05ddb25c 18-Dec-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist trigger comments for variable-related fields

Add a few comments to help clarify how variable and variable reference
fields are used in the code.

Link: http://lkml.kernel.org/r/ea857ce948531d7bec712bbb0f38360aa1d378ec.1545161087.git.tom.zanussi@linux.intel.com

Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 91220134 18-Dec-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Remove hist trigger synth_var_refs

All var_refs are now handled uniformly and there's no reason to treat
the synth_refs in a special way now, so remove them and associated
functions.

Link: http://lkml.kernel.org/r/b4d3470526b8f0426dcec125399dad9ad9b8589d.1545161087.git.tom.zanussi@linux.intel.com

Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 656fe2ba 18-Dec-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Use hist trigger's var_ref array to destroy var_refs

Since every var ref for a trigger has an entry in the var_ref[] array,
use that to destroy the var_refs, instead of piecemeal via the field
expressions.

This allows us to avoid having to keep and treat differently separate
lists for the action-related references, which future patches will
remove.

Link: http://lkml.kernel.org/r/fad1a164f0e257c158e70d6eadbf6c586e04b2a2.1545161087.git.tom.zanussi@linux.intel.com

Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# de40f033 18-Dec-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Remove open-coding of hist trigger var_ref management

Have create_var_ref() manage the hist trigger's var_ref list, rather
than having similar code doing it in multiple places. This cleans up
the code and makes sure var_refs are always accounted properly.

Also, document the var_ref-related functions to make what their
purpose clearer.

Link: http://lkml.kernel.org/r/05ddae93ff514e66fc03897d6665231892939913.1545161087.git.tom.zanussi@linux.intel.com

Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# e4f6d245 19-Dec-2018 Tom Zanussi <tzanussi@gmail.com>

tracing: Use var_refs[] for hist trigger reference checking

Since all the variable reference hist_fields are collected into
hist_data->var_refs[] array, there's no need to go through all the
fields looking for them, or in separate arrays like synth_var_refs[],
which will be going away soon anyway.

This also allows us to get rid of some unnecessary code and functions
currently used for the same purpose.

Link: http://lkml.kernel.org/r/1545246556.4239.7.camel@gmail.com

Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 2f31ed93 18-Dec-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Change strlen to sizeof for hist trigger static strings

There's no need to use strlen() for static strings when the length is
already known, so update trace_events_hist.c with sizeof() for those
cases.

Link: http://lkml.kernel.org/r/e3e754f2bd18e56eaa8baf79bee619316ebf4cfc.1545161087.git.tom.zanussi@linux.intel.com

Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 6801f0d5 18-Dec-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Remove unnecessary hist trigger struct field

hist_field.var_idx is completely unused, so remove it.

Link: http://lkml.kernel.org/r/d4e066c0f509f5f13ad3babc8c33ca6e7ddc439a.1545161087.git.tom.zanussi@linux.intel.com

Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 7e1413ed 04-Dec-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Consolidate trace_add/remove_event_call back to the nolock functions

The trace_add/remove_event_call_nolock() functions were added to allow
the tace_add/remove_event_call() code be called when the event_mutex
lock was already taken. Now that all callers are done within the
event_mutex, there's no reason to have two different interfaces.

Remove the current wrapper trace_add/remove_event_call()s and rename the
_nolock versions back to the original names.

Link: http://lkml.kernel.org/r/154140866955.17322.2081425494660638846.stgit@devbox

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 0e2b81f7 05-Nov-2018 Masami Hiramatsu <mhiramat@kernel.org>

tracing: Remove unneeded synth_event_mutex

Rmove unneeded synth_event_mutex. This mutex protects the reference
count in synth_event, however, those operational points are already
protected by event_mutex.

1. In __create_synth_event() and create_or_delete_synth_event(),
those synth_event_mutex clearly obtained right after event_mutex.

2. event_hist_trigger_func() is trigger_hist_cmd.func() which is
called by trigger_process_regex(), which is a part of
event_trigger_regex_write() and this function takes event_mutex.

3. hist_unreg_all() is trigger_hist_cmd.unreg_all() which is called
by event_trigger_regex_open() and it takes event_mutex.

4. onmatch_destroy() and onmatch_create() have long call tree,
but both are finally invoked from event_trigger_regex_write()
and event_trace_del_tracer(), former takes event_mutex, and latter
ensures called under event_mutex locked.

Finally, I ensured there is no resource conflict. For safety,
I added lockdep_assert_held(&event_mutex) for each function.

Link: http://lkml.kernel.org/r/154140864134.17322.4796059721306031894.stgit@devbox

Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 7bbab38d 05-Nov-2018 Masami Hiramatsu <mhiramat@kernel.org>

tracing: Use dyn_event framework for synthetic events

Use dyn_event framework for synthetic events. This shows
synthetic events on "tracing/dynamic_events" file in addition
to tracing/synthetic_events interface.

User can also define new events via tracing/dynamic_events
with "s:" prefix. So, the new syntax is below;

s:[synthetic/]EVENT_NAME TYPE ARG; [TYPE ARG;]...

To remove events via tracing/dynamic_events, you can use
"-:" prefix as same as other events.

Link: http://lkml.kernel.org/r/154140861301.17322.15454611233735614508.stgit@devbox

Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# faacb361 05-Nov-2018 Masami Hiramatsu <mhiramat@kernel.org>

tracing: Simplify creation and deletion of synthetic events

Since the event_mutex and synth_event_mutex ordering issue
is gone, we can skip existing event check when adding or
deleting events, and some redundant code in error path.

This changes release_all_synth_events() to abort the process
when it hits any error and returns the error code. It succeeds
only if it has no error.

Link: http://lkml.kernel.org/r/154140847194.17322.17960275728005067803.stgit@devbox

Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# fc800a10 05-Nov-2018 Masami Hiramatsu <mhiramat@kernel.org>

tracing: Lock event_mutex before synth_event_mutex

synthetic event is using synth_event_mutex for protecting
synth_event_list, and event_trigger_write() path acquires
locks as below order.

event_trigger_write(event_mutex)
->trigger_process_regex(trigger_cmd_mutex)
->event_hist_trigger_func(synth_event_mutex)

On the other hand, synthetic event creation and deletion paths
call trace_add_event_call() and trace_remove_event_call()
which acquires event_mutex. In that case, if we keep the
synth_event_mutex locked while registering/unregistering synthetic
events, its dependency will be inversed.

To avoid this issue, current synthetic event is using a 2 phase
process to create/delete events. For example, it searches existing
events under synth_event_mutex to check for event-name conflicts, and
unlocks synth_event_mutex, then registers a new event under event_mutex
locked. Finally, it locks synth_event_mutex and tries to add the
new event to the list. But it can introduce complexity and a chance
for name conflicts.

To solve this simpler, this introduces trace_add_event_call_nolock()
and trace_remove_event_call_nolock() which don't acquire
event_mutex inside. synthetic event can lock event_mutex before
synth_event_mutex to solve the lock dependency issue simpler.

Link: http://lkml.kernel.org/r/154140844377.17322.13781091165954002713.stgit@devbox

Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 18858511 21-Oct-2018 Masami Hiramatsu <mhiramat@kernel.org>

tracing: Return -ENOENT if there is no target synthetic event

Return -ENOENT error if there is no target synthetic event.
This notices an operation failure to user as below;

# echo 'wakeup_latency u64 lat; pid_t pid;' > synthetic_events
# echo '!wakeup' >> synthetic_events
sh: write error: No such file or directory

Link: http://lkml.kernel.org/r/154013449986.25576.9487131386597290172.stgit@devbox

Acked-by: Tom Zanussi <zanussi@linux.intel.com>
Tested-by: Tom Zanussi <zanussi@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Rajvi Jingar <rajvi.jingar@intel.com>
Cc: stable@vger.kernel.org
Fixes: 4b147936fa50 ('tracing: Add support for 'synthetic' events')
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# a360d9e4 18-Oct-2018 Masami Hiramatsu <mhiramat@kernel.org>

tracing: Fix synthetic event to allow semicolon at end

Fix synthetic event to allow independent semicolon at end.

The synthetic_events interface accepts a semicolon after the
last word if there is no space.

# echo "myevent u64 var;" >> synthetic_events

But if there is a space, it returns an error.

# echo "myevent u64 var ;" > synthetic_events
sh: write error: Invalid argument

This behavior is difficult for users to understand. Let's
allow the last independent semicolon too.

Link: http://lkml.kernel.org/r/153986835420.18251.2191216690677025744.stgit@devbox

Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: stable@vger.kernel.org
Fixes: commit 4b147936fa50 ("tracing: Add support for 'synthetic' events")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 282447ba 18-Oct-2018 Masami Hiramatsu <mhiramat@kernel.org>

tracing: Fix synthetic event to accept unsigned modifier

Fix synthetic event to accept unsigned modifier for its field type
correctly.

Currently, synthetic_events interface returns error for "unsigned"
modifiers as below;

# echo "myevent unsigned long var" >> synthetic_events
sh: write error: Invalid argument

This is because argv_split() breaks "unsigned long" into "unsigned"
and "long", but parse_synth_field() doesn't expected it.

With this fix, synthetic_events can handle the "unsigned long"
correctly like as below;

# echo "myevent unsigned long var" >> synthetic_events
# cat synthetic_events
myevent unsigned long var

Link: http://lkml.kernel.org/r/153986832571.18251.8448135724590496531.stgit@devbox

Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: stable@vger.kernel.org
Fixes: commit 4b147936fa50 ("tracing: Add support for 'synthetic' events")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# bcea3f96 16-Aug-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Add SPDX License format tags to tracing files

Add the SPDX License header to ease license compliance management.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# e0a568dc 09-Aug-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Fix synchronizing to event changes with tracepoint_synchronize_unregister()

Now that some trace events can be protected by srcu_read_lock(tracepoint_srcu),
we need to make sure all locations that depend on this are also protected.
There were many places that did a synchronize_sched() thinking that it was
enough to protect againts access to trace events. This use to be the case,
but now that we use SRCU for _rcuidle() trace events, they may not be
protected by synchronize_sched(), as they may be called in paths that RCU is
not watching for preempt disable.

Fixes: e6753f23d961d ("tracepoint: Make rcuidle tracepoint callers use SRCU")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# cf4d418e 28-Mar-2018 Arnd Bergmann <arnd@arndb.de>

tracing: Avoid string overflow

'err' is used as a NUL-terminated string, but using strncpy() with the length
equal to the buffer size may result in lack of the termination:

kernel/trace/trace_events_hist.c: In function 'hist_err_event':
kernel/trace/trace_events_hist.c:396:3: error: 'strncpy' specified bound 256 equals destination size [-Werror=stringop-truncation]
strncpy(err, var, MAX_FILTER_STR_VAL);

This changes it to use the safer strscpy() instead.

Link: http://lkml.kernel.org/r/20180328140920.2842153-1-arnd@arndb.de

Cc: stable@vger.kernel.org
Fixes: f404da6e1d46 ("tracing: Add 'last error' error facility for hist triggers")
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 3be4c1e5 09-May-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Allow histogram triggers to access ftrace internal events

Now that trace_marker can have triggers, including a histogram triggers, the
onmatch() and onmax() access the trace event. To do so, the search routine
to find the event file needs to use the raw __find_event_file() that does
not filter out ftrace events.

Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# dcf23457 26-Apr-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add field modifier parsing hist error for hist triggers

If the user specifies an invalid field modifier for a hist trigger,
the current code correctly flags that as an error, but doesn't tell
the user what happened.

Fix this by invoking hist_err() with an appropriate message when
invalid modifiers are specified.

Before:

# echo 'hist:keys=pid:ts0=common_timestamp.junkusecs' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
-su: echo: write error: Invalid argument
# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/hist

After:

# echo 'hist:keys=pid:ts0=common_timestamp.junkusecs' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
-su: echo: write error: Invalid argument
# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/hist
ERROR: Invalid field modifier: junkusecs
Last command: keys=pid:ts0=common_timestamp.junkusecs

Link: http://lkml.kernel.org/r/b043c59fa79acd06a5f14a1d44dee9e5a3cd1248.1524790601.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 5ec432d7 26-Apr-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add field parsing hist error for hist triggers

If the user specifies a nonexistent field for a hist trigger, the
current code correctly flags that as an error, but doesn't tell the
user what happened.

Fix this by invoking hist_err() with an appropriate message when
nonexistent fields are specified.

Before:

# echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
-su: echo: write error: Invalid argument
# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist

After:

# echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
-su: echo: write error: Invalid argument
# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
ERROR: Couldn't find field: pid
Last command: keys=pid:ts0=common_timestamp.usecs

Link: http://lkml.kernel.org/r/fdc8746969d16906120f162b99dd71c741e0b62c.1524790601.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 608940da 26-Apr-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Restore proper field flag printing when displaying triggers

The flag-printing code used when displaying hist triggers somehow got
dropped during refactoring of the inter-event patchset. This restores
it.

Below are a couple examples - in the first case, .usecs wasn't being
displayed properly for common_timestamps and the second illustrates
the same for other flags such as .execname.

Before:

# echo 'hist:key=common_pid.execname:val=count:sort=count' > /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
hist:keys=common_pid:vals=hitcount,count:sort=count:size=2048 [active]

# echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
hist:keys=pid:vals=hitcount:ts0=common_timestamp:sort=hitcount:size=2048:clock=global if comm=="cyclictest" [active]

After:

# echo 'hist:key=common_pid.execname:val=count:sort=count' > /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
hist:keys=common_pid.execname:vals=hitcount,count:sort=count:size=2048 [active]

# echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global if comm=="cyclictest" [active]

Link: http://lkml.kernel.org/r/492bab42ff21806600af98a8ea901af10efbee0c.1524790601.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b28d7b2d 28-Mar-2018 Dan Carpenter <dan.carpenter@oracle.com>

tracing: Uninitialized variable in create_tracing_map_fields()

Smatch complains that idx can be used uninitialized when we check if
(idx < 0). It has to be the first iteration through the loop and the
HIST_FIELD_FL_STACKTRACE bit has to be clear and the HIST_FIELD_FL_VAR
bit has to be set to reach the bug.

Link: http://lkml.kernel.org/r/20180328114815.GC29050@mwanda

Fixes: 30350d65ac56 ("tracing: Add variable support to hist triggers")
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# ad452870 28-Mar-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Make sure variable string fields are NULL-terminated

The strncpy() currently being used for variable string fields can
result in a lack of termination if the string length is equal to the
field size. Use the safer strscpy() instead, which will guarantee
termination.

Link: http://lkml.kernel.org/r/fb97c1e518fb358c12a4057d7445ba2c46956cd7.1522256721.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 48f79473 28-Mar-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add action comparisons when testing matching hist triggers

Actions also need to be considered when checking for matching triggers
- triggers differing only by action should be allowed, but currently
aren't because the matching check ignores the action and erroneously
returns -EEXIST.

Add and call an actions_match() function to address that.

Here's an example using onmatch() actions. The first -EEXIST shouldn't
occur because the onmatch() is different in the second wakeup_latency()
param. The second -EEXIST shouldn't occur because it's a different
action (in this case, it doesn't have an action, so shouldn't be seen
as being the same and therefore rejected).

In the after case, both are correctly accepted (and trying to add one of
them again returns -EEXIST as it should).

before:

# echo 'wakeup_latency u64 lat; pid_t pid' >> /sys/kernel/debug/tracing/synthetic_events
# echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0 if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
# echo 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,next_pid) if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
# echo 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,prev_pid) if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
-su: echo: write error: File exists
# echo 'hist:keys=next_pid if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
-su: echo: write error: File exists

after:

# echo 'wakeup_latency u64 lat; pid_t pid' >> /sys/kernel/debug/tracing/synthetic_events
# echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0 if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
# echo 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,next_pid) if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
# echo 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,prev_pid) if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
# echo 'hist:keys=next_pid if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger

Link: http://lkml.kernel.org/r/a7fd668b87ec10736c8f016ac4279c8480d50c2b.1522256721.git.tom.zanussi@linux.intel.com

Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 76690945 28-Mar-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Don't add flag strings when displaying variable references

Variable references should never have flags appended when displayed -
prevent that from happening.

Before:

# cat /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0.usecs:...

After:

hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:...

Link: http://lkml.kernel.org/r/913318a5610ef6b24af2522575f671fa6ee19b6b.1522256721.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 0ae7961e 28-Mar-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Fix display of hist trigger expressions containing timestamps

When displaying hist triggers, variable references that have the
timestamp field flag set are erroneously displayed as common_timestamp
rather than the variable reference. Additionally, timestamp
expressions are displayed in the same way. Fix this by forcing the
timestamp flag handling to follow variable reference and expression
handling.

Before:

# cat /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs:...

After:

# cat /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0.usecs:...

Link: http://lkml.kernel.org/r/92746b06be67499c2a6217bd55395b350ad18fad.1522256721.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 5e4cf2bf 23-Mar-2018 Dan Carpenter <dan.carpenter@oracle.com>

tracing: Fix a potential NULL dereference

We forgot to set the error code on this path so we return ERR_PTR(0)
which is NULL. It results in a NULL dereference in the caller.

Link: http://lkml.kernel.org/r/20180323113735.GC28518@mwanda

Fixes: 100719dcef44 ("tracing: Add simple expression support to hist triggers")
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 4708abc6 07-Feb-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Use the ring-buffer nesting to allow synthetic events to be traced

Synthetic events can be done within the recording of other events. Notify
the ring buffer via ring_buffer_nest_start() and ring_buffer_nest_end() that
this is intended and not to block it due to its recursion protection.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# a4072fe8 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add a clock attribute for hist triggers

The default clock if timestamps are used in a histogram is "global".
If timestamps aren't used, the clock is irrelevant.

Use the "clock=" param only if you want to override the default
"global" clock for a histogram with timestamps.

Link: http://lkml.kernel.org/r/427bed1389c5d22aa40c3e0683e30cc3d151e260.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# f404da6e 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add 'last error' error facility for hist triggers

With the addition of variables and actions, it's become necessary to
provide more detailed error information to users about syntax errors.

Add a 'last error' facility accessible via the erroring event's 'hist'
file. Reading the hist file after an error will display more detailed
information about what went wrong, if information is available. This
extended error information will be available until the next hist
trigger command for that event.

# echo xxx > /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
echo: write error: Invalid argument

# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/hist

ERROR: Couldn't yyy: zzz
Last command: xxx

Also add specific error messages for variable and action errors.

Link: http://lkml.kernel.org/r/64e9c422fc8aeafcc2f7a3b4328c0cffe7969129.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 7e8b88a3 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist trigger support for variable reference aliases

Add support for alias=$somevar where alias can be used as
onmatch.xxx($alias).

Aliases are a way of creating a new name for an existing variable, for
flexibly in making naming more clear in certain cases. For example in
the below the user perhaps feels that using $new_lat in the synthetic
event invocation is opaque or doesn't fit well stylistically with
previous triggers, so creates an alias of $new_lat named $latency and
uses that in the call instead:

# echo 'hist:keys=next_pid:new_lat=common_timestamp.usecs' >
/sys/kernel/debug/tracing/events/sched/sched_switch/trigger

# echo 'hist:keys=pid:latency=$new_lat:
onmatch(sched.sched_switch).wake2($latency,pid)' >
/sys/kernel/debug/tracing/events/synthetic/wake1/trigger

Link: http://lkml.kernel.org/r/ef20a65d921af3a873a6f1e8c71407c926d5586f.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 8b7622bf 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add cpu field for hist triggers

A common key to use in a histogram is the cpuid - add a new cpu
'synthetic' field named 'cpu' for that purpose.

Link: http://lkml.kernel.org/r/89537645bfc957e0d76e2cacf5f0ada88691a6cc.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# ec5ce098 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Allow whitespace to surround hist trigger filter

The existing code only allows for one space before and after the 'if'
specifying the filter for a hist trigger. Add code to make that more
permissive as far as whitespace goes. Specifically, we want to allow
spaces in the trigger itself now that we have additional syntax
(onmatch/onmax) where spaces are more natural e.g. spaces after commas
in param lists.

Link: http://lkml.kernel.org/r/1053090c3c308d4f431accdeb59dff4b511d4554.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 50450603 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add 'onmax' hist trigger action support

Add an 'onmax(var).save(field,...)' hist trigger action which is
invoked whenever an event exceeds the current maximum.

The end result is that the trace event fields or variables specified
as the onmax.save() params will be saved if 'var' exceeds the current
maximum for that hist trigger entry. This allows context from the
event that exhibited the new maximum to be saved for later reference.
When the histogram is displayed, additional fields displaying the
saved values will be printed.

As an example the below defines a couple of hist triggers, one for
sched_wakeup and another for sched_switch, keyed on pid. Whenever a
sched_wakeup occurs, the timestamp is saved in the entry corresponding
to the current pid, and when the scheduler switches back to that pid,
the timestamp difference is calculated. If the resulting latency
exceeds the current maximum latency, the specified save() values are
saved:

# echo 'hist:keys=pid:ts0=common_timestamp.usecs \
if comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger

# echo 'hist:keys=next_pid:\
wakeup_lat=common_timestamp.usecs-$ts0:\
onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) \
if next_comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_switch/trigger

When the histogram is displayed, the max value and the saved values
corresponding to the max are displayed following the rest of the
fields:

# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist

{ next_pid: 3728 } hitcount: 199 \
max: 123 next_comm: cyclictest prev_pid: 0 \
prev_prio: 120 prev_comm: swapper/3
{ next_pid: 3730 } hitcount: 1321 \
max: 15 next_comm: cyclictest prev_pid: 0 \
prev_prio: 120 prev_comm: swapper/1
{ next_pid: 3729 } hitcount: 1973\
max: 25 next_comm: cyclictest prev_pid: 0 \
prev_prio: 120 prev_comm: swapper/0

Totals:
Hits: 3493
Entries: 3
Dropped: 0

Link: http://lkml.kernel.org/r/006907f71b1e839bb059337ec3c496f84fcb71de.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# c282a386 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add 'onmatch' hist trigger action support

Add an 'onmatch(matching.event).<synthetic_event_name>(param list)'
hist trigger action which is invoked with the set of variables or
event fields named in the 'param list'. The result is the generation
of a synthetic event that consists of the values contained in those
variables and/or fields at the time the invoking event was hit.

As an example the below defines a simple synthetic event using a
variable defined on the sched_wakeup_new event, and shows the event
definition with unresolved fields, since the sched_wakeup_new event
with the testpid variable hasn't been defined yet:

# echo 'wakeup_new_test pid_t pid; int prio' >> \
/sys/kernel/debug/tracing/synthetic_events

# cat /sys/kernel/debug/tracing/synthetic_events
wakeup_new_test pid_t pid; int prio

The following hist trigger both defines a testpid variable and
specifies an onmatch() trace action that uses that variable along with
a non-variable field to generate a wakeup_new_test synthetic event
whenever a sched_wakeup_new event occurs, which because of the 'if
comm == "cyclictest"' filter only happens when the executable is
cyclictest:

# echo 'hist:testpid=pid:keys=$testpid:\
onmatch(sched.sched_wakeup_new).wakeup_new_test($testpid, prio) \
if comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger

Creating and displaying a histogram based on those events is now just
a matter of using the fields and new synthetic event in the
tracing/events/synthetic directory, as usual:

# echo 'hist:keys=pid,prio:sort=pid,prio' >> \
/sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/trigger

Link: http://lkml.kernel.org/r/8c2a574bcb7530c876629c901ecd23911b14afe8.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 02205a67 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add support for 'field variables'

Users should be able to directly specify event fields in hist trigger
'actions' rather than being forced to explicitly create a variable for
that purpose.

Add support allowing fields to be used directly in actions, which
essentially does just that - creates 'invisible' variables for each
bare field specified in an action. If a bare field refers to a field
on another (matching) event, it even creates a special histogram for
the purpose (since variables can't be defined on an existing histogram
after histogram creation).

Here's a simple example that demonstrates both. Basically the
onmatch() action creates a list of variables corresponding to the
parameters of the synthetic event to be generated, and then uses those
values to generate the event. So for the wakeup_latency synthetic
event 'call' below the first param, $wakeup_lat, is a variable defined
explicitly on sched_switch, where 'next_pid' is just a normal field on
sched_switch, and prio is a normal field on sched_waking.

Since the mechanism works on variables, those two normal fields just
have 'invisible' variables created internally for them. In the case of
'prio', which is on another event, we actually need to create an
additional hist trigger and define the invisible variable on that, since
once a hist trigger is defined, variables can't be added to it later.

echo 'wakeup_latency u64 lat; pid_t pid; int prio' >>
/sys/kernel/debug/tracing/synthetic_events

echo 'hist:keys=pid:ts0=common_timestamp.usecs >>
/sys/kernel/debug/tracing/events/sched/sched_waking/trigger

echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:
onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)
>> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger

Link: http://lkml.kernel.org/r/8e8dcdac1ea180ed7a3689e1caeeccede9dc42b3.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 4b147936 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add support for 'synthetic' events

Synthetic events are user-defined events generated from hist trigger
variables saved from one or more other events.

To define a synthetic event, the user writes a simple specification
consisting of the name of the new event along with one or more
variables and their type(s), to the tracing/synthetic_events file.

For instance, the following creates a new event named 'wakeup_latency'
with 3 fields: lat, pid, and prio:

# echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \
/sys/kernel/debug/tracing/synthetic_events

Reading the tracing/synthetic_events file lists all the
currently-defined synthetic events, in this case the event we defined
above:

# cat /sys/kernel/debug/tracing/synthetic_events
wakeup_latency u64 lat; pid_t pid; int prio

At this point, the synthetic event is ready to use, and a histogram
can be defined using it:

# echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat' >> \
/sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger

The new event is created under the tracing/events/synthetic/ directory
and looks and behaves just like any other event:

# ls /sys/kernel/debug/tracing/events/synthetic/wakeup_latency
enable filter format hist id trigger

Although a histogram can be defined for it, nothing will happen until
an action tracing that event via the trace_synth() function occurs.
The trace_synth() function is very similar to all the other trace_*
invocations spread throughout the kernel, except in this case the
trace_ function and its corresponding tracepoint isn't statically
generated but defined by the user at run-time.

How this can be automatically hooked up via a hist trigger 'action' is
discussed in a subsequent patch.

Link: http://lkml.kernel.org/r/c68df2284b7d172669daf9be29db62ad49bbc559.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
[fix noderef.cocci warnings, sizeof pointer for kcalloc of event->fields]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 0212e2aa 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist trigger action hook

Add a hook for executing extra actions whenever a histogram entry is
added or updated.

The default 'action' when a hist entry is added to a histogram is to
update the set of values associated with it. Some applications may
want to perform additional actions at that point, such as generate
another event, or compare and save a maximum.

Add a simple framework for doing that; specific actions will be
implemented on top of it in later patches.

Link: http://lkml.kernel.org/r/9482ba6a3eaf5ca6e60954314beacd0e25c05b24.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 067fe038 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add variable reference handling to hist triggers

Add the necessary infrastructure to allow the variables defined on one
event to be referenced in another. This allows variables set by a
previous event to be referenced and used in expressions combining the
variable values saved by that previous event and the event fields of
the current event. For example, here's how a latency can be
calculated and saved into yet another variable named 'wakeup_lat':

# echo 'hist:keys=pid,prio:ts0=common_timestamp ...
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-$ts0 ...

In the first event, the event's timetamp is saved into the variable
ts0. In the next line, ts0 is subtracted from the second event's
timestamp to produce the latency.

Further users of variable references will be described in subsequent
patches, such as for instance how the 'wakeup_lat' variable above can
be displayed in a latency histogram.

Link: http://lkml.kernel.org/r/b1d3e6975374e34d501ff417c20189c3f9b2c7b8.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 19a9facd 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist_field 'type' field

Future support for synthetic events requires hist_field 'type'
information, so add a field for that.

Also, make other hist_field attribute usage consistent (size,
is_signed, etc).

Link: http://lkml.kernel.org/r/3fd12a2e86316b05151ba0d7c68268e780af2c9d.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# df35d93b 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Pass tracing_map_elt to hist_field accessor functions

Some accessor functions, such as for variable references, require
access to a corrsponding tracing_map_elt.

Add a tracing_map_elt param to the function signature and update the
accessor functions accordingly.

Link: http://lkml.kernel.org/r/e0f292b068e9e4948da1d5af21b5ae0efa9b5717.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# af6a29bc 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Generalize per-element hist trigger data

Up until now, hist triggers only needed per-element support for saving
'comm' data, which was saved directly as a private data pointer.

In anticipation of the need to save other data besides 'comm', add a
new hist_elt_data struct for the purpose, and switch the current
'comm'-related code over to that.

Link: http://lkml.kernel.org/r/4502c338c965ddf5fc19fb1ec4764391e001ed4b.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 100719dc 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add simple expression support to hist triggers

Add support for simple addition, subtraction, and unary expressions
(-(expr) and expr, where expr = b-a, a+b, a+b+c) to hist triggers, in
order to support a minimal set of useful inter-event calculations.

These operations are needed for calculating latencies between events
(timestamp1-timestamp0) and for combined latencies (latencies over 3
or more events).

In the process, factor out some common code from key and value
parsing.

Link: http://lkml.kernel.org/r/9a9308ead4fe32a433d9c7e95921fb798394f6b2.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
[kbuild test robot fix, add static to parse_atom()]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
[ Replaced '//' comments with normal /* */ comments ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 2ece94fb 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Move get_hist_field_flags()

Move get_hist_field_flags() to make it more easily accessible for new
code (and keep the move separate from new functionality).

Link: http://lkml.kernel.org/r/32470f0a7047ec7a6e84ba5ec89d6142cc6ede7d.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 1a361dfc 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Account for variables in named trigger compatibility

Named triggers must also have the same set of variables in order to be
considered compatible - update the trigger match test to account for
that.

The reason for this requirement is that named triggers with variables
are meant to allow one or more events to set the same variable.

Link: http://lkml.kernel.org/r/a17eae6328a99917f9d5c66129c9fcd355279ee9.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 30350d65 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add variable support to hist triggers

Add support for saving the value of a current event's event field by
assigning it to a variable that can be read by a subsequent event.

The basic syntax for saving a variable is to simply prefix a unique
variable name not corresponding to any keyword along with an '=' sign
to any event field.

Both keys and values can be saved and retrieved in this way:

# echo 'hist:keys=next_pid:vals=$ts0:ts0=common_timestamp ...
# echo 'hist:timer_pid=common_pid:key=$timer_pid ...'

If a variable isn't a key variable or prefixed with 'vals=', the
associated event field will be saved in a variable but won't be summed
as a value:

# echo 'hist:keys=next_pid:ts1=common_timestamp:...

Multiple variables can be assigned at the same time:

# echo 'hist:keys=pid:vals=$ts0,$b,field2:ts0=common_timestamp,b=field1 ...

Multiple (or single) variables can also be assigned at the same time
using separate assignments:

# echo 'hist:keys=pid:vals=$ts0:ts0=common_timestamp:b=field1:c=field2 ...

Variables set as above can be used by being referenced from another
event, as described in a subsequent patch.

Link: http://lkml.kernel.org/r/fc93c4944d9719dbcb1d0067be627d44e98e2adc.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Baohong Liu <baohong.liu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 860f9f6b 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add usecs modifier for hist trigger timestamps

Appending .usecs onto a common_timestamp field will cause the
timestamp value to be in microseconds instead of the default
nanoseconds. A typical latency histogram using usecs would look like
this:

# echo 'hist:keys=pid,prio:ts0=common_timestamp.usecs ...
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0 ...

This also adds an external trace_clock_in_ns() to trace.c for the
timestamp conversion.

Link: http://lkml.kernel.org/r/4e813705a170b3e13e97dc3135047362fb1a39f3.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b559d003 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist_data member to hist_field

Allow hist_data access via hist_field. Some users of hist_fields
require or will require more access to the associated hist_data.

Link: http://lkml.kernel.org/r/d04cd0768f5228ebb4ac0ba4a847bc4d14d4826f.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# ad42febe 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist trigger timestamp support

Add support for a timestamp event field. This is actually a 'pseudo-'
event field in that it behaves like it's part of the event record, but
is really part of the corresponding ring buffer event.

To make use of the timestamp field, users can specify
"common_timestamp" as a field name for any histogram. Note that this
doesn't make much sense on its own either as either a key or value,
but needs to be supported even so, since follow-on patches will add
support for making use of this field in time deltas. The
common_timestamp 'field' is not a bona fide event field - so you won't
find it in the event description - but rather it's a synthetic field
that can be used like a real field.

Note that the use of this field requires the ring buffer be put into
'absolute timestamp' mode, which saves the complete timestamp for each
event rather than an offset. This mode will be enabled if and only if
a histogram makes use of the "common_timestamp" field.

Link: http://lkml.kernel.org/r/97afbd646ed146e26271f3458b4b33e16d7817c2.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Baohong Liu <baohong.liu@intel.com>
[kasan use-after-free fix]
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 9b1ae035 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Break out hist trigger assignment parsing

This will make it easier to add variables, and makes the parsing code
cleaner regardless.

Link: http://lkml.kernel.org/r/e574b3291bbe15e35a4dfc87e5395aa715701c98.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# fbd302cb 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add ring buffer event param to hist field functions

Some events such as timestamps require access to a ring_buffer_event
struct; add a param so that hist field functions can access that.

Link: http://lkml.kernel.org/r/2ff4af18e72b6002eb86b26b2a7f39cef7d1dfe4.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 1ac4f51c 15-Jan-2018 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Give event triggers access to ring_buffer_event

The ring_buffer event can provide a timestamp that may be useful to
various triggers - pass it into the handlers for that purpose.

Link: http://lkml.kernel.org/r/6de592683b59fa70ffa5d43d0109896623fc1367.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# c193707d 15-Jan-2018 Vedang Patel <vedang.patel@intel.com>

tracing: Remove code which merges duplicates

We now have the logic to detect and remove duplicates in the
tracing_map hash table. The code which merges duplicates in the
histogram is redundant now. So, modify this code just to detect
duplicates. The duplication detection code is still kept to ensure
that any rare race condition which might cause duplicates does not go
unnoticed.

Link: http://lkml.kernel.org/r/55215cf59e2674391bdaf772fdafc4c393352b03.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 5819eadd 22-Sep-2017 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Reimplement log2

log2 as currently implemented applies only to u64 trace_event_field
derived fields, and assumes that anything it's applied to is a u64
field.

To prepare for synthetic fields like latencies, log2 should be
applicable to those as well, so take the opportunity now to fix the
current problems as well as expand to more general uses.

log2 should be thought of as a chaining function rather than a field
type. To enable this as well as possible future function
implementations, add a hist_field operand array into the hist_field
definition for this purpose, and make use of it to implement the log2
'function'.

Link: http://lkml.kernel.org/r/b47f93fc0b87b36eccf716b0c018f3a71e1f1111.1506105045.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 85013256 22-Sep-2017 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist_field_name() accessor

In preparation for hist_fields that won't be strictly based on
trace_event_fields, add a new hist_field_name() accessor to allow that
flexibility and update associated users.

Link: http://lkml.kernel.org/r/5b5a2d36dde067cbbe2434b10f06daac27b7dbd5.1506105045.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 0d7a8325 22-Sep-2017 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Clean up hist_field_flags enum

As we add more flags, specifying explicit integers for the flag values
becomes more unwieldy and error-prone - switch them over to left-shift
values.

Link: http://lkml.kernel.org/r/e644e4fb7665aec015f4a2d84a2f990d3dd5b8a1.1506105045.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# a15f7fc2 22-Sep-2017 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Exclude 'generic fields' from histograms

There are a small number of 'generic fields' (comm/COMM/cpu/CPU) that
are found by trace_find_event_field() but are only meant for
filtering. Specifically, they unlike normal fields, they have a size
of 0 and thus wreak havoc when used as a histogram key.

Exclude these (return -EINVAL) when used as histogram keys.

Link: http://lkml.kernel.org/r/956154cbc3e8a4f0633d619b886c97f0f0edf7b4.1506105045.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 6e7a2398 22-Aug-2017 Colin Ian King <colin.king@canonical.com>

tracing: Remove redundant unread variable ret

Integer ret is being assigned but never used and hence it is
redundant. Remove it, fixes clang warning:

trace_events_hist.c:1077:3: warning: Value stored to 'ret' is never read

Link: http://lkml.kernel.org/r/20170823112309.19383-1-colin.king@canonical.com

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b2d09103 03-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare to use <linux/rcuupdate.h> instead of <linux/rculist.h> in <linux/sched.h>

We don't actually need the full rculist.h header in sched.h anymore,
we will be able to include the smaller rcupdate.h header instead.

But first update code that relied on the implicit header inclusion.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 7522c03a 29-Jun-2016 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Fix use-after-free in hist_register_trigger()

This fixes a use-after-free case flagged by KASAN; make sure the test
happens before the potential free in this case.

Link: http://lkml.kernel.org/r/48fd74ab61bebd7dca9714386bb47d7c5ccd6a7b.1467247517.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 47c18569 29-Jun-2016 Steven Rostedt <rostedt@goodmis.org>

tracing: Fix use-after-free in hist_unreg_all/hist_enable_unreg_all

While running tools/testing/selftests test suite with KASAN, Dmitry
Vyukov hit the following use-after-free report:

==================================================================
BUG: KASAN: use-after-free in hist_unreg_all+0x1a1/0x1d0 at addr
ffff880031632cc0
Read of size 8 by task ftracetest/7413
==================================================================
BUG kmalloc-128 (Not tainted): kasan: bad access detected
------------------------------------------------------------------

This fixes the problem, along with the same problem in
hist_enable_unreg_all().

Link: http://lkml.kernel.org/r/c3d05b79e42555b6e36a3a99aae0e37315ee5304.1467247517.git.tom.zanussi@linux.intel.com

Cc: Dmitry Vyukov <dvyukov@google.com>
[Copied Steve's hist_enable_unreg_all() fix to hist_unreg_all()]
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 432480c5 25-Apr-2016 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add check for NULL event field when creating hist field

Smatch flagged create_hist_field() as possibly being able to
dereference a NULL pointer, although the current code exits in all
cases where the event field could be NULL, so it's not actually a
problem.

Still, to prevent future changes to the code from overlooking new
cases, make the NULL pointer check explicit and warn once in that
case.

Link: http://lkml.kernel.org/r/cfbc003f534a3e441b4313272fd412310aba6336.1461610073.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# d50c744e 08-Mar-2016 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Fix unsigned comparison to zero in hist trigger code

Fengguang Wu's bot found two comparisons of unsigned integers to zero. These
were real bugs, as it would miss error conditions returned to zero.

trace_events_hist.c:426:6-9: WARNING: Unsigned expression compared with zero: idx < 0
trace_events_hist.c:568:5-14: WARNING: Unsigned expression compared with zero: n_entries < 0

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 4b94f5b7 02-Mar-2016 Namhyung Kim <namhyung@kernel.org>

tracing: Add hist trigger 'log2' modifier

Allow users to have numeric fields displayed as log2 values in case
value range is very wide by appending '.log2' to field names.

For example,

# echo 'hist:key=bytes_req' > kmalloc/trigger
# cat kmalloc/hist

{ bytes_req: 504 } hitcount: 1
{ bytes_req: 11 } hitcount: 1
{ bytes_req: 104 } hitcount: 1
{ bytes_req: 48 } hitcount: 1
{ bytes_req: 2048 } hitcount: 1
{ bytes_req: 4096 } hitcount: 1
{ bytes_req: 240 } hitcount: 1
{ bytes_req: 392 } hitcount: 1
{ bytes_req: 13 } hitcount: 1
{ bytes_req: 28 } hitcount: 1
{ bytes_req: 12 } hitcount: 1
{ bytes_req: 64 } hitcount: 2
{ bytes_req: 128 } hitcount: 2
{ bytes_req: 32 } hitcount: 2
{ bytes_req: 8 } hitcount: 11
{ bytes_req: 10 } hitcount: 13
{ bytes_req: 24 } hitcount: 25
{ bytes_req: 160 } hitcount: 29
{ bytes_req: 16 } hitcount: 33
{ bytes_req: 80 } hitcount: 36

When using '.log2' modifier, the output looks like:

# echo 'hist:key=bytes_req.log2' > kmalloc/trigger
# cat kmalloc/hist

{ bytes_req: ~ 2^12 } hitcount: 1
{ bytes_req: ~ 2^11 } hitcount: 1
{ bytes_req: ~ 2^9 } hitcount: 2
{ bytes_req: ~ 2^6 } hitcount: 3
{ bytes_req: ~ 2^3 } hitcount: 13
{ bytes_req: ~ 2^5 } hitcount: 19
{ bytes_req: ~ 2^8 } hitcount: 49
{ bytes_req: ~ 2^7 } hitcount: 57
{ bytes_req: ~ 2^4 } hitcount: 74

Link: http://lkml.kernel.org/r/7ff396b246c6a881f46b979735fddf05a0d6c71a.1457029949.git.tom.zanussi@linux.intel.com

Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 5463bfda 02-Mar-2016 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add support for named hist triggers

Allow users to define 'named' hist triggers. All triggers created
with the same 'name=xxx' option will update the same shared histogram
data.

This expands the hist trigger syntax from this:

# echo hist:keys=xxx ... [ if filter] > event/trigger

to this:

# echo hist:name=xxx:keys=xxx ... [ if filter] > event/trigger

Named histograms must use a 'compatible' set of keys and values, which
means each event added to a set of named triggers must have the same
names and types.

Reading the 'hist' file of any of the participating events will
produce the same output as any other participating event, which is to
be expected since they share the same data.

Link: http://lkml.kernel.org/r/1dbc84ee3322a75daaf5b3ef1d0cc0a2fb682fc7.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 52a7f16d 02-Mar-2016 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add support for multiple hist triggers per event

Allow users to define any number of hist triggers per trace event.
Any number of hist triggers may be added for a given event, which may
differ by key, value, or filter.

Reading the event's 'hist' file will display the output of all the
hist triggers defined on an event concatenated in the order they were
defined.

Link: http://lkml.kernel.org/r/48a0c8dd34c344571de880fb35e211c6d9a28961.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# d0bad49b 02-Mar-2016 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add enable_hist/disable_hist triggers

Similar to enable_event/disable_event triggers, these triggers enable
and disable the aggregation of events into maps rather than enabling
and disabling their writing into the trace buffer.

They can be used to automatically start and stop hist triggers based
on a matching filter condition.

If there's a paused hist trigger on system:event, the following would
start it when the filter condition was hit:

# echo enable_hist:system:event [ if filter] > event/trigger

And the following would disable a running system:event hist trigger:

# echo disable_hist:system:event [ if filter] > event/trigger

See Documentation/trace/events.txt for real examples.

Link: http://lkml.kernel.org/r/f812f086e52c8b7c8ad5443487375e03c96a601f.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 6a475cb1 02-Mar-2016 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Remove restriction on string position in hist trigger keys

If we assume the maximum size for a string field, we don't have to
worry about its position. Since we only allow two keys in a compound
key and having more than one string key in a given compound key
doesn't make much sense anyway, trading a bit of extra space instead
of introducing an arbitrary restriction makes more sense.

We also need to use the event field size for static strings when
copying the contents, otherwise we get random garbage in the key.

Also, cast string return values to avoid warnings on 32-bit compiles.

Finally, rearrange the code without changing any functionality by
moving the compound key updating code into a separate function.

Link: http://lkml.kernel.org/r/8976e1ab04b66bc2700ad1ed0768a2de85ac1983.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 79e577cb 02-Mar-2016 Namhyung Kim <namhyung@kernel.org>

tracing: Support string type key properly

The string in a trace event is usually recorded as dynamic array which
is variable length. But current hist code only support fixed length
array so it cannot support most strings.

This patch fixes it by checking filter_type of the field and get
proper pointer with it. With this, it can get a histogram of exec()
based on filenames like below:

# cd /sys/kernel/tracing/events/sched/sched_process_exec
# cat 'hist:key=filename' > trigger
# ps
PID TTY TIME CMD
1 ? 00:00:00 init
29 ? 00:00:00 sh
38 ? 00:00:00 ps
# ls
enable filter format hist id trigger
# cat hist
# trigger info: hist:keys=filename:vals=hitcount:sort=hitcount:size=2048 [active]

{ filename: /usr/bin/ps } hitcount: 1
{ filename: /usr/bin/ls } hitcount: 1
{ filename: /usr/bin/cat } hitcount: 1

Totals:
Hits: 3
Entries: 3
Dropped: 0

Link: http://lkml.kernel.org/r/610180d6df0cfdf11ee205452f3b241dea657233.1457029949.git.tom.zanussi@linux.intel.com

Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
[ Added (unsigned long) typecast to fix compile warning ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 69a0200c 02-Mar-2016 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist trigger support for stacktraces as keys

It's often useful to be able to use a stacktrace as a hash key, for
keeping a count of the number of times a particular call path resulted
in a trace event, for instance. Add a special key named 'stacktrace'
which can be used as key in a 'keys=' param for this purpose:

# echo hist:keys=stacktrace ... \
[ if filter] > event/trigger

Link: http://lkml.kernel.org/r/87515e90b3785232a874a12156174635a348edb1.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 31696198 02-Mar-2016 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist trigger 'syscall' modifier

Allow users to have syscall id fields displayed as syscall names in
the output by appending '.syscall' to field names:

# echo hist:keys=aaa.syscall ... \
[ if filter] > event/trigger

Link: http://lkml.kernel.org/r/2bab1e59933d76a14b545bd2e02f80b8b08ac4d3.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 6b4827ad 02-Mar-2016 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist trigger 'execname' modifier

Allow users to have common_pid field values displayed as program names
in the output by appending '.execname' to a common_pid field name:

# echo hist:keys=common_pid.execname ... \
[ if filter] > event/trigger

Link: http://lkml.kernel.org/r/e172e81f10f5b8d1f08450e3763c850f39fbf698.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# c6afad49 02-Mar-2016 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist trigger 'sym' and 'sym-offset' modifiers

Allow users to have address fields displayed as symbols in the output
by appending '.sym' or 'sym-offset' to field names:

# echo hist:keys=aaa.sym,bbb.sym-offset ... \
[ if filter] > event/trigger

Link: http://lkml.kernel.org/r/87d4935821491c0275513f0fbfb9bab8d3d3f079.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 0c4a6b46 02-Mar-2016 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist trigger 'hex' modifier for displaying numeric fields

Allow users to have numeric fields displayed as hex values in the
output by appending '.hex' to field names:

# echo hist:keys=aaa,bbb.hex:vals=ccc.hex ... \
[ if filter] > event/trigger

Link: http://lkml.kernel.org/r/67bd431edda2af5798d7694818f7e8d71b6b3463.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e86ae9ba 02-Mar-2016 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist trigger support for clearing a trace

Allow users to append 'clear' to an existing trigger in order to have
the hash table cleared.

This expands the hist trigger syntax from this:
# echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause/cont \
[ if filter] >> event/trigger

to this:

# echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause/cont/clear \
[ if filter] >> event/trigger

Link: http://lkml.kernel.org/r/ae15dd0d9b2f7af07a37c1ff682063e2dbcdf160.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 83e99914 02-Mar-2016 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist trigger support for pausing and continuing a trace

Allow users to append 'pause' or 'continue' to an existing trigger in
order to have it paused or to have a paused trace continue.

This expands the hist trigger syntax from this:
# echo hist:keys=xxx:vals=yyy:sort=zzz.descending \
[ if filter] >> event/trigger

to this:

# echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause or cont \
[ if filter] >> event/trigger

Link: http://lkml.kernel.org/r/b672a92c14702cb924cdf6fc27ea1809bed04907.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e62347d2 02-Mar-2016 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist trigger support for user-defined sorting ('sort=' param)

Allow users to specify keys and/or values to sort on. With this
addition, keys and values specified using the 'keys=' and 'vals='
keywords can be used to sort the hist trigger output via a new 'sort='
keyword. If multiple sort keys are specified, the output will be
sorted using the second key as a secondary sort key, etc. The default
sort order is ascending; if the user wants a different sort order,
'.descending' can be appended to the specific sort key. Before this
addition, output was always sorted by 'hitcount' in ascending order.

This expands the hist trigger syntax from this:

# echo hist:keys=xxx:vals=yyy \
[ if filter] > event/trigger

to this:

# echo hist:keys=xxx:vals=yyy:sort=zzz.descending \
[ if filter] > event/trigger

Link: http://lkml.kernel.org/r/b30a41db66ba486979c4f987aff5fab500ea53b3.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 76a3b0c8 02-Mar-2016 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist trigger support for compound keys

Allow users to specify multiple trace event fields to use in keys by
allowing multiple fields in the 'keys=' keyword. With this addition,
any unique combination of any of the fields named in the 'keys'
keyword will result in a new entry being added to the hash table.

Link: http://lkml.kernel.org/r/0cfa24e6ac3b0dcece7737d94aa1f322ae3afc4b.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# f2606835 02-Mar-2016 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add hist trigger support for multiple values ('vals=' param)

Allow users to specify trace event fields to use in aggregated sums
via a new 'vals=' keyword. Before this addition, the only aggregated
sum supported was the implied value 'hitcount'. With this addition,
'hitcount' is also supported as an explicit value field, as is any
numeric trace event field.

This expands the hist trigger syntax from this:

# echo hist:keys=xxx [ if filter] > event/trigger

to this:

# echo hist:keys=xxx:vals=yyy [ if filter] > event/trigger

Link: http://lkml.kernel.org/r/2a5d1adb5ba6c65d7bb2148e379f2fed47f29a68.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 7ef224d1 02-Mar-2016 Tom Zanussi <tom.zanussi@linux.intel.com>

tracing: Add 'hist' event trigger command

'hist' triggers allow users to continually aggregate trace events,
which can then be viewed afterwards by simply reading a 'hist' file
containing the aggregation in a human-readable format.

The basic idea is very simple and boils down to a mechanism whereby
trace events, rather than being exhaustively dumped in raw form and
viewed directly, are automatically 'compressed' into meaningful tables
completely defined by the user.

This is done strictly via single-line command-line commands and
without the aid of any kind of programming language or interpreter.

A surprising number of typical use cases can be accomplished by users
via this simple mechanism. In fact, a large number of the tasks that
users typically do using the more complicated script-based tracing
tools, at least during the initial stages of an investigation, can be
accomplished by simply specifying a set of keys and values to be used
in the creation of a hash table.

The Linux kernel trace event subsystem happens to provide an extensive
list of keys and values ready-made for such a purpose in the form of
the event format files associated with each trace event. By simply
consulting the format file for field names of interest and by plugging
them into the hist trigger command, users can create an endless number
of useful aggregations to help with investigating various properties
of the system. See Documentation/trace/events.txt for examples.

hist triggers are implemented on top of the existing event trigger
infrastructure, and as such are consistent with the existing triggers
from a user's perspective as well.

The basic syntax follows the existing trigger syntax. Users start an
aggregation by writing a 'hist' trigger to the event of interest's
trigger file:

# echo hist:keys=xxx [ if filter] > event/trigger

Once a hist trigger has been set up, by default it continually
aggregates every matching event into a hash table using the event key
and a value field named 'hitcount'.

To view the aggregation at any point in time, simply read the 'hist'
file in the same directory as the 'trigger' file:

# cat event/hist

The detailed syntax provides additional options for user control, and
is described exhaustively in Documentation/trace/events.txt and in the
virtual tracing/README file in the tracing subsystem.

Link: http://lkml.kernel.org/r/72d263b5e1853fe9c314953b65833c3aa75479f2.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>