History log of /linux-master/include/linux/uprobes.h
Revision Date Author Comments
# 1a59d1b8 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not write to the free software foundation inc
59 temple place suite 330 boston ma 02111 1307 usa

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 1334 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# aad42dd4 26-Apr-2019 Nadav Amit <namit@vmware.com>

uprobes: Initialize uprobes earlier

In order to have a separate address space for text poking, we need to
duplicate init_mm early during start_kernel(). This, however, introduces
a problem since uprobes functions are called from dup_mmap(), but
uprobes is still not initialized in this early stage.

Since uprobes initialization is necassary for fork, and since all the
dependant initialization has been done when fork is initialized (percpu
and vmalloc), move uprobes initialization to fork_init(). It does not
seem uprobes introduces any security problem for the poking_mm.

Crash and burn if uprobes initialization fails, similarly to other early
initializations. Change the init_probes() name to probes_init() to match
other early initialization functions name convention.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: ard.biesheuvel@linaro.org
Cc: deneen.t.dock@intel.com
Cc: kernel-hardening@lists.openwall.com
Cc: kristen@linux.intel.com
Cc: linux_dti@icloud.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190426232303.28381-6-nadav.amit@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1cc33161 19-Aug-2018 Ravi Bangoria <ravi.bangoria@linux.ibm.com>

uprobes: Support SDT markers having reference count (semaphore)

Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
have these markers embedded in them. These markers are added by developer
at important places in the code. Each marker source expands to a single
nop instruction in the compiled code but there may be additional
overhead for computing the marker arguments which expands to couple of
instructions. In case the overhead is more, execution of it can be
omitted by runtime if() condition when no one is tracing on the marker:

if (reference_counter > 0) {
Execute marker instructions;
}

Default value of reference counter is 0. Tracer has to increment the
reference counter before tracing on a marker and decrement it when
done with the tracing.

Implement the reference counter logic in core uprobe. User will be
able to use it from trace_uprobe as well as from kernel module. New
trace_uprobe definition with reference counter will now be:

<path>:<offset>[(ref_ctr_offset)]

where ref_ctr_offset is an optional field. For kernel module, new
variant of uprobe_register() has been introduced:

uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)

No new variant for uprobe_unregister() because it's assumed to have
only one reference counter for one uprobe.

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: 'reference counter' is called as 'semaphore' in original Dtrace
(or Systemtap, bcc and even in ELF) documentation and code. But the
term 'semaphore' is misleading in this context. This is just a counter
used to hold number of tracers tracing on a marker. This is not really
used for any synchronization. So we are calling it a 'reference counter'
in kernel / perf code.

Link: http://lkml.kernel.org/r/20180820044250.11659-2-ravi.bangoria@linux.ibm.com

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
[Only trace_uprobe.c]
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Song Liu <songliubraving@fb.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 6d43743e 08-Aug-2018 Ravi Bangoria <ravi.bangoria@linux.ibm.com>

Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()

Add addition argument 'arch_uprobe' to uprobe_write_opcode().
We need this in later set of patches.

Link: http://lkml.kernel.org/r/20180809041856.1547-3-ravi.bangoria@linux.ibm.com

Reviewed-by: Song Liu <songliubraving@fb.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# e8f4aa60 12-Oct-2016 Allen Pais <allen.pais@oracle.com>

sparc64:Support User Probes for sparc

Signed-off-by: Eric Saint Etienne <eric.saint.etienne@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 90eec103 16-Nov-2015 Peter Zijlstra <peterz@infradead.org>

treewide: Remove old email address

There were still a number of references to my old Red Hat email
address in the kernel source. Remove these while keeping the
Red Hat copyright notices intact.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# db087ef6 21-Jul-2015 Oleg Nesterov <oleg@redhat.com>

uprobes/x86: Make arch_uretprobe_is_alive(RP_CHECK_CALL) more clever

The previous change documents that cleanup_return_instances()
can't always detect the dead frames, the stack can grow. But
there is one special case which imho worth fixing:
arch_uretprobe_is_alive() can return true when the stack didn't
actually grow, but the next "call" insn uses the already
invalidated frame.

Test-case:

#include <stdio.h>
#include <setjmp.h>

jmp_buf jmp;
int nr = 1024;

void func_2(void)
{
if (--nr == 0)
return;
longjmp(jmp, 1);
}

void func_1(void)
{
setjmp(jmp);
func_2();
}

int main(void)
{
func_1();
return 0;
}

If you ret-probe func_1() and func_2() prepare_uretprobe() hits
the MAX_URETPROBE_DEPTH limit and "return" from func_2() is not
reported.

When we know that the new call is not chained, we can do the
more strict check. In this case "sp" points to the new ret-addr,
so every frame which uses the same "sp" must be dead. The only
complication is that arch_uretprobe_is_alive() needs to know was
it chained or not, so we add the new RP_CHECK_CHAIN_CALL enum
and change prepare_uretprobe() to pass RP_CHECK_CALL only if
!chained.

Note: arch_uretprobe_is_alive() could also re-read *sp and check
if this word is still trampoline_vaddr. This could obviously
improve the logic, but I would like to avoid another
copy_from_user() especially in the case when we can't avoid the
false "alive == T" positives.

Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134028.GA4786@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 86dcb702 21-Jul-2015 Oleg Nesterov <oleg@redhat.com>

uprobes: Add the "enum rp_check ctx" arg to arch_uretprobe_is_alive()

arch/x86 doesn't care (so far), but as Pratyush Anand pointed
out other architectures might want why arch_uretprobe_is_alive()
was called and use different checks depending on the context.
Add the new argument to distinguish 2 callers.

Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134026.GA4779@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 7b868e48 21-Jul-2015 Oleg Nesterov <oleg@redhat.com>

uprobes/x86: Reimplement arch_uretprobe_is_alive()

Add the x86 specific version of arch_uretprobe_is_alive()
helper. It returns true if the stack frame mangled by
prepare_uretprobe() is still on stack. So if it returns false,
we know that the probed function has already returned.

We add the new return_instance->stack member and change the
generic code to initialize it in prepare_uretprobe, but it
should be equally useful for other architectures.

TODO: this assumes that the probed application can't use
multiple stacks (say sigaltstack). We will try to improve
this logic later.

Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134018.GA4766@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 97da8976 21-Jul-2015 Oleg Nesterov <oleg@redhat.com>

uprobes: Export 'struct return_instance', introduce arch_uretprobe_is_alive()

Add the new "weak" helper, arch_uretprobe_is_alive(), used by
the next patches. It should return true if this return_instance
is still valid. The arch agnostic version just always returns
true.

The patch exports "struct return_instance" for the architectures
which want to override this hook. We can also cleanup
prepare_uretprobe() if we pass the new return_instance to
arch_uretprobe_hijack_return_addr().

Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134016.GA4762@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 271a9c35 13-Oct-2014 Bjorn Helgaas <bhelgaas@google.com>

uprobes: Remove "weak" from function declarations

For the following interfaces:

set_swbp()
set_orig_insn()
is_swbp_insn()
is_trap_insn()
uprobe_get_swbp_addr()
arch_uprobe_ignore()
arch_uprobe_copy_ixol()

kernel/events/uprobes.c provides default definitions explicitly marked
"weak". Some architectures provide their own definitions intended to
override the defaults, but the "weak" attribute on the declarations applied
to the arch definitions as well, so the linker chose one based on link
order (see 10629d711ed7 ("PCI: Remove __weak annotation from
pcibios_get_phb_of_node decl")).

Remove the "weak" attribute from the declarations so we always prefer a
non-weak definition over the weak one, independent of link order.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
CC: Victor Kamensky <victor.kamensky@linaro.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: David A. Long <dave.long@linaro.org>
CC: Ananth N Mavinakayanahalli <ananth@in.ibm.com>


# 72e6ae28 28-Apr-2014 Victor Kamensky <victor.kamensky@linaro.org>

ARM: 8043/1: uprobes need icache flush after xol write

After instruction write into xol area, on ARM V7
architecture code need to flush dcache and icache to sync
them up for given set of addresses. Having just
'flush_dcache_page(page)' call is not enough - it is
possible to have stale instruction sitting in icache
for given xol area slot address.

Introduce arch_uprobe_ixol_copy weak function
that by default calls uprobes copy_to_page function and
than flush_dcache_page function and on ARM define new one
that handles xol slot copy in ARM specific way

flush_uprobe_xol_access function shares/reuses implementation
with/of flush_ptrace_access function and takes care of writing
instruction to user land address space on given variety of
different cache types on ARM CPUs. Because
flush_uprobe_xol_access does not have vma around
flush_ptrace_access was split into two parts. First that
retrieves set of condition from vma and common that receives
those conditions as flags.

Note ARM cache flush function need kernel address
through which instruction write happened, so instead
of using uprobes copy_to_page function changed
code to explicitly map page and do memcpy.

Note arch_uprobe_copy_ixol function, in similar way as
copy_to_user_page function, has preempt_disable/preempt_enable.

Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>


# b02ef20a 12-May-2014 Oleg Nesterov <oleg@redhat.com>

uprobes/x86: Fix the wrong ->si_addr when xol triggers a trap

If the probed insn triggers a trap, ->si_addr = regs->ip is technically
correct, but this is not what the signal handler wants; we need to pass
the address of the probed insn, not the address of xol slot.

Add the new arch-agnostic helper, uprobe_get_trap_addr(), and change
fill_trap_info() and math_error() to use it. !CONFIG_UPROBES case in
uprobes.h uses a macro to avoid include hell and ensure that it can be
compiled even if an architecture doesn't define instruction_pointer().

Test-case:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

extern void probe_div(void);

void sigh(int sig, siginfo_t *info, void *c)
{
int passed = (info->si_addr == probe_div);
printf(passed ? "PASS\n" : "FAIL\n");
_exit(!passed);
}

int main(void)
{
struct sigaction sa = {
.sa_sigaction = sigh,
.sa_flags = SA_SIGINFO,
};

sigaction(SIGFPE, &sa, NULL);

asm (
"xor %ecx,%ecx\n"
".globl probe_div; probe_div:\n"
"idiv %ecx\n"
);

return 0;
}

it fails if probe_div() is probed.

Note: show_unhandled_signals users should probably use this helper too,
but we need to cleanup them first.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>


# 6fe50a28 03-Feb-2014 David A. Long <dave.long@linaro.org>

uprobes: allow ignoring of probe hits

Allow arches to decided to ignore a probe hit. ARM will use this to
only call handlers if the conditions to execute a conditionally executed
instruction are satisfied.

Signed-off-by: David A. Long <dave.long@linaro.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>


# c912dae6 09-Nov-2013 Oleg Nesterov <oleg@redhat.com>

uprobes: Cleanup !CONFIG_UPROBES decls, unexport xol_area

1. Don't include asm/uprobes.h unconditionally, we only need
it if CONFIG_UPROBES.

2. Move the definition of "struct xol_area" into uprobes.c.

Perhaps we should simply kill struct uprobes_state, it buys
nothing.

3. Kill the dummy definition of uprobe_get_swbp_addr(), nobody
except handle_swbp() needs it.

4. Purely cosmetic, but move the decl of uprobe_get_swbp_addr()
up, close to other __weak helpers.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


# 32473431 08-Nov-2013 Oleg Nesterov <oleg@redhat.com>

uprobes: Add uprobe_task->dup_xol_work/dup_xol_addr

uprobe_task->vaddr is a bit strange. The generic code uses it only
to pass the additional argument to arch_uprobe_pre_xol(), and since
it is always equal to instruction_pointer() this looks even more
strange.

And both utask->vaddr and and utask->autask have the same scope,
they only have the meaning when the task executes the probed insn
out-of-line, so it is safe to reuse both in UTASK_RUNNING state.

This all means that logically ->vaddr belongs to arch_uprobe_task
and we should probably move it there, arch_uprobe_pre_xol() can
record instruction_pointer() itself.

OTOH, it is also used by uprobe_copy_process() and dup_xol_work()
for another purpose, this doesn't look clean and doesn't allow to
move this member into arch_uprobe_task.

This patch adds the union with 2 anonymous structs into uprobe_task.

The first struct is autask + vaddr, this way we "almost" move vaddr
into autask.

The second struct has 2 new members for uprobe_copy_process() paths:
->dup_xol_addr which can be used instead ->vaddr, and ->dup_xol_work
which can be used to avoid kmalloc() and simplify the code.

Note that this union will likely have another member(s), we need
something like "private_data_for_handlers" so that the tracing
handlers could use it to communicate with call_fetch() methods.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


# f72d41fa 05-Nov-2013 Oleg Nesterov <oleg@redhat.com>

uprobes: Export write_opcode() as uprobe_write_opcode()

set_swbp() and set_orig_insn() are __weak, but this is pointless
because write_opcode() is static.

Export write_opcode() as uprobe_write_opcode() for the upcoming
arm port, this way it can actually override set_swbp() and use
__opcode_to_mem_arm(bpinsn) instead if UPROBE_SWBP_INSN.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>


# 3820b4d2 15-Oct-2013 David A. Long <dave.long@linaro.org>

uprobes: Move function declarations out of arch

Move the function declarations from the arch headers to the common
header, since only the function bodies are architecture-specific.
These changes are from Vincent Rabin's uprobes patch.

[ oleg: update arch/powerpc/include/asm/uprobes.h ]

Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>


# 3ab67966 16-Oct-2013 Oleg Nesterov <oleg@redhat.com>

uprobes: Teach uprobe_copy_process() to handle CLONE_VFORK

uprobe_copy_process() does nothing if the child shares ->mm with
the forking process, but there is a special case: CLONE_VFORK.
In this case it would be more correct to do dup_utask() but avoid
dup_xol(). This is not that important, the child should not unwind
its stack too much, this can corrupt the parent's stack, but at
least we need this to allow to ret-probe __vfork() itself.

Note: in theory, it would be better to check task_pt_regs(p)->sp
instead of CLONE_VFORK, we need to dup_utask() if and only if the
child can return from the function called by the parent. But this
needs the arch-dependant helper, and I think that nobody actually
does clone(same_stack, CLONE_VM).

Reported-by: Martin Cermak <mcermak@redhat.com>
Reported-by: David Smith <dsmith@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>


# c2d3f25d 09-Oct-2013 Ralf Baechle <ralf@linux-mips.org>

uprobes: Remove the wrong __weak attribute

linux/uprobes.h declares arch_uprobe_skip_sstep() as a weak function.
But as there is no definition of generic version so when trying to build
uprobes for an architecture that doesn't yet have a arch_uprobe_skip_sstep()
implementation, the vmlinux will try to call arch_uprobe_skip_sstep()
somehwere in Stupidhistan leading to a system crash. We rather want a
proper link error so remove arch_uprobe_skip_sstep().

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>


# ded49c55 03-Apr-2013 Anton Arapov <anton@redhat.com>

uretprobes: Limit the depth of return probe nestedness

Unlike the kretprobes we can't trust userspace, thus must have
protection from user space attacks. User-space have "unlimited"
stack, and this patch limits the return probes nestedness as a
simple remedy for it.

Note that this implementation leaks return_instance on siglongjmp
until exit()/exec().

The intention is to have KISS and bare minimum solution for the
initial implementation in order to not complicate the uretprobes
code.

In the future we may come up with more sophisticated solution that
remove this depth limitation. It is not easy task and lays beyond
this patchset.

Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>


# 0dfd0eb8 03-Apr-2013 Anton Arapov <anton@redhat.com>

uretprobes: Return probe entry, prepare_uretprobe()

When a uprobe with return probe consumer is hit, prepare_uretprobe()
function is invoked. It creates return_instance, hijacks return address
and replaces it with the trampoline.

* Return instances are kept as stack per uprobed task.
* Return instance is chained, when the original return address is
trampoline's page vaddr (e.g. recursive call of the probed function).

Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>


# ea024870 03-Apr-2013 Anton Arapov <anton@redhat.com>

uretprobes: Introduce uprobe_consumer->ret_handler()

Enclose return probes implementation, introduce ->ret_handler() and update
existing code to rely on ->handler() *and* ->ret_handler() for uprobe and
uretprobe respectively.

Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>


# 0908ad6e 22-Mar-2013 Ananth N Mavinakayanahalli <ananth@in.ibm.com>

uprobes: Add trap variant helper

Some architectures like powerpc have multiple variants of the trap
instruction. Introduce an additional helper is_trap_insn() for run-time
handling of non-uprobe traps on such architectures.

While there, change is_swbp_at_addr() to is_trap_at_addr() for reading
clarity.

With this change, the uprobe registration path will supercede any trap
instruction inserted at the requested location, while taking care of
delivering the SIGTRAP for cases where the trap notification came in
for an address without a uprobe. See [1] for a more detailed explanation.

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-March/104771.html

This change was suggested by Oleg Nesterov.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>


# bdf8647c 03-Feb-2013 Oleg Nesterov <oleg@redhat.com>

uprobes: Introduce uprobe_apply()

Currently it is not possible to change the filtering constraints after
uprobe_register(), so a consumer can not, say, start to trace a task/mm
which was previously filtered out, or remove the no longer needed bp's.

Introduce uprobe_apply() which simply does register_for_each_vma() again
to consult uprobe_consumer->filter() and install/remove the breakpoints.
The only complication is that register_for_each_vma() can no longer
assume that uprobe->consumers should be consulter if is_register == T,
so we change it to accept "struct uprobe_consumer *new" instead.

Unlike uprobe_register(), uprobe_apply(true) doesn't do "unregister" if
register_for_each_vma() fails, it is up to caller to handle the error.

Note: we probably need to cleanup the current interface, it is strange
that uprobe_apply/unregister need inode/offset. We should either change
uprobe_register() to return "struct uprobe *", or add a private ->uprobe
member in uprobe_consumer. And in the long term uprobe_apply() should
take a single argument, uprobe or consumer, even "bool add" should go
away.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>


# da1816b1 29-Dec-2012 Oleg Nesterov <oleg@redhat.com>

uprobes: Teach handler_chain() to filter out the probed task

Currrently the are 2 problems with pre-filtering:

1. It is not possible to add/remove a task (mm) after uprobe_register()

2. A forked child inherits all breakpoints and uprobe_consumer can not
control this.

This patch does the first step to improve the filtering. handler_chain()
removes the breakpoints installed by this uprobe from current->mm if all
handlers return UPROBE_HANDLER_REMOVE.

Note that handler_chain() relies on ->register_rwsem to avoid the race
with uprobe_register/unregister which can add/del a consumer, or even
remove and then insert the new uprobe at the same address.

Perhaps we will add uprobe_apply_mm(uprobe, mm, is_register) and teach
copy_mm() to do filter(UPROBE_FILTER_FORK), but I think this change makes
sense anyway.

Note: instead of checking the retcode from uc->handler, we could add
uc->filter(UPROBE_FILTER_BPHIT). But I think this is not optimal to
call 2 hooks in a row. This buys nothing, and if handler/filter do
something nontrivial they will probably do the same work twice.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


# 8a7f2fa0 28-Dec-2012 Oleg Nesterov <oleg@redhat.com>

uprobes: Reintroduce uprobe_consumer->filter()

Finally add uprobe_consumer->filter() and change consumer_filter()
to actually call this method.

Note that ->filter() accepts mm_struct, not task_struct. Because:

1. We do not have for_each_mm_user(mm, task).

2. Even if we implement for_each_mm_user(), ->filter() can
use it itself.

3. It is not clear who will actually need this interface to
do the "nontrivial" filtering.

Another argument is "enum uprobe_filter_ctx", consumer->filter() can
use it to figure out why/where it was called. For example, perhaps
we can add UPROBE_FILTER_PRE_REGISTER used by build_map_info() to
quickly "nack" the unwanted mm's. In this case consumer should know
that it is called under ->i_mmap_mutex.

See the previous discussion at http://marc.info/?t=135214229700002
Perhaps we should pass more arguments, vma/vaddr?

Note: this patch obviously can't help to filter out the child created
by fork(), this will be addressed later.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


# fe20d71f 21-Nov-2012 Oleg Nesterov <oleg@redhat.com>

uprobes: Kill uprobe_consumer->filter()

uprobe_consumer->filter() is pointless in its current form, kill it.

We will add it back, but with the different signature/semantics. Perhaps
we will even re-introduce the callsite in handler_chain(), but not to
just skip uc->handler().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


# 32cdba1e 14-Nov-2012 Oleg Nesterov <oleg@redhat.com>

uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race

This was always racy, but 268720903f87e0b84b161626c4447b81671b5d18
"uprobes: Rework register_for_each_vma() to make it O(n)" should be
blamed anyway, it made everything worse and I didn't notice.

register/unregister call build_map_info() and then do install/remove
breakpoint for every mm which mmaps inode/offset. This can obviously
race with fork()->dup_mmap() in between and we can miss the child.

uprobe_register() could be easily fixed but unregister is much worse,
the new mm inherits "int3" from parent and there is no way to detect
this if uprobe goes away.

So this patch simply adds percpu_down_read/up_read around dup_mmap(),
and percpu_down_write/up_write into register_for_each_vma().

This adds 2 new hooks into dup_mmap() but we can kill uprobe_dup_mmap()
and fold it into uprobe_end_dup_mmap().

Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>


# 19f5ee27 28-Oct-2012 Oleg Nesterov <oleg@redhat.com>

uprobes: Kill arch_uprobe_enable/disable_step() hooks

Kill arch_uprobe_enable/disable_step() hooks, they do nothing and
nobody needs them.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


# cb9a19fe 30-Sep-2012 Oleg Nesterov <oleg@redhat.com>

uprobes: Introduce prepare_uprobe()

Preparation. Extract the copy_insn/arch_uprobe_analyze_insn code
from install_breakpoint() into the new helper, prepare_uprobe().

And move uprobe->flags defines from uprobes.h to uprobes.c, nobody
else can use them anyway.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


# 1b08e907 14-Sep-2012 Oleg Nesterov <oleg@redhat.com>

uprobes: Kill UTASK_BP_HIT state

Kill UTASK_BP_HIT state, it buys nothing but complicates the code.
It is only used in uprobe_notify_resume() to decide who should be
called, we can check utask->active_uprobe != NULL instead. And this
allows us to simplify handle_swbp(), no need to clear utask->state.

Likewise we could kill UTASK_SSTEP, but UTASK_BP_HIT is worse and
imho should die. The problem is, it creates the special case when
task->utask is NULL, we can't distinguish RUNNING and BP_HIT. With
this patch utask == NULL always means RUNNING.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


# 9d778782 07-Aug-2012 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

uprobes: Introduce arch_uprobe_enable/disable_step()

As Oleg pointed out in [0] uprobe should not use the ptrace interface
for enabling/disabling single stepping.

[0] http://lkml.kernel.org/r/20120730141638.GA5306@redhat.com

Add the new "__weak arch" helpers which simply call user_*_single_step()
as a preparation. This is only needed to not break the powerpc port, we
will fold this logic into arch_uprobe_pre/post_xol() hooks later.

We should also change handle_singlestep(), _disable_step(&uprobe->arch)
should be called before put_uprobe().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


# ded86e7c 08-Aug-2012 Oleg Nesterov <oleg@redhat.com>

uprobes: Remove "verify" argument from set_orig_insn()

Nobody does set_orig_insn(verify => false), and I think nobody will.
Remove this argument. IIUC set_orig_insn(verify => false) was needed
to single-step without xol area.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


# 61559a81 08-Aug-2012 Oleg Nesterov <oleg@redhat.com>

uprobes: Fold uprobe_reset_state() into uprobe_dup_mmap()

Now that we have uprobe_dup_mmap() we can fold uprobe_reset_state()
into the new hook and remove it. mmput()->uprobe_clear_state() can't
be called before dup_mmap().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


# f8ac4ec9 08-Aug-2012 Oleg Nesterov <oleg@redhat.com>

uprobes: Introduce MMF_HAS_UPROBES

Add the new MMF_HAS_UPROBES flag. It is set by install_breakpoint()
and it is copied by dup_mmap(), uprobe_pre_sstep_notifier() checks
it to avoid the slow path if the task was never probed. Perhaps it
makes sense to check it in valid_vma(is_register => false) as well.

This needs the new dup_mmap()->uprobe_dup_mmap() hook. We can't use
uprobe_reset_state() or put MMF_HAS_UPROBES into MMF_INIT_MASK, we
need oldmm->mmap_sem to avoid the race with uprobe_register() or
mmap() from another thread.

Currently we never clear this bit, it can be false-positive after
uprobe_unregister() or uprobe_munmap() or if dup_mmap() hits the
probed VM_DONTCOPY vma. But this is fine correctness-wise and has
no effect unless the task hits the non-uprobe breakpoint.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


# 647c42df 06-Aug-2012 Oleg Nesterov <oleg@redhat.com>

uprobes: Kill uprobes_state->count

uprobes_state->count is only needed to avoid the slow path in
uprobe_pre_sstep_notifier(). It is also checked in uprobe_munmap()
but ironically its only goal to decrement this counter. However,
it is very broken. Just some examples:

- uprobe_mmap() can race with uprobe_unregister() and wrongly
increment the counter if it hits the non-uprobe "int3". Note
that install_breakpoint() checks ->consumers first and returns
-EEXIST if it is NULL.

"atomic_sub() if error" in uprobe_mmap() looks obviously wrong
too.

- uprobe_munmap() can race with uprobe_register() and wrongly
decrement the counter by the same reason.

- Suppose an appication tries to increase the mmapped area via
sys_mremap(). vma_adjust() does uprobe_munmap(whole_vma) first,
this can nullify the counter temporarily and race with another
thread which can hit the bp, the application will be killed by
SIGTRAP.

- Suppose an application mmaps 2 consecutive areas in the same file
and one (or both) of these areas has uprobes. In the likely case
mmap_region()->vma_merge() suceeds. Like above, this leads to
uprobe_munmap/uprobe_mmap from vma_merge()->vma_adjust() but then
mmap_region() does another uprobe_mmap(resulting_vma) and doubles
the counter.

This patch only removes this counter and fixes the compile errors,
then we will try to cleanup the changed code and add something else
instead.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


# cbc91f71 11-Apr-2012 Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes/core: Decrement uprobe count before the pages are unmapped

Uprobes has a callback (uprobe_munmap()) in the unmap path to
maintain the uprobes count.

In the exit path this callback gets called in unlink_file_vma().
However by the time unlink_file_vma() is called, the pages would
have been unmapped (in unmap_vmas()) and the task->rss_stat counts
accounted (in zap_pte_range()).

If the exiting process has probepoints, uprobe_munmap() checks if
the breakpoint instruction was around before decrementing the probe
count.

This results in a file backed page being reread by uprobe_munmap()
and hence it does not find the breakpoint.

This patch fixes this problem by moving the callback to
unmap_single_vma(). Since unmap_single_vma() may not unmap the
complete vma, add start and end parameters to uprobe_munmap().

This bug became apparent courtesy of commit c3f0327f8e9d
("mm: add rss counters consistency check").

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120411103527.23245.9835.sendpatchset@srdronam.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 682968e0 30-Mar-2012 Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes/core: Optimize probe hits with the help of a counter

Maintain a per-mm counter: number of uprobes that are inserted
on this process address space.

This counter can be used at probe hit time to determine if we
need a lookup in the uprobes rbtree. Everytime a probe gets
inserted successfully, the probe count is incremented and
everytime a probe gets removed, the probe count is decremented.

The new uprobe_munmap hook ensures the count is correct on a
unmap or remap of a region. We expect that once a
uprobe_munmap() is called, the vma goes away. So
uprobe_unregister() finding a probe to unregister would either
mean unmap event hasnt occurred yet or a mmap event on the same
executable file occured after a unmap event.

Additionally, uprobe_mmap hook now also gets called:

a. on every executable vma that is COWed at fork.
b. a vma of interest is newly mapped; breakpoint insertion also
happens at the required address.

On process creation, make sure the probes count in the child is
set correctly.

Special cases that are taken care include:

a. mremap
b. VM_DONTCOPY vmas on fork()
c. insertion/removal races in the parent during fork().

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120330182646.10018.85805.sendpatchset@srdronam.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d4b3b638 30-Mar-2012 Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes/core: Allocate XOL slots for uprobes use

Uprobes executes the original instruction at a probed location
out of line. For this, we allocate a page (per mm) upon the
first uprobe hit, in the process user address space, divide it
into slots that are used to store the actual instructions to be
singlestepped. These slots are known as xol (execution out of
line) slots.

Care is taken to ensure that the allocation is in an unmapped
area as close to the top of the user address space as possible,
with appropriate permission settings to keep selinux like
frameworks happy.

Upon a uprobe hit, a free slot is acquired, and is released
after the singlestep completes.

Lots of improvements courtesy suggestions/inputs from Peter and
Oleg.

[ Folded a fix for build issue on powerpc fixed and reported by
Stephen Rothwell. ]

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120330182631.10018.48175.sendpatchset@srdronam.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0326f5a9 13-Mar-2012 Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes/core: Handle breakpoint and singlestep exceptions

Uprobes uses exception notifiers to get to know if a thread hit
a breakpoint or a singlestep exception.

When a thread hits a uprobe or is singlestepping post a uprobe
hit, the uprobe exception notifier sets its TIF_UPROBE bit,
which will then be checked on its return to userspace path
(do_notify_resume() ->uprobe_notify_resume()), where the
consumers handlers are run (in task context) based on the
defined filters.

Uprobe hits are thread specific and hence we need to maintain
information about if a task hit a uprobe, what uprobe was hit,
the slot where the original instruction was copied for xol so
that it can be singlestepped with appropriate fixups.

In some cases, special care is needed for instructions that are
executed out of line (xol). These are architecture specific
artefacts, such as handling RIP relative instructions on x86_64.

Since the instruction at which the uprobe was inserted is
executed out of line, architecture specific fixups are added so
that the thread continues normal execution in the presence of a
uprobe.

Postpone the signals until we execute the probed insn.
post_xol() path does a recalc_sigpending() before return to
user-mode, this ensures the signal can't be lost.

Uprobes relies on DIE_DEBUG notification to notify if a
singlestep is complete.

Adds x86 specific uprobe exception notifiers and appropriate
hooks needed to determine a uprobe hit and subsequent post
processing.

Add requisite x86 fixups for xol for uprobes. Specific cases
needing fixups include relative jumps (x86_64), calls, etc.

Where possible, we check and skip singlestepping the
breakpointed instructions. For now we skip single byte as well
as few multibyte nop instructions. However this can be extended
to other instructions too.

Credits to Oleg Nesterov for suggestions/patches related to
signal, breakpoint, singlestep handling code.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120313180011.29771.89027.sendpatchset@srdronam.in.ibm.com
[ Performed various cleanliness edits ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 5cb4ac3a 12-Mar-2012 Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes/core: Rename bkpt to swbp

bkpt doesnt seem to be a correct abbrevation for breakpoint.
Choice was between bp and breakpoint. Since bp can refer to
things other than breakpoint, use swbp to refer to breakpoints.

This is pure cleanup, no functional change intended.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120312092545.5379.91251.sendpatchset@srdronam.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e3343e6a 12-Mar-2012 Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes/core: Make order of function parameters consistent across functions

If a function takes struct uprobe or struct arch_uprobe, then it
is passed as the first parameter.

This is pure cleanup, no functional change intended.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120312092530.5379.18394.sendpatchset@srdronam.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 900771a4 12-Mar-2012 Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes/core: Make macro names consistent

Rename macros that refer to individual uprobe to start with
UPROBE_ instead of UPROBES_.

This is pure cleanup, no functional change intended.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120312092514.5379.36595.sendpatchset@srdronam.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 35aa621b 22-Feb-2012 Ingo Molnar <mingo@elte.hu>

uprobes: Update copyright notices

Add Peter Zijlstra's copyright to the uprobes code, whose
contributions to the uprobes code are not visible in the Git
history, because they were backmerged.

Also update existing copyright notices to the year 2012.

Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-vjqxst502pc1efz7ah8cyht4@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 3ff54efd 22-Feb-2012 Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes/core: Move insn to arch specific structure

Few cleanups suggested by Ingo Molnar.

- Rename struct uprobe_arch_info to struct arch_uprobe.
- Move insn from struct uprobe to struct arch_uprobe.
- Make arch specific uprobe functions to accept struct arch_uprobe
instead of struct uprobe.
- Move struct uprobe to kernel/uprobes.c from include/linux/uprobes.h

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Stone <jistone@redhat.com>
Link: http://lkml.kernel.org/r/20120222091602.15880.40249.sendpatchset@srdronam.in.ibm.com
[ Made various small improvements ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 96379f60 22-Feb-2012 Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes/core: Remove uprobe_opcode_sz

uprobe_opcode_sz refers to the smallest instruction size for
that architecture. UPROBES_BKPT_INSN_SIZE refers to the size of
the breakpoint instruction for that architecture.

For now we are assuming that both uprobe_opcode_sz and
UPROBES_BKPT_INSN_SIZE are the same for all archs and hence
removing uprobe_opcode_sz in favour of UPROBES_BKPT_INSN_SIZE.

However if we have to support architectures where the smallest
instruction size is different from the size of breakpoint
instruction, we may have to re-introduce uprobe_opcode_sz.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Stone <jistone@redhat.com>
Link: http://lkml.kernel.org/r/20120222091549.15880.67020.sendpatchset@srdronam.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 7b2d81d4 17-Feb-2012 Ingo Molnar <mingo@elte.hu>

uprobes/core: Clean up, refactor and improve the code

Make the uprobes code readable to me:

- improve the Kconfig text so that a mere mortal gets some idea
what CONFIG_UPROBES=y is really about

- do trivial renames to standardize around the uprobes_*() namespace

- clean up and simplify various code flow details

- separate basic blocks of functionality

- line break artifact and white space related removal

- use standard local varible definition blocks

- use vertical spacing to make things more readable

- remove unnecessary volatile

- restructure comment blocks to make them more uniform and
more readable in general

Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Anton Arapov <anton@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Link: http://lkml.kernel.org/n/tip-ewbwhb8o6navvllsauu7k07p@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2b144498 09-Feb-2012 Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints

Add uprobes support to the core kernel, with x86 support.

This commit adds the kernel facilities, the actual uprobes
user-space ABI and perf probe support comes in later commits.

General design:

Uprobes are maintained in an rb-tree indexed by inode and offset
(the offset here is from the start of the mapping). For a unique
(inode, offset) tuple, there can be at most one uprobe in the
rb-tree.

Since the (inode, offset) tuple identifies a unique uprobe, more
than one user may be interested in the same uprobe. This provides
the ability to connect multiple 'consumers' to the same uprobe.

Each consumer defines a handler and a filter (optional). The
'handler' is run every time the uprobe is hit, if it matches the
'filter' criteria.

The first consumer of a uprobe causes the breakpoint to be
inserted at the specified address and subsequent consumers are
appended to this list. On subsequent probes, the consumer gets
appended to the existing list of consumers. The breakpoint is
removed when the last consumer unregisters. For all other
unregisterations, the consumer is removed from the list of
consumers.

Given a inode, we get a list of the mms that have mapped the
inode. Do the actual registration if mm maps the page where a
probe needs to be inserted/removed.

We use a temporary list to walk through the vmas that map the
inode.

- The number of maps that map the inode, is not known before we
walk the rmap and keeps changing.
- extending vm_area_struct wasn't recommended, it's a
size-critical data structure.
- There can be more than one maps of the inode in the same mm.

We add callbacks to the mmap methods to keep an eye on text vmas
that are of interest to uprobes. When a vma of interest is mapped,
we insert the breakpoint at the right address.

Uprobe works by replacing the instruction at the address defined
by (inode, offset) with the arch specific breakpoint
instruction. We save a copy of the original instruction at the
uprobed address.

This is needed for:

a. executing the instruction out-of-line (xol).
b. instruction analysis for any subsequent fixups.
c. restoring the instruction back when the uprobe is unregistered.

We insert or delete a breakpoint instruction, and this
breakpoint instruction is assumed to be the smallest instruction
available on the platform. For fixed size instruction platforms
this is trivially true, for variable size instruction platforms
the breakpoint instruction is typically the smallest (often a
single byte).

Writing the instruction is done by COWing the page and changing
the instruction during the copy, this even though most platforms
allow atomic writes of the breakpoint instruction. This also
mirrors the behaviour of a ptrace() memory write to a PRIVATE
file map.

The core worker is derived from KSM's replace_page() logic.

In essence, similar to KSM:

a. allocate a new page and copy over contents of the page that
has the uprobed vaddr
b. modify the copy and insert the breakpoint at the required
address
c. switch the original page with the copy containing the
breakpoint
d. flush page tables.

replace_page() is being replicated here because of some minor
changes in the type of pages and also because Hugh Dickins had
plans to improve replace_page() for KSM specific work.

Instruction analysis on x86 is based on instruction decoder and
determines if an instruction can be probed and determines the
necessary fixups after singlestep. Instruction analysis is done
at probe insertion time so that we avoid having to repeat the
same analysis every time a probe is hit.

A lot of code here is due to the improvement/suggestions/inputs
from Peter Zijlstra.

Changelog:

(v10):
- Add code to clear REX.B prefix as suggested by Denys Vlasenko
and Masami Hiramatsu.

(v9):
- Use insn_offset_modrm as suggested by Masami Hiramatsu.

(v7):

Handle comments from Peter Zijlstra:

- Dont take reference to inode. (expect inode to uprobe_register to be sane).
- Use PTR_ERR to set the return value.
- No need to take reference to inode.
- use PTR_ERR to return error value.
- register and uprobe_unregister share code.

(v5):

- Modified del_consumer as per comments from Peter.
- Drop reference to inode before dropping reference to uprobe.
- Use i_size_read(inode) instead of inode->i_size.
- Ensure uprobe->consumers is NULL, before __uprobe_unregister() is called.
- Includes errno.h as recommended by Stephen Rothwell to fix a build issue
on sparc defconfig
- Remove restrictions while unregistering.
- Earlier code leaked inode references under some conditions while
registering/unregistering.
- Continue the vma-rmap walk even if the intermediate vma doesnt
meet the requirements.
- Validate the vma found by find_vma before inserting/removing the
breakpoint
- Call del_consumer under mutex_lock.
- Use hash locks.
- Handle mremap.
- Introduce find_least_offset_node() instead of close match logic in
find_uprobe
- Uprobes no more depends on MM_OWNER; No reference to task_structs
while inserting/removing a probe.
- Uses read_mapping_page instead of grab_cache_page so that the pages
have valid content.
- pass NULL to get_user_pages for the task parameter.
- call SetPageUptodate on the new page allocated in write_opcode.
- fix leaking a reference to the new page under certain conditions.
- Include Instruction Decoder if Uprobes gets defined.
- Remove const attributes for instruction prefix arrays.
- Uses mm_context to know if the application is 32 bit.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Also-written-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Anton Arapov <anton@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linux-mm <linux-mm@kvack.org>
Link: http://lkml.kernel.org/r/20120209092642.GE16600@linux.vnet.ibm.com
[ Made various small edits to the commit log ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>