History log of /freebsd-current/sys/kern/subr_syscall.c
Revision Date Author Comments
# 05296a0f 06-Apr-2024 Jake Freeland <jfree@FreeBSD.org>

ktrace: Record syscall violations with KTR_CAPFAIL

Report syscalls that are not allowed in capability mode with
CAPFAIL_SYSCALL.

Reviewed by: markj
Approved by: markj (mentor)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D40678


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixups to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# 39024a89 25-Sep-2023 Konstantin Belousov <kib@FreeBSD.org>

syscalls: fix missing SIGSYS for several ENOSYS errors

In particular, when the syscall number is too large, or when syscall is
dynamic. For that, add nosys_sysent structure to pass fake sysent to
syscall top code.

Reviewed by: dchagin, markj
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41976
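
The "fake sysent" idea described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the names `sysent_model`, `nosys_model`, `lookup_sysent_model`, and `sigsys_posted_model` are hypothetical and are not the kernel's definitions. The point it shows is that an out-of-range syscall number is mapped to a dedicated entry, so the "no such syscall" handling (including the SIGSYS side effect) runs through the same path as any other handler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct sysent. */
struct sysent_model {
	int (*sy_call)(int arg);
};

/* Set when the model "posts SIGSYS"; the real kernel signals the thread. */
static int sigsys_posted_model;

/* An ordinary handler occupying slot 0 of the model table. */
static int
sys_check_model(int arg)
{
	return (arg >= 0 ? 0 : EINVAL);
}

/* Every "no such syscall" case funnels through this one handler. */
static int
nosys_model(int arg)
{
	(void)arg;
	sigsys_posted_model = 1;
	return (ENOSYS);
}

static const struct sysent_model sysent_table_model[] = {
	{ sys_check_model },
};

/* Fake entry handed to the common code when no table slot applies. */
static const struct sysent_model nosys_sysent_model = { nosys_model };

static const struct sysent_model *
lookup_sysent_model(size_t code)
{
	size_t n = sizeof(sysent_table_model) / sizeof(sysent_table_model[0]);

	return (code < n ? &sysent_table_model[code] : &nosys_sysent_model);
}

/* Common dispatch path: it never needs a special case for bad numbers. */
static int
dispatch_model(size_t code, int arg)
{
	sigsys_posted_model = 0;
	return (lookup_sysent_model(code)->sy_call(arg));
}
```

Calling `dispatch_model(9999, 1)` in this model returns `ENOSYS` and sets the SIGSYS flag, while a valid code leaves the flag clear.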


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# f0592b3c 30-Nov-2022 Konstantin Belousov <kib@FreeBSD.org>

Add a thread debugging flag TDB_BOUNDARY

It indicates to a debugger that the thread is stopped at the
kernel->user exit path.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37590


# b53133a7 12-Feb-2022 Mateusz Guzik <mjg@FreeBSD.org>

proc: load/store p_cowgen using atomic primitives


# 626d6992 26-Dec-2021 Edward Tomasz Napierala <trasz@FreeBSD.org>

Move fork_rfppwait() check into ast()

This will always sleep at least once, so it's a slow path by definition.

Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D33387


# 8bbc0600 30-Oct-2021 Edward Tomasz Napierala <trasz@FreeBSD.org>

linux: Add additional ptracestop only if the debugger is Linux

In 6e66030c4c0, additional ptracestop was added in order
to implement PTRACE_EVENT_EXEC. Make it only apply to cases
where the debugger is a Linux process; native FreeBSD
debuggers can trace Linux processes too, but they don't
expect that additional ptracestop.

Fixes: 6e66030c4c0
Reported By: kib
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32726


# 6e66030c 23-Oct-2021 Edward Tomasz Napierala <trasz@FreeBSD.org>

linux: implement PTRACE_EVENT_EXEC

This fixes strace(1) from Ubuntu Focal.

Reviewed By: jhb
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32367


# a0558fe9 28-Apr-2021 Mateusz Guzik <mjg@FreeBSD.org>

Retire code added to support CloudABI

CloudABI was removed in cf0ee8738e31aa9e6fbf4dca4dac56d89226a71a


# cf98bc28 10-Jul-2021 David Chisnall <theraven@FreeBSD.org>

Pass the syscall number to capsicum permission-denied signals

The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.

This reapplies 3a522ba1bc852c3d4660a4fa32e4a94999d09a47 with a fix for
the static assertion failure on i386.

Approved by: markj (mentor)

Reviewed by: kib, bcr (manpages)

Differential Revision: https://reviews.freebsd.org/D29185
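
How a handler might read an integer carried in `si_value` can be sketched in portable POSIX C. This is an assumption-laden illustration, not the kernel change itself: it queues SIGUSR1 to the current process with `sigqueue()` to stand in for the kernel-generated signal, and the helper names `on_signal` and `deliver_code` are hypothetical. It assumes a single-threaded process with the signal unblocked, so the signal is delivered before `sigqueue()` returns.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Last value seen by the handler; sig_atomic_t is safe to write there. */
static volatile sig_atomic_t last_si_value = -1;

/* SA_SIGINFO-style handler: the payload arrives in info->si_value. */
static void
on_signal(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;
	last_si_value = info->si_value.sival_int;
}

/*
 * Install the handler, queue SIGUSR1 to ourselves carrying `code`, and
 * return what the handler observed (or -1 on failure).
 */
static int
deliver_code(int code)
{
	struct sigaction sa;
	union sigval sv;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_signal;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return (-1);

	sv.sival_int = code;
	if (sigqueue(getpid(), SIGUSR1, sv) != 0)
		return (-1);
	return ((int)last_si_value);
}
```

In the scenario the commit describes, the handler would use the recovered number to decide which call to forward to a more-privileged process.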


# d2b55828 10-Jul-2021 David Chisnall <theraven@FreeBSD.org>

Revert "Pass the syscall number to capsicum permission-denied signals"

This broke the i386 build.

This reverts commit 3a522ba1bc852c3d4660a4fa32e4a94999d09a47.


# 3a522ba1 10-Jul-2021 David Chisnall <theraven@FreeBSD.org>

Pass the syscall number to capsicum permission-denied signals

The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.

Approved by: markj (mentor)

Reviewed by: kib, bcr (manpages)

Differential Revision: https://reviews.freebsd.org/D29185


# 441eb16a 13-Nov-2020 Konstantin Belousov <kib@FreeBSD.org>

Allow some VOPs to return ERELOOKUP to indicate VFS operation restart at top level.

Restart syscalls and some sync operations when filesystem indicated
ERELOOKUP condition, mostly for VOPs operating on metadata. In
particular, lookup results cached in the inode/v_data are no longer
valid and need recalculating. Right now this should be nop.

Assert that ERELOOKUP is caught everywhere and not returned to
userspace, by asserting that td_errno != ERELOOKUP on syscall return
path.

In collaboration with: pho
Reviewed by: mckusick (previous version), markj
Tested by: markj (syzkaller), pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26136


# a1bd83fe 08-Nov-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Move syscall_thread_{enter,exit}() into the slow path. This is only
needed for syscalls from unloadable modules.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: EPSRC
Differential Revision: https://reviews.freebsd.org/D26988


# da45ea6b 07-Nov-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Move TDB_USERWR check under 'if (traced)'.

If we hadn't been traced in the first place when syscallenter()
started executing, we can ignore TDB_USERWR. TDB_USERWR can get set,
sure, but if it does, it's because the debugger raced with the syscall,
and it cannot depend on winning that race.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: EPSRC
Differential Revision: https://reviews.freebsd.org/D26585


# bdc0cb4e 28-Oct-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add local variable to store the sysent pointer. Just a cleanup,
no functional changes.

Reviewed by: kib (earlier version)
MFC after: 2 weeks
Sponsored by: EPSRC
Differential Revision: https://reviews.freebsd.org/D26977


# 275c821d 24-Oct-2020 Kyle Evans <kevans@FreeBSD.org>

audit: correct reporting of *execve(2) success

r326145 corrected do_execve() to return EJUSTRETURN upon success so that
important registers are not clobbered. This had the side effect of tapping
out 'failures' for all *execve(2) audit records, which is less than useful
for auditing purposes.

Audit exec returns earlier, where we can know for sure that EJUSTRETURN
translates to success. Note that this unsets TDP_AUDITREC as we commit the
audit record, so the usual audit in the syscall return path will do nothing.

PR: 249179
Reported by: Eirik Oeverby <ltning-freebsd anduin net>
Reviewed by: csjp, kib
MFC after: 1 week
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26922


# 4c6f466c 01-Oct-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Only clear TDP_NERRNO when needed, ie when it's previously been set.

Reviewed by: kib
Tested by: pho
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26612


# 34098649 29-Sep-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Use the 'traced' variable instead of comparing p->p_flag again.

Reviewed by: kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26577


# 1e2521ff 27-Sep-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Get rid of sa->narg. It serves no purpose; use sa->callp->sy_narg instead.

Reviewed by: kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26458


# 59838c1a 01-Apr-2020 John Baldwin <jhb@FreeBSD.org>

Retire procfs-based process debugging.

Modern debuggers and process tracers use ptrace() rather than procfs
for debugging. ptrace() has a superset of functionality available
via procfs and new debugging features are only added to ptrace().
While the two debugging services share some fields in struct proc,
they each use dedicated fields and separate code. This results in
extra complexity to support a feature that hasn't been enabled in the
default install for several years.

PR: 244939 (exp-run)
Reviewed by: kib, mjg (earlier version)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D23837


# 46994ec2 28-Feb-2020 Mark Johnston <markj@FreeBSD.org>

Fix standalone builds of systrace.ko after r357912.

Sponsored by: The FreeBSD Foundation


# a113b17f 20-Feb-2020 Konstantin Belousov <kib@FreeBSD.org>

Do not read sigfastblock word on syscall entry.

On machines with SMAP, fueword executes two serializing instructions
which can be seen in microbenchmarks.

As a measure to restore microbenchmark numbers, only read the word on
the attempt to deliver signal in ast(). If the word is set, signal is
not delivered and word is kept, preventing interruption of
interruptible sleeps by signals until userspace calls
sigfastblock(UNBLOCK) which clears the word.

This way, the spurious EINTR that userspace can see while in critical
section is on first interruptible sleep, if a signal is pending, and
on signal posting. It is believed that it is not important for rtld
and libthr critical sections. It might be visible for the application
code e.g. for the callback of dl_iterate_phdr(3), but again the belief
is that the non-compliance is acceptable. Most important is that the
retry of the sleeping syscall does not interrupt unless additional
signal is posted.

For now I added the knob kern.sigfastblock_fetch_always to enable the
word read on syscall entry to be able to diagnose possible issues due
to spurious EINTR.

While there, do some code restructuring to have all sigfastblock()
handling located in kern_sig.c.

Reviewed by: jeff
Discussed with: mjg
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D23622


# 2f729243 14-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

Merge audit and systrace checks

This further shortens the syscall routine by not having to re-check after
the system call.


# 0e84a878 14-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

Annotate branches in the syscall path

This in particular significantly shortens amd64_syscall, which otherwise
keeps jumping forward over 2KB of code in total.

Note some of these branches should be either eliminated altogether or
coalesced.


# 146fc63f 09-Feb-2020 Konstantin Belousov <kib@FreeBSD.org>

Add a way to manage thread signal mask using shared word, instead of syscall.

A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery. Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.

The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).

With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.

The syscall is not exported from the stable libc version namespace on
purpose. It is intended to be used only by our C runtime
implementation internals.

Tested by: pho
Discussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
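
The block-count idea can be sketched as a tiny user-space model. Everything here is a simplification under stated assumptions: the struct and function names are hypothetical, the real word is shared between userspace and the kernel, and the real implementation has details (such as how pending signals are flagged and flushed) that this model does not try to reproduce. It only shows that a non-zero count defers delivery and that nesting works by counting.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-thread state for the model. */
struct sigblock_model {
	uint32_t block_word;	/* stands in for the registered uint32_t */
	int pending;		/* signals posted while blocked */
	int delivered;		/* signals actually delivered */
};

/* A non-zero count is treated like a mask blocking all signals. */
static void
post_signal_model(struct sigblock_model *td)
{
	if (td->block_word != 0)
		td->pending++;
	else
		td->delivered++;
}

/* Entering a critical section is a plain memory increment, no syscall. */
static void
fast_block_model(struct sigblock_model *td)
{
	td->block_word++;
}

/* Leaving the outermost critical section releases deferred signals. */
static void
fast_unblock_model(struct sigblock_model *td)
{
	assert(td->block_word > 0);
	if (--td->block_word == 0) {
		td->delivered += td->pending;
		td->pending = 0;
	}
}

/* Nested block, one signal posted inside; returns signals delivered. */
static int
sigfastblock_demo(void)
{
	struct sigblock_model td = { 0, 0, 0 };

	fast_block_model(&td);
	fast_block_model(&td);
	post_signal_model(&td);
	fast_unblock_model(&td);
	if (td.delivered != 0)
		return (-1);	/* must still be deferred while nested */
	fast_unblock_model(&td);
	return (td.delivered);
}
```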


# c18ca749 15-Jul-2019 John Baldwin <jhb@FreeBSD.org>

Don't pass error from syscallenter() to syscallret().

syscallret() doesn't use error anymore. Fix a few other places to permit
removing the return value from syscallenter() entirely.
- Remove a duplicated assertion from arm's syscall().
- Use td_errno for amd64_syscall_ret_flush_l1d.

Reviewed by: kib
MFC after: 1 month
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D2090


# 1af9474b 15-Jul-2019 John Baldwin <jhb@FreeBSD.org>

Always set td_errno to the error value of a system call.

Early errors prior to a system call did not set td_errno. This commit
sets td_errno for all errors during syscallenter(). As a result,
syscallret() can now always use td_errno without checking TDP_NERRNO.

Reviewed by: kib
MFC after: 1 month
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D20898


# c26541e3 09-Jul-2019 John Baldwin <jhb@FreeBSD.org>

Use 'retval' label for first error in syscallenter().

This is more consistent with the rest of the function and lets us
unindent most of the function.

Reviewed by: kib
MFC after: 1 month
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D20897


# 7d065d87 19-Dec-2018 Mateusz Guzik <mjg@FreeBSD.org>

Deinline vfork handling out of the syscall return path.

vfork is rarely called (compared to other syscalls) and it avoidably
pollutes the fast path.

Sponsored by: The FreeBSD Foundation


# e272bf47 28-Nov-2018 Mateusz Guzik <mjg@FreeBSD.org>

Annotate td_cowgen check as unlikely.

Sponsored by: The FreeBSD Foundation


# e3d3e828 22-Nov-2018 Mateusz Guzik <mjg@FreeBSD.org>

Revert "fork: fix use-after-free with vfork"

This unreliably breaks libc handling of vfork where forking succeeded,
but execve did not.

vfork code in libc performs waitpid with WNOHANG in case of failed exec.
With the fix exit codepath was waking up the parent before the child
fully transitioned to a zombie. Woken up parent would waitpid, which
could find a not-yet-zombie child and fail to reap it due to the WNOHANG
flag.

While removing the flag fixes the problem, it is not an option due to older
releases which would still suffer from the kernel change.

Revert the fix until a solution can be worked out.

Note that while the use-after-free that comes back with this revert is a real
bug, its side effects are limited because struct proc memory
is never released by UMA.


# adce2419 22-Nov-2018 Mateusz Guzik <mjg@FreeBSD.org>

Annotate TDP_RFPPWAIT as unlikely.

The flag is only set on vfork, but is tested for *all* syscalls.
On amd64 this shortens common-case (not vfork) code.
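
A branch annotation of this kind can be sketched with the compiler builtin that such macros typically wrap. This is an illustrative model only: `model_predict_false`, `MODEL_TDP_RFPPWAIT`, and `syscall_return_model` are hypothetical names with made-up values, and the sketch assumes a GCC- or Clang-compatible compiler that provides `__builtin_expect`. The annotation does not change the result; it only hints to the compiler which side of the branch to lay out as the fall-through path.

```c
#include <assert.h>

/* Hypothetical wrapper; hints that the expression is usually zero. */
#define model_predict_false(exp)	__builtin_expect((exp), 0)

/* Hypothetical flag value standing in for TDP_RFPPWAIT. */
#define MODEL_TDP_RFPPWAIT	0x01

/*
 * Model of a syscall return path: the flag is tested on every call but is
 * only set after vfork, so the wait is marked as the unlikely branch.
 * Returns 1 when the slow path would run.
 */
static int
syscall_return_model(unsigned int pflags)
{
	int took_slow_path = 0;

	if (model_predict_false((pflags & MODEL_TDP_RFPPWAIT) != 0))
		took_slow_path = 1;	/* would wait for the vfork child */
	return (took_slow_path);
}
```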


# b00b27e9 22-Nov-2018 Mateusz Guzik <mjg@FreeBSD.org>

fork: fix use-after-free with vfork

The pointer to the child is stored without any reference held. Then it is
blindly used to wait until P_PPWAIT is cleared. However, if the child is
autoreaped it could have exited and get freed before the parent started
waiting.

Use the existing hold mechanism to mitigate the problem. Most common case
of doing exec remains unchanged. The corner case of doing exit performs
wake up before waiting for holds to clear.

Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18295


# 9d68f774 27-Apr-2018 Mateusz Guzik <mjg@FreeBSD.org>

systrace: track it like sdt probes

While here predict false.

Note the code is wrong (regardless of this change). Dereference of the
pointer can race with module unload. A fix would set the probe to a
nop stub instead of NULL.


# df57947f 18-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

spdx: initial adoption of licensing ID tags.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes: yes
Differential Revision: https://reviews.freebsd.org/D13133


# 2d88da2f 12-Jun-2017 Konstantin Belousov <kib@FreeBSD.org>

Move struct syscall_args syscall arguments parameters container into
struct thread.

For all architectures, the syscall trap handlers have to allocate the
structure on the stack. The structure takes 88 bytes on 64bit arches
which is not negligible. Also, it cannot be easily found by other
code, which e.g. caused duplication of some members of the structure
to struct thread already. The change removes td_dbg_sc_code and
td_dbg_sc_nargs which were directly copied from syscall_args.

The structure is put into the copied on fork part of the struct thread
to make the syscall arguments information correct in the child after
fork.

This move will also allow several more uses shortly.

Reviewed by: jhb (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
X-Differential revision: https://reviews.freebsd.org/D11080


# 83c9dea1 17-Apr-2017 Gleb Smirnoff <glebius@FreeBSD.org>

- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter
in place. To do per-cpu stats, convert all fields that previously were
maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
before we have set up UMA and we can do counter_u64_alloc(), provide an
early counter mechanism:
o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
so that at early stages of boot, before counters are allocated we already
point to a counter that can be safely written to.
o For sparc64 that required a whole dummy pcpu[MAXCPU] array.

Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exception.

This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html

Reviewed by: kib, gallatin, marius, lidl
Differential Revision: https://reviews.freebsd.org/D10156


# fef09913 17-Apr-2017 Gleb Smirnoff <glebius@FreeBSD.org>

Typo!


# 9ed01c32 17-Apr-2017 Gleb Smirnoff <glebius@FreeBSD.org>

All these files need sys/vmmeter.h, but now they got it implicitly
included via sys/pcpu.h.


# 82a4538f 20-Feb-2017 Eric Badger <badger@FreeBSD.org>

Defer ptracestop() signals that cannot be delivered immediately

When a thread is stopped in ptracestop(), the ptrace(2) user may request
a signal be delivered upon resumption of the thread. Heretofore, those signals
were discarded unless ptracestop()'s caller was issignal(). Fix this by
modifying ptracestop() to queue up signals requested by the ptrace user that
will be delivered when possible. Take special care when the signal is SIGKILL
(usually generated from a PT_KILL request); no new stop events should be
triggered after a PT_KILL.

Add a number of tests for the new functionality. Several tests were authored
by jhb.

PR: 212607
Reviewed by: kib
Approved by: kib (mentor)
MFC after: 2 weeks
Sponsored by: Dell EMC
In collaboration with: jhb
Differential Revision: https://reviews.freebsd.org/D9260


# 643f6f47 21-Sep-2016 Konstantin Belousov <kib@FreeBSD.org>

Add PROC_TRAPCAP procctl(2) controls and global sysctl kern.trap_enocap.

Both can be used to cause processes in capability mode to receive
SIGTRAP when ENOTCAPABLE or ECAPMODE errors are returned from
syscalls.

Idea by: emaste
Reviewed by: oshogbo (previous version), emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D7965


# fc4f075a 18-Jul-2016 John Baldwin <jhb@FreeBSD.org>

Add PTRACE_VFORK to trace vfork events.

First, PL_FLAG_FORKED events now also set a PL_FLAG_VFORKED flag when
the new child was created via vfork() rather than fork(). Second, a
new PL_FLAG_VFORK_DONE event can now be enabled via the PTRACE_VFORK
event mask. This new stop is reported after the vfork parent resumes
due to the child calling exit or exec. Debuggers can use this stop to
reinsert breakpoints in the vfork parent process before it resumes.

Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D7045


# 8d570f64 15-Jul-2016 John Baldwin <jhb@FreeBSD.org>

Add a mask of optional ptrace() events.

ptrace() now stores a mask of optional events in p_ptevents. Currently
this mask is a single integer, but it can be expanded into an array of
integers in the future.

Two new ptrace requests can be used to manipulate the event mask:
PT_GET_EVENT_MASK fetches the current event mask and PT_SET_EVENT_MASK
sets the current event mask.

The current set of events includes:
- PTRACE_EXEC: trace calls to execve().
- PTRACE_SCE: trace system call entries.
- PTRACE_SCX: trace system call exits.
- PTRACE_FORK: trace forks and auto-attach to new child processes.
- PTRACE_LWP: trace LWP events.

The S_PT_SCX and S_PT_SCE events in the procfs p_stops flags have
been replaced by PTRACE_SCE and PTRACE_SCX. PTRACE_FORK replaces
P_FOLLOW_FORK and PTRACE_LWP replaces P2_LWP_EVENTS.

The PT_FOLLOW_FORK and PT_LWP_EVENTS ptrace requests remain for
compatibility but now simply toggle corresponding flags in the
event mask.

While here, document that PT_SYSCALL, PT_TO_SCE, and PT_TO_SCX both
modify the event mask and continue the traced process.

Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D7044
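
The relationship between the event mask and the compatibility requests can be modeled with plain bit operations. This sketch makes several assumptions: the `MODEL_` bit values are illustrative rather than taken from the system headers, and `set_event_model` and `pt_follow_fork_model` are hypothetical helpers. It only shows that a legacy on/off request reduces to toggling one bit in the mask while leaving the others untouched.

```c
#include <assert.h>

/* Illustrative bit values; the real ones live in the system headers. */
#define MODEL_PTRACE_EXEC	0x01u
#define MODEL_PTRACE_SCE	0x02u
#define MODEL_PTRACE_SCX	0x04u
#define MODEL_PTRACE_FORK	0x08u
#define MODEL_PTRACE_LWP	0x10u

/* Set or clear one event bit, leaving the rest of the mask alone. */
static unsigned int
set_event_model(unsigned int mask, unsigned int event, int enable)
{
	return (enable ? (mask | event) : (mask & ~event));
}

/* Model of a compatibility request: it just toggles the FORK bit. */
static unsigned int
pt_follow_fork_model(unsigned int mask, int enable)
{
	return (set_event_model(mask, MODEL_PTRACE_FORK, enable));
}
```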


# 8ff6d9dd 16-Dec-2015 Mark Johnston <markj@FreeBSD.org>

Support an arbitrary number of arguments to DTrace syscall probes.

Rather than pushing all eight possible arguments into dtrace_probe()'s
stack frame, make the syscall_args struct for the current syscall available
via the current thread. Using a custom getargval method for the systrace
provider, this allows any syscall argument to be fetched, even in kernels
that have modified the maximum number of system call arguments.

Sponsored by: EMC / Isilon Storage Division


# aff57357 22-Oct-2015 Ed Schouten <ed@FreeBSD.org>

Add a way to distinguish between forking and thread creation in schedtail.

For CloudABI we need to initialize the registers of new threads
differently based on whether the thread got created through a fork or
through simple thread creation.

Add a flag, TDP_FORKING, that is set by do_fork() and cleared by
fork_exit(). This can be tested against in schedtail.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D3973


# 189ac973 06-Oct-2015 John Baldwin <jhb@FreeBSD.org>

Fix various edge cases related to system call tracing.
- Always set td_dbg_sc_* when P_TRACED is set on system call entry
even if the debugger is not tracing system call entries. This
ensures the fields are valid when reporting other stops that
occur at system call boundaries such as for PT_FOLLOW_FORKS or
when only tracing system call exits.
- Set TDB_SCX when reporting the stop for a new child process in
fork_return(). This causes the event to be reported as a system
call exit.
- Report a system call exit event in fork_return() for new threads in
a traced process.
- Copy td_dbg_sc_* to new threads instead of zeroing. This ensures
that td_dbg_sc_code in particular will report the system call that
created the new thread or process when it reports a system call
exit event in fork_return().
- Add new ptrace tests to verify that new child processes and threads
report system call exit events with a valid pl_syscall_code via
PT_LWPINFO.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D3822


# bdd64116 16-Sep-2015 John Baldwin <jhb@FreeBSD.org>

Always clear TDB_USERWR before fetching system call arguments. The
TDB_USERWR flag may still be set after a debugger detaches from a
process via PT_DETACH. Previously the flag would never be cleared
forcing a double fetch of the system call arguments for each system
call. Note that the flag cannot be cleared at PT_DETACH time in case
one of the threads in the process is currently stopped in
syscallenter() and the debugger has modified the arguments for that
pending system call before detaching.

Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D3678


# ded3e7f0 01-Sep-2015 John Baldwin <jhb@FreeBSD.org>

The 'sa' argument to syscallret() is not unused.


# 183b68f7 01-Sep-2015 John Baldwin <jhb@FreeBSD.org>

Export current system call code and argument count for system call entry
and exit events. procfs stop events for system call tracing report these
values (argument count for system call entry and code for system call exit),
but ptrace() does not provide this information. (Note that while the system
call code can be determined in an ABI-specific manner during system call
entry, it is not generally available during system call exit.)

The values are exported via new fields at the end of struct ptrace_lwpinfo
available via PT_LWPINFO.

Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D3536


# 4ea6a9a2 10-Jun-2015 Mateusz Guzik <mjg@FreeBSD.org>

Generalised support for copy-on-write structures shared by threads.

Thread credentials are maintained as follows: each thread has a pointer to
creds and a reference on them. The pointer is compared with proc's creds on
userspace<->kernel boundary and updated if needed.

This patch introduces a counter which can be compared instead, so that more
structures can use this scheme without adding more comparisons on the boundary.
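
The counter scheme can be sketched as a small model using C11 atomics. The assumptions are significant: all `_model` names are hypothetical, there is no locking or reference counting here (the kernel needs both), and only credentials are refreshed. What it does show is the core idea that the boundary check compares a single generation counter, so adding more copy-on-write structures would not add more comparisons.

```c
#include <assert.h>
#include <stdatomic.h>

struct cred_model {
	int uid;
};

/* Process side: the authoritative pointer plus a generation counter. */
struct proc_model {
	struct cred_model *p_ucred;
	atomic_uint p_cowgen;
};

/* Thread side: a cached pointer and the generation it was taken at. */
struct thread_model {
	struct proc_model *td_proc;
	struct cred_model *td_ucred;
	unsigned int td_cowgen;
	unsigned int refreshes;
};

/* Changing a shared structure bumps the generation. */
static void
proc_set_cred_model(struct proc_model *p, struct cred_model *c)
{
	p->p_ucred = c;
	atomic_fetch_add_explicit(&p->p_cowgen, 1, memory_order_release);
}

/* Boundary check: one counter comparison regardless of structure count. */
static void
thread_boundary_model(struct thread_model *td)
{
	unsigned int gen;

	gen = atomic_load_explicit(&td->td_proc->p_cowgen,
	    memory_order_acquire);
	if (td->td_cowgen != gen) {
		td->td_ucred = td->td_proc->p_ucred;
		td->td_cowgen = gen;
		td->refreshes++;
	}
}

/* Three boundary crossings with one credential change in between. */
static unsigned int
cowgen_demo(void)
{
	struct cred_model user = { 1001 }, root = { 0 };
	struct proc_model p;
	struct thread_model td = { &p, &user, 0, 0 };

	p.p_ucred = &user;
	atomic_init(&p.p_cowgen, 0);

	thread_boundary_model(&td);		/* nothing changed */
	proc_set_cred_model(&p, &root);
	thread_boundary_model(&td);		/* picks up the new creds */
	thread_boundary_model(&td);		/* already current */
	assert(td.td_ucred == &root);
	return (td.refreshes);
}
```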


# 8638fe7b 08-Dec-2014 Konstantin Belousov <kib@FreeBSD.org>

Thread waiting for the vfork(2)-ed child to exec or exit, must allow
for the suspension.

Currently, the loop performs uninterruptible cv_wait(9) call, which
prevents suspension until child allows further execution of parent.
If child is stopped, suspension or single-threading is delayed
indefinitely.

Create a helper thread_suspend_check_needed() to identify the need for
a call to thread_suspend_check(). It is required since call to the
thread_suspend_check() cannot be safely done while owning the child
(p2) process lock. Only when suspension is needed, drop p2 lock and
call thread_suspend_check(). Perform wait for cv with timeout, in
case suspend is requested after wait started; I do not see a better
way to interrupt the wait.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 4a144410 16-Mar-2014 Robert Watson <rwatson@FreeBSD.org>

Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after: 3 weeks


# 54366c0b 25-Nov-2013 Attilio Rao <attilio@FreeBSD.org>

- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined
version of the releasing functions for mutex, rwlock and sxlock.
Failing to do so skips the lockstat_probe_func invocation for
unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
kernel compiled without lock debugging options, potentially every
consumer must be compiled including opt_kdtrace.h.
Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested. As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while. Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by: EMC / Isilon storage division
Discussed with: rstone
[0] Reported by: rstone
[1] Discussed with: philip


# 7fc3ae51 27-Dec-2012 Oleksandr Tymoshenko <gonzo@FreeBSD.org>

Fix build on ARM (and probably other platforms)


# 4c44811c 19-Dec-2012 Jeff Roberson <jeff@FreeBSD.org>

- Add new machine parsable KTR macros for timing events.
- Use this new format to automatically handle syscalls and VOPs. This
changes the earlier format but is still human readable.

Sponsored by: EMC / Isilon Storage Division


# 16cbf13b 08-Sep-2012 Attilio Rao <attilio@FreeBSD.org>

Move the checks for td_pinned, td_critnest, TDP_NOFAULTING and
TDP_NOSLEEPING leaking from syscallret() to userret() so that also
trap handling is covered. Also, the check on td_locks is not duplicated
between the two functions.

Reported by: avg
Reviewed by: kib
MFC after: 1 week


# 7e690c1f 22-Aug-2012 John Baldwin <jhb@FreeBSD.org>

Assert that system calls do not leak a pinned thread (via sched_pin()) to
userland.


# 6c5d7af1 30-May-2012 Konstantin Belousov <kib@FreeBSD.org>

Assert that TDP_NOFAULTING and TDP_NOSLEEPING thread flags do not leak
when thread returns from a syscall to usermode.

Tested by: pho
MFC after: 1 week


# 2dd9ea6f 12-Apr-2012 Konstantin Belousov <kib@FreeBSD.org>

Add thread-private flag to indicate that error value is already placed
in td_errno. Flag is supposed to be used by syscalls returning
EJUSTRETURN because errno was already placed into the usermode frame
by a call to set_syscall_retval(9). Both ktrace and dtrace get errno
value from td_errno if the flag is set.

Use the flag to fix sigsuspend(2) error return ktrace records.

Requested by: bde
MFC after: 1 week
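
The flag's effect on tracing can be illustrated with a short model. The names and constant values below (`MODEL_TDP_NERRNO`, `MODEL_EJUSTRETURN`, `traced_errno_model`) are hypothetical stand-ins chosen for this sketch, not the kernel's definitions. It shows only the selection rule described above: when the flag is set, the tracer reports the saved `td_errno` instead of the handler's return value.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel flag and pseudo-error. */
#define MODEL_TDP_NERRNO	0x1
#define MODEL_EJUSTRETURN	(-2)

struct errno_thread_model {
	int td_pflags;
	int td_errno;
};

/* Pick the value a tracer should record for a finished syscall. */
static int
traced_errno_model(const struct errno_thread_model *td, int error)
{
	if ((td->td_pflags & MODEL_TDP_NERRNO) != 0)
		return (td->td_errno);
	return (error);
}

/*
 * Model of the sigsuspend(2) case: the handler returns the pseudo-error
 * after already storing EINTR, and the trace record should show EINTR.
 */
static int
sigsuspend_trace_demo(void)
{
	struct errno_thread_model td = { MODEL_TDP_NERRNO, EINTR };

	return (traced_errno_model(&td, MODEL_EJUSTRETURN));
}
```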


# 1d7ca9bb 27-Feb-2012 Konstantin Belousov <kib@FreeBSD.org>

Currently, the debugger attached to the process executing vfork() does
not get the syscall exit notification until the child has performed exec or
exit. Swap the order of doing ptracestop() and waiting for P_PPWAIT
clearing, by postponing the wait into syscallret after ptracestop()
notification is done.

Reported, tested and reviewed by: Dmitry Mikulin <dmitrym juniper net>
MFC after: 2 weeks


# 343b391f 11-Feb-2012 Konstantin Belousov <kib@FreeBSD.org>

The PTRACESTOP() macro is used only once. Inline the only use and remove
the macro.

MFC after: 1 week


# 6ad1ff09 30-Jan-2012 Konstantin Belousov <kib@FreeBSD.org>

A debugger which requested PT_FOLLOW_FORK should get the notification
about new child not only when doing PT_TO_SCX, but also for PT_CONTINUE.
If TDB_FORK flag is set, always issue a stop, the same as is done for
TDB_EXEC.

Reported by: Dmitry Mikulin <dmitrym juniper net>
MFC after: 1 week


# b2f1a8f2 29-Oct-2011 Marcel Moolenaar <marcel@FreeBSD.org>

Revert rev. 226893: subr_syscall.c is being included from C files and
on amd64 with FREEBSD32 enabled, this means that systrace_probe_func
gets defined twice.


# 056f0ec7 28-Oct-2011 Marcel Moolenaar <marcel@FreeBSD.org>

Define systrace_probe_func in subr_syscall.c where it's used, instead
of defining it in MD code. This eliminates porting to other architectures.


# ce8bd78b 27-Sep-2011 Konstantin Belousov <kib@FreeBSD.org>

Do not deliver SIGTRAP on exec as a normal signal; use ptracestop() on the
syscall exit path instead. Otherwise, if SIGTRAP is ignored, tdsendsignal()
does not deliver the signal, and the debugger never gets a notification
of the exec.

Found and tested by: Anton Yuzhaninov <citrin citrin ru>
Discussed with: jhb
MFC after: 2 weeks


# 26ccf4f1 11-Sep-2011 Konstantin Belousov <kib@FreeBSD.org>

Inline the syscallenter() and syscallret(). This reduces the time measured
by the syscall entry speed microbenchmarks by ~10% on amd64.

Submitted by: jhb
Approved by: re (bz)
MFC after: 2 weeks