History log of /freebsd-current/sys/amd64/linux32/linux32_sysvec.c
Revision Date Author Comments
# be707ee0 09-Feb-2024 Konstantin Belousov <kib@FreeBSD.org>

amd64/linux*: mark brandlists as static

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 5bdd74cc 09-Oct-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Drop the outdated comments about sixth register on i386 int0x80

This is well documented in the Linux syscall(2).

MFC after: 1 week


# 7acc4240 25-Sep-2023 Konstantin Belousov <kib@FreeBSD.org>

linuxolator: fix nosys() to not send SIGSYS

Reviewed by: dchagin, markj
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41976


# ba90a31d 11-Sep-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Cleanup includes under amd64/linux32

No functional changes.

MFC after: 1 week


# 3460fab5 18-Aug-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Remove sys/cdefs.h inclusion where it's not needed due to 685dc743


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4281dab8 28-Jul-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Add elf_hwcap2 to x86

On x86 Linux via AT_HWCAP2 the user controlled (by tunables) processor
capabilities are exposed.

Reviewed by:
Differential Revision: https://reviews.freebsd.org/D41165
MFC after: 2 weeks


# fd745e1d 29-May-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Use pwd_altroot() to tell namei() about ABI root path

PR: 72920
Differential Revision: https://reviews.freebsd.org/D40090
MFC after: 2 month


# 7d8c9839 22-Apr-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Deduplicate linux_copyout_auxargs()

Export default MINSIGSTKSZ value for the x86 until we do not preserve AVX
registers in the signal context.

Differential Revision: https://reviews.freebsd.org/D39644
MFC after: 1 month


# 2456a459 14-Feb-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Cleanup includes under amd64/linux

Cleanup unneeded includes, sort the rest according to style(9).
No functional changes.

MFC after: 2 weeks


# 10d16789 12-Feb-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Get rid of the opt_compat.h include.

Since e013e369 COMPAT_LINUX, COMPAT_LINUX32 build options are removed,
so include of opt_compat.h is no more needed.

MFC after: 2 weeks


# 95b86034 02-Feb-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Deduplicate linux_trans_osrel().

MFC after: 1 week


# 9e550625 02-Feb-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Deduplicate linux_fixup_elf().

Use native routines to fixup initial process stack. On Arm64 linux_elf_fixup() is
noop, as it do the stack fixup (room for argc) in the linux_copyout_strings().

MFC after: 1 week


# 74465145 02-Feb-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Microoptimize linux_elf.h for future use.

In order to reduce code duplication move coredump support definitions
into the appropriate header and hide private definitions.

MFC after: 1 week


# 1da65dcb 28-Oct-2022 Mitchell Horne <mhorne@FreeBSD.org>

linux: populate sv_syscallnames in each sysentvec

This allows the syscallname() function to give a usable result for Linux
ABIs.

Reported by: jrtc27
Reviewed by: jrtc27, markj, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37199


# 361971fb 02-Jun-2022 Kornel Dulęba <kd@FreeBSD.org>

Rework how shared page related data is stored

Store the shared page address in struct vmspace.
Also instead of storing absolute addresses of various shared page
segments save their offsets with respect to the shared page address.
This will be more useful when the shared page address is randomized.

Approved by: mw(mentor)
Sponsored by: Stormshield
Obtained from: Semihalf
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D35393


# 4a6c2d07 30-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Properly restore the thread signal mask after signal delivery on i386

Replace sigframe sf_extramask by native sigset_t and use it to
store/restore the thread signal mask without conversion to/from
Linux signal mask.

Pointy hat to: dchagin
MFC after: 2 weeks


# 9016ec05 23-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Deduplicate bsd_to_linux_trapcode()

As bsd_to_linux_trapcode() is common for x86 Linuxulators,
move it under x86/linux.

MFC after: 2 weeks


# 2434137f 23-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Deduplicate translate_traps()

As translate_traps() is common for x86 Linuxulators,
move it under x86/linux.

MFC after: 2 weeks


# eca368ec 20-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

Retire sv_transtrap

Call translate_traps directly from sendsig().

MFC after: 2 weeks


# 8f9635dc 15-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Retire handmade DWARF annotations from signal trampolines

The Linux exports __kernel_sigreturn and __kernel_rt_sigreturn from the
vdso. Modern glibc's sigaction sets the sa_restorer field of sigaction
to the corresponding vdso __sigreturn, and sets the SA_RESTORER.
Our signal trampolines uses the FreeBSD-way to call a signal handler,
so does not use the sigaction's sa_restorer.

However, as glibc's runtime linker depends on the existment of the vdso
__sigreturn symbols, for all Linuxulators was added separate trampolines
named __sigcode with DWARF anotations and left separate __sigreturn
methods, which are exported.

MFC after: 2 weeks


# 6e826d27 15-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Better naming for ucontext field of struct rt_sigframe

To reduce sendsig code difference and to avoid confusing me,
rename sf_sc to sf_uc to match the content.

MFC after: 2 weeks


# 21f24617 15-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Move sigframe definitions to separate headers

The signal trampoine-related definitions are used only in the MD part
of code, wherefore moved from everywhere used linux.h to separate MD
headers.

MFC after: 2 weeks


# ba279bcd 15-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Cleanup signal trampolines

This is the first stage of a signal trampolines refactoring.

From trampolines retired emulation of the 'call' instruction, which is
replaced by direct call of a signal handler. The signal handler address
is in the register.

The previous trampoline implemenatation used semi-Linux-way to call
a signal handler via the 'jmp' instruction. Wherefore the trampoline
emulated a 'call' instruction to into the stack the return address for
signal handler's 'ret' instruction. Wherefore handmade DWARD annotations
was used.

While here rephrased and removed excessive comments.

MFC after: 2 weeks


# 0b5d5dc3 15-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Retire unneeded initialization

Both uc_flags and uc_link are zeroed above. On amd64 and i386 the
uc_link field is not used at all. The UC_FP_XSTATE bit should be set
in the uc_flags if OS xsave knob is turned on (and xsave is implemented).

MFC after: 2 weeks


# 5a6a4fb2 08-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Implement vdso getcpu for x86.

This is modeled after f2395455 (by kib@).

MFC after: 2 weeks


# d5dc757e 31-Mar-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Add compat.linux32.emulate_i386 knob.

Historically 32-bit Linuxulator under amd64 emulated the real i386
behavior. Since 3d8dd983 the old i386 Linux world can't be used under
amd64 Linuxulator as it don't know anything about amd64 machine (which
is returned now by newuname() syscall). So, add a knob to allow to swith
the behavior and use i386 Linux binaries on amd64.
Set knob to the new behavior as I think this is common to the modern
Linux distros.

Reviewed by: Pau Amma (doc), emaste
Differential revision: https://reviews.freebsd.org/D34708
MFC after: 2 weeks


# 6bea696a 04-Feb-2022 John Baldwin <jhb@FreeBSD.org>

linux_copyout_strings: Use PROC_PS_STRINGS().

Reviewed by: markj
Obtained from: CheriBSD
Differential Revision: https://reviews.freebsd.org/D34173


# 3fc21fdd 17-Jan-2022 Mark Johnston <markj@FreeBSD.org>

sysent: Add a sv_psstringssz field to struct sysentvec

The size of the ps_strings structure varies between ABIs, so this is
useful for computing the address of the ps_strings structure relative to
the top of the stack when stack address randomization is enabled.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704


# f04a0960 30-Dec-2021 Mark Johnston <markj@FreeBSD.org>

exec: Simplify sv_copyout_strings implementations a bit

Simplify control flow around handling of the execpath length and signal
trampoline. Cache the sysentvec pointer in a local variable.

No functional change intended.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33703


# a42d362b 14-Sep-2021 Konstantin Belousov <kib@FreeBSD.org>

amd64: centralize definitions of CS_SECURE and EFL_SECURE

Requested by markj
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31954


# 0a4b664a 12-Aug-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Add struct clone_args for future clone3 system call.

In preparation for clone3 system call add struct clone_args and use it in
clone implementation.
Move all of clone related bits to the newly created linux_fork.h header.

Differential revision: https://reviews.freebsd.org/D31474
MFC after: 2 weeks


# de8374df 12-Aug-2021 Dmitry Chagin <dchagin@FreeBSD.org>

fork: Allow ABI to specify fork return values for child.

At least Linux x86 ABI's does not use carry bit and expects that the dx register
is preserved. For this add a new sv_set_fork_retval hook and call it from cpu_fork().

Add a short comment about touching dx in x86_set_fork_retval(), for more details
see phab comments from kib@ and imp@.

Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D31472
MFC after: 2 weeks


# cf8d74e3 20-Jul-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Allow musl brand to use FUTEX_REQUEUE op.

Initial patch from submitter was adapted by me to prevent unconditional
FUTEX_REQUEUE use.

PR: 255947
Submitted by: Philippe Michaud-Boudreault
Differential Revision: https://reviews.freebsd.org/D30332


# ae8330b4 20-Jul-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Add arch name to the some printfs.

Reviewed by: emaste
Differential revision: https://reviews.freebsd.org/D30904
MFC after: 2 weeks


# 09cffde9 20-Jul-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Fixup the vDSO initialization order.

The vDSO initialisation order should be as follows:
- native abi init via exec_sysvec_init();
- vDSO symbols queued to the linux_vdso_syms list;
- linux_vdso_install();
- linux_exec_sysvec_init();

As the exec_sysvec_init() called with SI_ORDER_ANY (last) at SI_SUB_EXEC
order, move linux_vdso_install() and linux_exec_sysvec_init() to the
SI_SUB_EXEC+1 order.

Reviewed by: trasz
Differential Revision: https://reviews.freebsd.org/D30902
MFC after 2 weeks


# a543556c 20-Jul-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Constify vdso install/deinstall.

In order to reduce diff between arches constify vdso install/deinstall
functions like arm64.

Reviewed by: emaste
Differential revision: https://reviews.freebsd.org/D30901
MFC after: 2 weeks


# 9931033b 20-Jul-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4); Almost complete the vDSO.

The vDSO (virtual dynamic shared object) is a small shared library that the
kernel maps R/O into the address space of all Linux processes on image
activation. The vDSO is a fully formed ELF image, shared by all processes
with the same ABI, has no process private data.

The primary purpose of the vDSO:
- non-executable stack, signal trampolines not copied to the stack;
- signal trampolines unwind, mandatory for the NPTL;
- to avoid contex-switch overhead frequently used system calls can be
implemented in the vDSO: for now gettimeofday, clock_gettime.

The first two have been implemented, so add the implementation of system
calls.

System calls implemenation based on a native timekeeping code with some
limitations:
- ifunc can't be used, as vDSO r/o mapped to the process VA and rtld
can't relocate symbols;
- reading HPET memory is not implemented for now (TODO).

In case on any error vDSO system calls fallback to the kernel system
calls. For unimplemented vDSO system calls added prototypes which call
corresponding kernel system call.

Tested by: trasz (arm64)
Differential revision: https://reviews.freebsd.org/D30900
MFC after: 2 weeks


# 5fd9cd53 20-Jul-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Modify sv_onexec hook to return an error.

Temporary add stubs to the Linux emulation layer which calls the existing hook.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D30911
MFC after: 2 weeks


# cf98bc28 10-Jul-2021 David Chisnall <theraven@FreeBSD.org>

Pass the syscall number to capsicum permission-denied signals

The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.

This reapplies 3a522ba1bc852c3d4660a4fa32e4a94999d09a47 with a fix for
the static assertion failure on i386.

Approved by: markj (mentor)

Reviewed by: kib, bcr (manpages)

Differential Revision: https://reviews.freebsd.org/D29185


# d2b55828 10-Jul-2021 David Chisnall <theraven@FreeBSD.org>

Revert "Pass the syscall number to capsicum permission-denied signals"

This broke the i386 build.

This reverts commit 3a522ba1bc852c3d4660a4fa32e4a94999d09a47.


# 3a522ba1 10-Jul-2021 David Chisnall <theraven@FreeBSD.org>

Pass the syscall number to capsicum permission-denied signals

The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.

Approved by: markj (mentor)

Reviewed by: kib, bcr (manpages)

Differential Revision: https://reviews.freebsd.org/D29185


# 447636e4 30-Jun-2021 Edward Tomasz Napierala <trasz@FreeBSD.org>

linux(4): implement coredump support

Implement dumping core for Linux binaries on amd64, for both
32- and 64-bit executables. Some bits are still missing.

This is based on a prototype by chuck@.

Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30019


# 435754a5 29-Jun-2021 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add infrastructure required for Linux coredump support

This adds `sv_elf_core_osabi`, `sv_elf_core_abi_vendor`,
and `sv_elf_core_prepare_notes` fields to `struct sysentvec`,
and modifies imgact_elf.c to make use of them instead
of hardcoding FreeBSD-specific values. It also updates all
of the ABI definitions to preserve current behaviour.

This makes it possible to implement non-native ELF coredump
support without unnecessary code duplication. It will be used
for Linux coredumps.

Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30921


# 4aae1334 25-Jun-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Make vDSO defines private.

Hide the vDSO defines to the linux32_sysvec as they are not intended to
be used outside of it. Fix LINUX32_PS_STRINGS, use the size of
struct linux32_ps_strings instead of a numeric constant.

MFC after: 2 weeks


# c1da89fe 21-Jun-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Retire linux_kplatform.

Assuming we can't run on i486, i586 class cpu, retire linux_kplatform var
and use hardcoded 'machine' value in linux_newuname().

I have added linux_kplatform for consistency with linux_platform which is
placed in to vdso to avoid excess copyout it on stack for AT_PLATFORM at
exec time.

This is the first stage of Linuxulator's vdso revision.

Reviewed by: trasz, imp
Differential Revision: https://reviews.freebsd.org/D30774
MFC after: 2 weeks


# 870e197d 05-Jun-2021 Konstantin Belousov <kib@FreeBSD.org>

Add quirks for Linux ABI signals handling

Require queueing of the signals with default action, and disable
dequeueing SIGCHLD on wait for live process.

Reported and tested by: dchagin
Reviewed by: dchagin, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30675


# 598f6fb4 14-Jan-2021 Konstantin Belousov <kib@FreeBSD.org>

linuxolator: Add compat.linux.setid_allowed knob

PR: 21463
Reported by: kris
Reviewed by: dchagin
Tested by: trasz
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28154


# f4e80108 06-Jun-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): optimize ksiginfo to siginfo conversion.

Retire ksiginfo_to_lsiginfo function, use siginfo_to_lsiginfo instead.
Convert rt_sigtimedwait siginfo variables to well known names.

MFC after: 2 weeks


# ca6e1fa3 12-Apr-2021 Edward Tomasz Napierala <trasz@FreeBSD.org>

linux: adjust ordering of Linux auxv and add dummy AT_HWCAP2

This should be a no-op; the purpose of this is to reduce
a spurious difference between Linuxulator and Linux, to make
debugging core dumps slightly easier.

Note that AT_HWCAP2 we pass to Linux binaries is always 0,
instead of being equal to 'cpu_feature2'. This matches what
I've observed under Ubuntu Focal VM.

Reviewed By: chuck, dchagin
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D29609


# 94172aff 09-Apr-2021 Konstantin Belousov <kib@FreeBSD.org>

amd64: clear debug registers on execing 32bit Linux binary

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29687


# 0fc8a796 16-Feb-2021 Mark Johnston <markj@FreeBSD.org>

linux: Unmap the VDSO page when unloading

linux_shared_page_init() creates an object and grabs and maps a single
page to back the VDSO. When destroying the VDSO object, we failed to
destroy the mapping and free KVA. Fix this.

Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28696


# 4815f175 23-Nov-2020 Konstantin Belousov <kib@FreeBSD.org>

Linuxolator: Replace use of eventhandlers by sysent hooks.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27309


# 866b1f51 26-Oct-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Fix misnomer - linux_to_bsd_errno() does the exact opposite.

Reported by: arichardson
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26965


# 1e2521ff 27-Sep-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Get rid of sa->narg. It serves no purpose; use sa->callp->sy_narg instead.

Reviewed by: kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26458


# 70890254 17-Sep-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Get rid of sv_errtbl and SV_ABI_ERRNO().

Reviewed by: kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26388


# c26391f4 15-Sep-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Move SV_ABI_ERRNO translation into linux-specific code, to simplify
the syscall path and declutter it a bit. No functional changes intended.

Reviewed by: kib (earlier version)
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26378


# 543769bf 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

amd64: clean up empty lines in .c and .h files


# b24e6ac8 16-Apr-2020 Brooks Davis <brooks@FreeBSD.org>

Convert canary, execpathp, and pagesizes to pointers.

Use AUXARGS_ENTRY_PTR to export these pointers. This is a followup to
r359987 and r359988.

Reviewed by: jhb
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D24446


# 397df744 15-Apr-2020 Brooks Davis <brooks@FreeBSD.org>

Make ps_strings in struct image_params into a pointer.

This is a prepratory commit for D24407.

Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# b5f20658 16-Dec-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add compat.linux.emul_path, so it can be set to something other
than "/compat/linux". Useful when you have several compat directories
with different Linux versions and you don't want to clash with files
installed by linux-c7 packages.

Reviewed by: bcr (manpages)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22574


# 23a5b4ed 09-Dec-2019 John Baldwin <jhb@FreeBSD.org>

Use 4 byte stack alignment instead of 8 byte.

This was an old bug prior to r355373 and mostly harmless as it would
waste at most a handful of bytes on the stack.


# d8010b11 09-Dec-2019 John Baldwin <jhb@FreeBSD.org>

Copy out aux args after the argument and environment vectors.

Partially revert r354741 and r354754 and go back to allocating a
fixed-size chunk of stack space for the auxiliary vector. Keep
sv_copyout_auxargs but change it to accept the address at the end of
the environment vector as an input stack address and no longer
allocate room on the stack. It is now called at the end of
copyout_strings after the argv and environment vectors have been
copied out.

This should fix a regression in r354754 that broke the stack alignment
for newer Linux amd64 binaries (and probably broke Linux arm64 as
well).

Reviewed by: kib
Tested on: amd64 (native, linux64 (only linux-base-c7), and i386)
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22695


# 31174518 03-Dec-2019 John Baldwin <jhb@FreeBSD.org>

Use uintptr_t instead of register_t * for the stack base.

- Use ustringp for the location of the argv and environment strings
and allow destp to travel further down the stack for the stackgap
and auxv regions.
- Update the Linux copyout_strings variants to move destp down the
stack as was done for the native ABIs in r263349.
- Stop allocating a space for a stack gap in the Linux ABIs. This
used to hold translated system call arguments, but hasn't been used
since r159992.

Reviewed by: kib
Tested on: md64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22501


# 03b0d68c 18-Nov-2019 John Baldwin <jhb@FreeBSD.org>

Check for errors from copyout() and suword*() in sv_copyout_args/strings.

Reviewed by: brooks, kib
Tested on: amd64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22401


# 5caa67fa 15-Nov-2019 John Baldwin <jhb@FreeBSD.org>

Use a sv_copyout_auxargs hook in the Linux ELF ABIs.

Reviewed by: emaste
Tested on: amd64 (linux64 only), i386
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22356


# a161fba9 17-Oct-2019 Yuri Pankov <yuripv@FreeBSD.org>

linux: futex_mtx should follow futex_list

Move futex_mtx to linux_common.ko for amd64 and aarch64 along
with respective list/mutex init/destroy.

PR: 240989
Reported by: Alex S <iwtcex@gmail.com>


# c5156c77 13-May-2019 Dmitry Chagin <dchagin@FreeBSD.org>

Linuxulator depends on a fundamental kernel settings such as SMP. Many
of them listed in opt_global.h which is not generated while building
modules outside of a kernel and such modules never match real cofigured
kernel.

So, we should prevent our users from building obviously defective modules.

Therefore, remove the root cause of the building of modules outside of a
kernel - the possibility of building modules with DEBUG or KTR flags.
And remove all of DEBUG printfs as it is incomplete and in threaded
programms not informative, also a half of system call does not have DEBUG
printf. For debuging Linux programms we have dtrace, ktr and ktrace ability.

PR: 222861
Reviewed by: trasz
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20178


# 1699546d 01-Mar-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Remove sv_pagesize, originally introduced with r100384.

In all of the architectures we have today, we always use PAGE_SIZE.
While in theory one could define different things, none of the
current architectures do, even the ones that have transitioned from
32-bit to 64-bit like i386 and arm. Some ancient mips binaries on
other systems used 8k instead of 4k, but we don't support running
those and likely never will due to their age and obscurity.

Reviewed by: imp (who also contributed the commit message)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D19280


# 628888f0 19-Dec-2018 Mateusz Guzik <mjg@FreeBSD.org>

Remove iBCS2, part2: general kernel

Reviewed by: kib (previous version)
Sponsored by: The FreeBSD Foundation


# 02bf7e5e 06-Nov-2018 Tijl Coosemans <tijl@FreeBSD.org>

Fix builds with COMPAT_LINUX32 in the kernel config.

MFC after: 3 days


# 8fc08087 06-Nov-2018 Tijl Coosemans <tijl@FreeBSD.org>

On amd64 both Linux compat modules, linux.ko and linux64.ko, provide
linux_ioctl_(un)register_handler that allows other driver modules to
register ioctl handlers. The ioctl syscall implementation in each Linux
compat module iterates over the list of handlers and forwards the call to
the appropriate driver. Because the registration functions have the same
name in each module it is not possible for a driver to support both 32 and
64 bit linux compatibility.

Move the list of ioctl handlers to linux_common.ko so it is shared by
both Linux modules and all drivers receive both 32 and 64 bit ioctl calls
with one registration. These ioctl handlers normally forward the call
to the FreeBSD ioctl handler which can handle both 32 and 64 bit.

Keep the special COMPAT_LINUX32 ioctl handlers in linux.ko in a separate
list for now and let the ioctl syscall iterate over that list first.
Later, COMPAT_LINUX32 support can be added to the 64 bit ioctl handlers
via a runtime check for ILP32 like is done for COMPAT_FREEBSD32 and then
this separate list would disappear again. That is a much bigger effort
however and this commit is meant to be MFCable.

This enables linux64 support in x11/nvidia-driver*.

PR: 206711
Reviewed by: kib
MFC after: 3 days


# 35755049 21-Jun-2018 Chuck Tuffli <chuck@FreeBSD.org>

Fix the Linux kernel version number calculation

The Linux compatibility code was converting the version number (e.g.
2.6.32) in two different ways and then comparing the results.

The linux_map_osrel() function converted MAJOR.MINOR.PATCH similar to
what FreeBSD does natively. I.e. where major=v0, minor=v1, and patch=v2
v = v0 * 1000000 + v1 * 1000 + v2;

The LINUX_KERNVER() macro, on the other hand, converted the value with
bit shifts. I.e. where major=a, minor=b, and patch=c
v = (((a) << 16) + ((b) << 8) + (c))

The Linux kernel uses the later format via the KERNEL_VERSION() macro in
include/generated/uapi/linux/version.h

Fix is to use the LINUX_KERNVER() macro in linux_map_osrel() as well as
in the .trans_osrel functions.

PR: 229209
Reviewed by: emaste, cem, imp (mentor)
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D15952


# 6362b1a6 12-Jun-2018 Jung-uk Kim <jkim@FreeBSD.org>

Fix number of auxargs entries to copy out for 32-bit Linuxulator.

PR: 228790


# cbf7e0cb 29-May-2018 Brooks Davis <brooks@FreeBSD.org>

Correct pointer subtraction in KASSERT().

The assertion would never fire without truly spectacular future
programming errors.

Reported by: Coverity
CID: 1391370
Sponsored by: DARPA, AFRL


# 5f77b8a8 24-May-2018 Brooks Davis <brooks@FreeBSD.org>

Avoid two suword() calls per auxarg entry.

Instead, construct an auxargs array and copy it out all at once.

Use an array of Elf_Auxinfo rather than pairs of Elf_Addr * to represent
the array. This is the correct type where pairs of words just happend
to work. To reduce the size of the diff, AUXARGS_ENTRY is altered to act
on this array rather than introducing a new macro.

Return errors on copyout() and suword() failures and handle them in the
caller.

Incidentally fixes AT_RANDOM and AT_EXECFN in 32-bit linux on amd64
which incorrectly used AUXARG_ENTRY instead of AUXARGS_ENTRY_32
(now removed due to the use of proper types).

Reviewed by: kib
Comments from: emaste, jhb
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15485


# 73c8686e 19-Apr-2018 John Baldwin <jhb@FreeBSD.org>

Simplify the code to allocate stack for auxv, argv[], and environment vectors.

Remove auxarg_size as it was only used once right after a confusing
assignment in each of the variants of exec_copyout_strings().

Reviewed by: emaste
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D15123


# 7c5d1690 12-Apr-2018 Konstantin Belousov <kib@FreeBSD.org>

Fix PSL_T inheritance on exec for x86.

The miscellaneous x86 sysent->sv_setregs() implementations tried to
migrate PSL_T from the previous program to the new executed one, but
they evaluated regs->tf_eflags after the whole regs structure was
bzeroed. Make this functional by saving PSL_T value before zeroing.

Note that if the debugger is not attached, executing the first
instruction in the new program with PSL_T set results in SIGTRAP, and
since all intercepted signals are reset to default dispostion on
exec(2), this means that non-debugged process gets killed immediately
if PSL_T is inherited. In particular, since suid images drop
P_TRACED, attempt to set PSL_T for execution of such program would
kill the process.

Another issue with userspace PSL_T handling is that it is reset by
trap(). It is reasonable to clear PSL_T when entering SIGTRAP
handler, to allow the signal to be handled without recursion or
delivery of blocked fault. But it is not reasonable to return back to
the normal flow with PSL_T cleared. This is too late to change, I
think.

Discussed with: bde, Ali Mashtizadeh
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D14995


# c7fb0e1d 09-Apr-2018 Ed Maste <emaste@FreeBSD.org>

linuxulator: add else case braces to reduce diffs between archs

Sponsored by: Turing Robotic Industries Inc.


# b267239d 09-Apr-2018 Ed Maste <emaste@FreeBSD.org>

linuxulator: deduplicate linux_exec_imgact_try

Previously linuxulator had three identical copies of
linux_exec_imgact_try. Deduplicate before adding another arch to
linuxulator.

Sponsored by: Turing Robotic Industries Inc
Differential Revision: https://reviews.freebsd.org/D14856


# 6469bdcd 06-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

Move most of the contents of opt_compat.h to opt_global.h.

opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941


# d41e41f9 27-Mar-2018 John Baldwin <jhb@FreeBSD.org>

Remove very old and unused signal information codes.

These have been supplanted by the MI signal information codes in
<sys/signal.h> since 7.0. The FPE_*_TRAP ones were deprecated even
earlier in 1999.

PR: 226579 (exp-run)
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D14637


# f8268d4d 23-Mar-2018 Ed Maste <emaste@FreeBSD.org>

Remove redundant cast from Linuxulator SYSINITs


# ad448975 23-Mar-2018 Ed Maste <emaste@FreeBSD.org>

Fixup return style(9) in amd64 linux*_sysvec.c

Sponsored by: Turing Robotic Industries Inc.


# c0aa0e2c 23-Mar-2018 Ed Maste <emaste@FreeBSD.org>

Sort headers in MD Linuxulator files

Bring #includes closer to style(9) and reduce differences between the
(three) MD versions of linux_machdep.c and linux_sysvec.c.

Sponsored by: Turing Robotic Industries Inc.


# 1ac2776b 21-Mar-2018 Ed Maste <emaste@FreeBSD.org>

Share Linux errno table with libsysdecode

Requested by: jhb
Reviewed by: jhb
Sponsored by: Turing Robotic Industries Inc.


# dc858467 19-Mar-2018 Ed Maste <emaste@FreeBSD.org>

Rename linuxulator functions with linux_ prefix

It's preferable to have a consistent prefix. This also reduces
differences between the three linux*_sysvec.c files.

Sponsored by: Turing Robotic Industries Inc.


# 9bec2ea6 19-Mar-2018 Ed Maste <emaste@FreeBSD.org>

linux*_sysvec.c: rationalize whitespace and comments

There's a fair amount of duplication between MD linuxulator files.
Make indentation and comments consistent between the three versions of
linux_sysvec.c to reduce diffs when comparing them.

Sponsored by: Turing Robotic Industries Inc.


# 6e481f83 16-Mar-2018 Ed Maste <emaste@FreeBSD.org>

Share a single bsd-linux errno table across MD consumers

Three copies of the linuxulator linux_sysvec.c contained identical
BSD to Linux errno translation tables, and future work to support other
architectures will also use the same table. Move the table to a common
file to be used by all. Make it 'const int' to place it in .rodata.

(Some existing Linux architectures use MD errno values, but x86 and Arm
share the generic set.)

This change should introduce no functional change; a followup will add
missing errno values.

MFC after: 3 weeks
Sponsored by: Turing Robotic Industries Inc.
Differential Revision: https://reviews.freebsd.org/D14665


# 7b194b3d 14-Mar-2018 Ed Maste <emaste@FreeBSD.org>

Remove stray ; at end of linux_vdso_deinstall()


# a95659f7 13-Mar-2018 Ed Maste <emaste@FreeBSD.org>

Use C99 boolean type for translate_osrel

Migrate to modern types before creating MD Linuxolator bits for new
architectures.

Reviewed by: cem
Sponsored by: Turing Robotic Industries Inc.
Differential Revision: https://reviews.freebsd.org/D14676


# 4ba25759 12-Mar-2018 Ed Maste <emaste@FreeBSD.org>

Apply some style(9) to Linuxulator linux_sysvec.c comments


# eae594f7 21-Feb-2018 Ed Maste <emaste@FreeBSD.org>

Correct proper nouns in the Linuxulator

- Capitalize Linux
- Spell FreeBSD out in full
- Address some style(9) on changed lines

Sponsored by: Turing Robotic Industries Inc.


# 132f90c6 05-Feb-2018 Ed Maste <emaste@FreeBSD.org>

Linuxolator whitespace cleanup

A version of each of the MD files by necessity exists for each CPU
architecture supported by the Linuxolator. Clean these up so that new
architectures do not inherit whitespace issues.

Clean up shared Linuxolator files while here.

Sponsored by: Turing Robotic Industries Inc.


# cd76ee1e 27-Nov-2017 Fedor Uporov <fsu@FreeBSD.org>

Remap ENOATTR to ENODATA in the linuxulator.
In the linux ENOADATA is frequently #defined as ENOATTR.
The change is required for an xattrs support implementation.

MFC after: 1 week
Discussed with: netchild
Approved by: pfg

Differential Revision: https://reviews.freebsd.org/D13221


# c49761dd 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/amd64: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# 814629dd 24-Nov-2017 Ed Schouten <ed@FreeBSD.org>

Don't let cpu_set_syscall_retval() clobber exec_setregs().

Upon successful completion, the execve() system call invokes
exec_setregs() to initialize the registers of the initial thread of the
newly executed process. What is weird is that when execve() returns, it
still goes through the normal system call return path, clobbering the
registers with the system call's return value (td->td_retval).

Though this doesn't seem to be problematic for x86 most of the times (as
the value of eax/rax doesn't matter upon startup), this can be pretty
frustrating for architectures where function argument and return
registers overlap (e.g., ARM). On these systems, exec_setregs() also
needs to initialize td_retval.

Even worse are architectures where cpu_set_syscall_retval() sets
registers to values not derived from td_retval. On these architectures,
there is no way cpu_set_syscall_retval() can set registers to the way it
wants them to be upon the start of execution.

To get rid of this madness, let sys_execve() return EJUSTRETURN. This
will cause cpu_set_syscall_retval() to leave registers intact. This
makes process execution easier to understand. It also eliminates the
difference between execution of the initial process and successive ones.
The initial call to sys_execve() is not performed through a system call
context.

Reviewed by: kib, jhibbits
Differential Revision: https://reviews.freebsd.org/D13180


# d95498d4 19-Oct-2017 Mateusz Guzik <mjg@FreeBSD.org>

amd64: avoid acquiring dt lock if possible (which is the common case)

Discussed with: kib
MFC after: 1 week


# c151945c 30-Jul-2017 Dmitry Chagin <dchagin@FreeBSD.org>

Avoid using [LINUX_]SHAREDPAGE constant directly in the vdso code.
This is needed for https://reviews.freebsd.org/D11780.

Reported by: kib@


# a0c59c7a 03-Jul-2017 Dmitry Chagin <dchagin@FreeBSD.org>

Add support for musl consumers to the Linuxulator.

PR: 213809
Submitted by: Yonas Yanfa
Reported by: Yonas Yanfa
MFC after: 1 week
Relnotes: yes


# 2d88da2f 12-Jun-2017 Konstantin Belousov <kib@FreeBSD.org>

Move struct syscall_args syscall arguments parameters container into
struct thread.

For all architectures, the syscall trap handlers have to allocate the
structure on the stack. The structure takes 88 bytes on 64bit arches
which is not negligible. Also, it cannot be easily found by other
code, which e.g. caused duplication of some members of the structure
to struct thread already. The change removes td_dbg_sc_code and
td_dbg_sc_nargs which were directly copied from syscall_args.

The structure is put into the copied on fork part of the struct thread
to make the syscall arguments information correct in the child after
fork.

This move will also allow several more uses shortly.

Reviewed by: jhb (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
X-Differential revision: https://reviews.freebsd.org/D11080


# b66bb393 22-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

Cleanup redundant parenthesis from existing howmany()/roundup() macro uses.


# ea24b056 19-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

X86: use our nitems() macro when it is avaliable through param.h.

No functional change, only trivial cases are done in this sweep,

Discussed in: freebsd-current


# b6348be7 05-Apr-2016 Baptiste Daroussin <bapt@FreeBSD.org>

Add kern.features flags for linux and linux64 modules

kern.features.linux: 1 meaning linux 32 bits binaries are supported
kern.features.linux64: 1 meaning linux 64 bits binaries are supported

The goal here is to help 3rd party applications (including ports) to determine
if the host do support linux emulation

Reviewed by: dchagin
MFC after: 1 week
Relnotes: yes
Differential Revision: D5830


# aa949be5 27-Jan-2016 John Baldwin <jhb@FreeBSD.org>

Convert ss_sp in stack_t and sigstack to void *.

POSIX requires these members to be of type void * rather than the
char * inherited from 4BSD. NetBSD and OpenBSD both changed their
fields to void * back in 1998. No new build failures were reported
via an exp-run.

PR: 206503 (exp-run)
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D5092


# 669414e4 27-Jan-2016 Xin LI <delphij@FreeBSD.org>

Implement AT_SECURE properly.

AT_SECURE auxv entry has been added to the Linux 2.5 kernel to pass a
boolean flag indicating whether secure mode should be enabled. 1 means
that the program has changes its credentials during the execution.
Being exported AT_SECURE used by glibc issetugid() call.

Submitted by: imp, dchagin
Security: FreeBSD-SA-16:10.linux
Security: CVE-2016-1883


# 038c7205 09-Jan-2016 Dmitry Chagin <dchagin@FreeBSD.org>

Implement vsyscall hack. Prior to 2.13 glibc uses vsyscall
instead of vdso. An upcoming linux_base-c6 needs it.

Differential Revision: https://reviews.freebsd.org/D1090

Reviewed by: kib, trasz
MFC after: 1 week


# 724f4b62 28-Nov-2015 Konstantin Belousov <kib@FreeBSD.org>

Remove sv_prepsyscall, sv_sigsize and sv_sigtbl members of the struct
sysent.

sv_prepsyscall is unused.

sv_sigsize and sv_sigtbl translate signal number from the FreeBSD
namespace into the ABI domain. It is only utilized on i386 for iBCS2
binaries. The issue with this approach is that signals for iBCS2 were
delivered with the FreeBSD signal frame layout, which does not follow
iBCS2. The same note is true for any other potential user if
sv_sigtbl. In other words, if ABI needs signal number translation, it
really needs custom sv_sendsig method instead.

Sponsored by: The FreeBSD Foundation


# 6cea44a7 29-Oct-2015 John Baldwin <jhb@FreeBSD.org>

Fix build with DEBUG defined.

Reported by: hselasky


# 2f99bcce 22-Oct-2015 John Baldwin <jhb@FreeBSD.org>

Rename remaining linux32 symbols such as linux_sysent[] and
linux_syscallnames[] from linux_* to linux32_* to avoid conflicts with
linux64.ko. While here, add support for linux64 binaries to systrace.
- Update NOPROTO entries in amd64/linux/syscalls.master to match the
main table to fix systrace build.
- Add a special case for union l_semun arguments to the systrace
generation.
- The systrace_linux32 module now only builds the systrace_linux32.ko.
module on amd64.
- Add a new systrace_linux module that builds on both i386 and amd64.
For i386 it builds the existing systrace_linux.ko. For amd64 it
builds a systrace_linux.ko for 64-bit binaries.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D3954


# 5047105b 22-Oct-2015 John Baldwin <jhb@FreeBSD.org>

Merge r289055 to amd64/linux32:

linux: fix handling of out-of-bounds syscall attempts

Due to an off by one the code would read an entry past the table, as
opposed to the last entry which contains the nosys handler.


# 4ab7403b 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Rework signal code to allow using it by other modules, like linprocfs:

1. Linux sigset always 64 bit on all platforms. In order to move Linux
sigset code to the linux_common module define it as 64 bit int. Move
Linux sigset manipulation routines to the MI path.

2. Move Linux signal number definitions to the MI path. In general, they
are the same on all platforms except for a few signals.

3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion
tables to avoid conversion errors.

4. Emulate Linux SIGPWR signal via FreeBSD SIGRTMIN signal which is outside
of allowed on Linux signal numbers.

PR: 197216


# fcdffc03 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Call nosys in case when the incorrect syscall number is specified.

Reported by: trinity


# 26c68e1f 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Use the BSD_TO_LINUX_SIGNAL() wherever there is no need
to check the ABI as it is known.

Differential Revision: https://reviews.freebsd.org/D1086


# 4048f59c 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Add AT_RANDOM and AT_EXECFN auxiliary vector entries which are used by
glibc. At list since glibc version 2.16 using AT_RANDOM is mandatory.

Differential Revision: https://reviews.freebsd.org/D1080


# bc273677 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Refund the proc emuldata struct for future use. For now move flags from
thread emuldata to proc emuldata as it was originally intended.

As we can have both 64 & 32 bit Linuxulator running any eventhandler
can be called twice for us. To prevent this move eventhandlers code
from linux_emul.c to the linux_common.ko module.

Differential Revision: https://reviews.freebsd.org/D1073


# 67d39748 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Introduce a new module linux_common.ko which is intended for the
following primary purposes:

1. Remove the dependency of linsysfs and linprocfs modules from linux.ko,
which will be architecture specific on amd64.

2. Incorporate into linux_common.ko general code for platforms on which
we'll support two Linuxulator modules (for both instruction set - 32 & 64 bit).

3. Move malloc(9) declaration to linux_common.ko, to enable getting memory
usage statistics properly.

Currently linux_common.ko incorporates a code from linux_mib.c and linux_util.c
and linprocfs, linsysfs and linux kernel modules depend on linux_common.ko.

Temporarily remove dtrace garbage from linux_mib.c and linux_util.c

Differential Revision: https://reviews.freebsd.org/D1072
In collaboration with: Vassilis Laganakos.

Reviewed by: trasz


# 26cf41d6 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Remove stale comment about a signal trampoline which
is moved to the shared page at r219609.

Differential Revision: https://reviews.freebsd.org/D1063
Reviewed by: trasz


# 0020bdf1 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Put linux_platform into the vdso to avoid copying it onto the stack at
every exec.

Differential Revision: https://reviews.freebsd.org/D1062
Reviewed by: trasz


# bdc37934 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Implement vdso - virtual dynamic shared object. Through vdso Linux
exposes functions from kernel with proper DWARF CFI information so that
it becomes easier to unwind through them.
Using vdso is a mandatory for a thread cancelation && cleanup
on a modern glibc.

Differential Revision: https://reviews.freebsd.org/D1060


# af682d48 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Some style(9) && whitespaces fixes. No functional changes.

Differential Revision: https://reviews.freebsd.org/D1041
Reviewed by: emaste


# 81338031 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Switch linuxulator to use the native 1:1 threads.

The reasons:
1. Get rid of the stubs/quirks with process dethreading,
process reparent when the process group leader exits and close
to this problems on wait(), waitpid(), etc.
2. Reuse our kernel code instead of writing excessive thread
managment routines in Linuxulator.

Implementation details:

1. The thread is created via kern_thr_new() in the clone() call with
the CLONE_THREAD parameter. Thus, everything else is a process.
2. The test that the process has a threads is done via P_HADTHREADS
bit p_flag of struct proc.
3. Per thread emulator state data structure is now located in the
struct thread and freed in the thread_dtor() hook.
Mandatory holdig of the p_mtx required when referencing emuldata
from the other threads.
4. PID mangling has changed. Now Linux pid is the native tid
and Linux tgid is the native pid, with the exception of the first
thread in the process where tid and pid are one and the same.

Ugliness:

In case when the Linux thread is the initial thread in the thread
group thread id is equal to the process id. Glibc depends on this
magic (assert in pthread_getattr_np.c). So for system calls that
take thread id as a parameter we should use the special method
to reference struct thread.

Differential Revision: https://reviews.freebsd.org/D1039


# 2dedc128 16-Jun-2014 Dmitry Chagin <dchagin@FreeBSD.org>

Revert r266925 as it can lead to instant panic at fexecve():

To allow to run the interpreter itself add a new ELF branding type.

Pointed out by: kib, mjg


# 5f56da18 31-May-2014 Dmitry Chagin <dchagin@FreeBSD.org>

To allow to run the interpreter itself add a new ELF branding type.
Allow Linux ABI to run ELF interpreter.

MFC after: 3 days


# 3d271aaa 14-Nov-2013 Ed Maste <emaste@FreeBSD.org>

x86: Allow users to change PSL_RF via ptrace(PT_SETREGS...)

Debuggers may need to change PSL_RF. Note that tf_eflags is already stored
in the signal context during signal handling and PSL_RF previously could be
modified via sigreturn, so this change should not provide any new ability
to userspace.

For background see the thread at:
http://lists.freebsd.org/pipermail/freebsd-i386/2007-September/005910.html

Reviewed by: jhb, kib
Sponsored by: DARPA, AFRL


# d127f153 09-May-2013 Dmitry Chagin <dchagin@FreeBSD.org>

Retire write-only PCB_GS32BIT pcb flag on amd64.


# d825ce0a 29-Jan-2013 John Baldwin <jhb@FreeBSD.org>

Reduce duplication between i386/linux/linux.h and amd64/linux32/linux.h
by moving bits that are MI out into headers in compat/linux.

Reviewed by: Chagin Dmitry dmitry | gmail
MFC after: 2 weeks


# 9823d527 10-Oct-2012 Kevin Lo <kevlo@FreeBSD.org>

Revert previous commit...

Pointyhat to: kevlo (myself)


# a10cee30 09-Oct-2012 Kevin Lo <kevlo@FreeBSD.org>

Prefer NULL over 0 for pointers


# 9a14aa01 15-Jan-2012 Ulrich Spörlein <uqs@FreeBSD.org>

Convert files to UTF-8


# 6472ac3d 07-Nov-2011 Ed Schouten <ed@FreeBSD.org>

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# acface68 26-Mar-2011 Dmitry Chagin <dchagin@FreeBSD.org>

Export the correct AT_PLATFORM value.
Since signal trampolines are copied to the shared page do not need to
leave place on the stack for it. Forgotten in the previous commit.

MFC after: 1 Week


# 8f1e49a6 13-Mar-2011 Dmitry Chagin <dchagin@FreeBSD.org>

Enable shared page use for amd64/linux32 and i386/linux binaries.
Move signal trampoline code from the top of the stack to the shared page.

MFC after: 2 Weeks


# e5d81ef1 08-Mar-2011 Dmitry Chagin <dchagin@FreeBSD.org>

Extend struct sysvec with new method sv_schedtail, which is used for an
explicit process at fork trampoline path instead of eventhadler(schedtail)
invocation for each child process.

Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.

While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.

Discussed with: kib

MFC after: 2 Week


# fde63162 13-Feb-2011 Dmitry Chagin <dchagin@FreeBSD.org>

Sort include files in the alphabetical order.


# bdbf2db5 14-Jan-2011 Jung-uk Kim <jkim@FreeBSD.org>

Remove redundant, bogus, and even harmful uses of setting TS bit in CR0.
It is done from fpstate_drop() when it is really necessary.

Reviewed by: kib
MFC after: 1 week


# e6c006d9 21-Dec-2010 Jung-uk Kim <jkim@FreeBSD.org>

Improve PCB flags handling and make it more robust. Add two new functions
for manipulating pcb_flags. These inline functions are very similar to
atomic_set_char(9) and atomic_clear_char(9) but without unnecessary LOCK
prefix for SMP. Add comments about the rationale[1]. Use these functions
wherever possible. Although there are some places where it is not strictly
necessary (e.g., a PCB is copied to create a new PCB), it is done across
the board for sake of consistency. Turn pcb_full_iret into a PCB flag as
it is safe now. Move rarely used fields before pcb_flags and reduce size
of pcb_flags to one byte. Fix some style(9) nits in pcb.h while I am in
the neighborhood.

Reviewed by: kib
Submitted by: kib[1]
MFC after: 2 months


# 1b3c3256 06-Dec-2010 Konstantin Belousov <kib@FreeBSD.org>

Update some comments related to use of amd64 full context switch.
In exec_linux_setregs(), use locally cached pointer to pcb to set
pcb_full_iret.
In set_regs(), note that full return is needed when code that sets
segment registers is enabled.

MFC after: 1 week


# 0f0170e6 06-Dec-2010 Konstantin Belousov <kib@FreeBSD.org>

Retire write-only PCB_FULLCTX pcb flag on amd64.

Reminded by: Petr Salinger <Petr.Salinger seznam cz>
Tested by: pho
MFC after: 1 week


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 78ae4338 12-Oct-2010 Konstantin Belousov <kib@FreeBSD.org>

Add macro DECLARE_MODULE_TIED to denote a module as requiring the
kernel of exactly the same __FreeBSD_version as the headers module was
compiled against.

Mark our in-tree ABI emulators with DECLARE_MODULE_TIED. The modules
use kernel interfaces that the Release Engineering Team feel are not
stable enough to guarantee they will not change during the life cycle
of a STABLE branch. In particular, the layout of struct sysentvec is
declared to be not part of the STABLE KBI.

Discussed with: bz, rwatson
Approved by: re (bz, kensmith)
MFC after: 2 weeks


# a14a9498 27-Jul-2010 Alan Cox <alc@FreeBSD.org>

The interpreter name should no longer be treated as a buffer that can be
overwritten. (This change should have been included in r210545.)

Submitted by: kib


# afe1a688 23-May-2010 Konstantin Belousov <kib@FreeBSD.org>

Reorganize syscall entry and leave handling.

Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month


# 4ccf64eb 06-Apr-2010 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

MFC r205014,205015:

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

This MFC is required for MFCs of later changes to the freebsd32
compatibility from HEAD.

Requested by: kib


# a107d8aa 25-Mar-2010 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by: jhb


# 841c0c7e 11-Mar-2010 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by: kib, jhb


# 43ba7803 19-Dec-2009 Konstantin Belousov <kib@FreeBSD.org>

MFC r198507:
Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals.

MFC r198590:
Trapsignal() calls kern_sigprocmask() when delivering catched signal
with proc lock held.

MFC r198670:
For trapsignal() and postsig(), kern_sigprocmask() is called with
both process lock and curproc->p_sigacts->ps_mtx locked. Prevent lock
recursion on ps_mtx in reschedule_signals().


# d6e029ad 27-Oct-2009 Konstantin Belousov <kib@FreeBSD.org>

In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by: davidxu
Tested by: pho
MFC after: 1 month


# ac63e409 27-Aug-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r196512:

Fix handling of .note.ABI-tag section for GNU systems [1].
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.

Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.

Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].

In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].

These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).

PR: kern/135468
Submitted by: dchagin [1] (initial patch)
Suggested by: kib [2]
Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by: kib
Approved by: re (kensmith)


# 89ffc202 24-Aug-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Fix handling of .note.ABI-tag section for GNU systems [1].
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.

Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.

Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].

In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].

These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).

PR: kern/135468
Submitted by: dchagin [1] (initial patch)
Suggested by: kib [2]
Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by: kib
MFC after: 3 days


# a2622e5d 09-Jul-2009 Konstantin Belousov <kib@FreeBSD.org>

Restore the segment registers and segment base MSRs for amd64 syscall
return path only when neither thread was context switched while
executing syscall code nor syscall explicitely modified LDT or MSRs.

Save segment registers in trap handlers before interrupts are enabled,
to not allow context switches to happen before registers are saved.
Use separated byte in pcb for indication of fast/full return, since
pcb_flags are not synchronized with context switches.

The change puts back syscall microbenchmark numbers that were slowed
down after commit of the support for LDT on amd64.

Reviewed by: jeff
Tested (and tested, and tested ...) by: pho
Approved by: re (kensmith)


# 8d30f381 10-May-2009 Dmitry Chagin <dchagin@FreeBSD.org>

Do not export AT_CLKTCK when emulating Linux kernel prior
to 2.4.0, as it has appeared in the 2.4.0-rc7 first time.
Being exported, AT_CLKTCK is returned by sysconf(_SC_CLK_TCK),
glibc falls back to the hard-coded CLK_TCK value when aux entry
is not present.

Glibc versions prior to 2.2.1 always use hard-coded CLK_TCK value.

For older applications/libc's which depends on hard-coded CLK_TCK
value user should set compat.linux.osrelease less than 2.4.0.

Approved by: kib (mentor)


# 1ca16454 10-May-2009 Dmitry Chagin <dchagin@FreeBSD.org>

Rework r189362, r191883.
The frequency of the statistics clock is given by stathz.
Use stathz if it is available, otherwise use hz.

Pointed out by: bde

Approved by: kib (mentor)


# 7ae27ff4 07-May-2009 Jamie Gritton <jamie@FreeBSD.org>

Move the per-prison Linux MIB from a private one-off pointer to the new
OSD-based jail extensions. This allows the Linux MIB to accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.

Reviewed by: dchagin, kib
Approved by: bz (mentor)


# d789bfd5 02-May-2009 Dmitry Chagin <dchagin@FreeBSD.org>

Move extern variable definitions to the header file.

Approved by: kib (mentor)
MFC after: 1 month


# 79262bf1 01-May-2009 Dmitry Chagin <dchagin@FreeBSD.org>

Reimplement futexes.
Old implemention used Giant to protect the kernel data structures,
but at the same time called malloc(M_WAITOK), that could cause the
calling thread to sleep and lost Giant protection. User-visible
result was the missed wakeup.

New implementation uses one sx lock per futex. The sx protects
the futex structures and allows to sleep while copyin or copyout
are performed.

Unlike linux, we return EINVAL when FUTEX_CMP_REQUEUE operation
is requested and either caller specified futexes are equial or
second futex already exists. This is acceptable since the situation
can only occur from the application error, and glibc falls back to
old FUTEX_WAKE operation when FUTEX_CMP_REQUEUE returns an error.

Approved by: kib (mentor)
MFC after: 1 month


# cd899aad 05-Apr-2009 Dmitry Chagin <dchagin@FreeBSD.org>

Fix KBI breakage by r190520 which affects older linux.ko binaries:

1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
is valid.
3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old
modules won't have the flag set, so the new field brand_note would be
ignored.

Suggested by: jhb
Reviewed by: jhb
Approved by: kib (mentor)
MFC after: 6 days


# 2c66ccca 01-Apr-2009 Konstantin Belousov <kib@FreeBSD.org>

Save and restore segment registers on amd64 when entering and leaving
the kernel on amd64. Fill and read segment registers for mcontext and
signals. Handle traps caused by restoration of the
invalidated selectors.

Implement user-mode creation and manipulation of the process-specific
LDT descriptors for amd64, see sysarch(2).

Implement support for TSS i/o port access permission bitmap for amd64.

Context-switch LDT and TSS. Do not save and restore segment registers on
the context switch, that is handled by kernel enter/leave trampolines
now. Remove segment restore code from the signal trampolines for
freebsd/amd64, freebsd/ia32 and linux/i386 for the same reason.

Implement amd64-specific compat shims for sysarch.

Linuxolator (temporary ?) switched to use gsbase for thread_area pointer.

TODO:
Currently, gdb is not adapted to show segment registers from struct reg.
Also, no machine-depended ptrace command is added to set segment
registers for debugged process.

In collaboration with: pho
Discussed with: peter
Reviewed by: jhb
Linuxolator tested by: dchagin


# 32c01de2 13-Mar-2009 Dmitry Chagin <dchagin@FreeBSD.org>

Implement new way of branding ELF binaries by looking to a
".note.ABI-tag" section.

The search order of a brand is changed, now first of all the
".note.ABI-tag" is looked through.

Move code which fetch osreldate for ELF binary to check_note() handler.

PR: 118473
Approved by: kib (mentor)


# 2ee8325f 05-Mar-2009 John Baldwin <jhb@FreeBSD.org>

A better fix for handling different FPU initial control words for different
ABIs:
- Store the FPU initial control word in the pcb for each thread.
- When first using the FPU, load the initial control word after restoring
the clean state if it is not the standard control word.
- Provide a correct control word for Linux/i386 binaries under
FreeBSD/amd64.
- Adjust the control word returned for fpugetregs()/npxgetregs() when a
thread hasn't used the FPU yet to reflect the real initial control
word for the current ABI.
- The Linux/i386 ABI for FreeBSD/i386 now properly sets the right control
word instead of trashing whatever the current state of the FPU is.

Reviewed by: bde


# 4d7c2e8a 03-Mar-2009 Dmitry Chagin <dchagin@FreeBSD.org>

Add AT_PLATFORM, AT_HWCAP and AT_CLKTCK auxiliary vector entries which
are used by glibc. This silents the message "2.4+ kernel w/o ELF notes?"
from some programs at start, among them are top and pkill.

Do the assignment of the vector entries in elf_linux_fixup()
as it is done in glibc.

Fix some minor style issues.

Submitted by: Marcin Cieslak <saper at SYSTEM PL>
Approved by: kib (mentor)
MFC after: 1 week


# d065e13d 31-Jan-2009 David E. O'Brien <obrien@FreeBSD.org>

Fix the inconsistent tabbing.

Noticed by: bde


# e6493bbe 31-Jan-2009 David E. O'Brien <obrien@FreeBSD.org>

Change some movl's to mov's. Newer GAS no longer accept 'movl' instructions
for moving between a segment register and a 32-bit memory location.

Looked at by: jhb


# 0e7faf39 16-Dec-2008 Warner Losh <imp@FreeBSD.org>

Remove obsolete AT_DEBUG stuff. It never should have been committed
in the first place, let alone migrated to linux emulation.

Reviewed by: peter, rdivacky


# b4cf0e62 21-Nov-2008 Konstantin Belousov <kib@FreeBSD.org>

Add sv_flags field to struct sysentvec with intention to provide description
of the ABI of the currently executing image. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures to determine ABI features.

Discussed with: dchagin, imp, jhb, peter


# aa8b2011 19-Oct-2008 Konstantin Belousov <kib@FreeBSD.org>

Correctly fill siginfo for the signals delivered by linux tkill/tgkill.
It is required for async cancellation to work.

Fix PROC_LOCK leak in linux_tgkill when signal delivery attempt is made
to not linux process.

Do not call em_find(p, ...) with p unlocked.

Move common code for linux_tkill() and linux_tgkill() into
linux_do_tkill().

Change linux siginfo_t definition to match actual linux one. Extend
uid fields to 4 bytes from 2. The extension does not change structure
layout and is binary compatible with previous definition, because i386
is little endian, and each uid field has 2 byte padding after it.

Reported by: Nicolas Joly <njoly pasteur fr>
Submitted by: dchangin
MFC after: 1 month


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# f1f0dd9e 18-Oct-2008 Konstantin Belousov <kib@FreeBSD.org>

Set PCB_32BIT and clear PCB_GS32BIT for linux32 binaries.

Tested by: dchagin
MFC after: 3 days


# a8d403e1 24-Sep-2008 Konstantin Belousov <kib@FreeBSD.org>

Change the static struct sysentvec and struct Elf_Brandinfo initializers
to the C99 style. At least, it is easier to read sysent definitions
that way, and search for the actual instances of sigcode etc.

Explicitely initialize sysentvec.sv_maxssiz that was missed in most
sysvecs.

No objection from: jhb
MFC after: 1 month


# 48b05c3f 08-Apr-2008 Konstantin Belousov <kib@FreeBSD.org>

Implement the linux syscalls
openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat,
renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat.

Submitted by: rdivacky
Sponsored by: Google Summer of Code 2007
Tested by: pho


# 22eca0bf 13-Mar-2008 Konstantin Belousov <kib@FreeBSD.org>

Since version 4.3, gcc changed its behaviour concerning the i386/amd64
ABI and the direction flag, that is it now assumes that the direction
flag is cleared at the entry of a function and it doesn't clear once
more if needed. This new behaviour conforms to the i386/amd64 ABI.

Modify the signal handler frame setup code to clear the DF {e,r}flags
bit on the amd64/i386 for the signal handlers.

jhb@ noted that it might break old apps if they assumed DF == 1 would be
preserved in the signal handlers, but that such apps should be rare and
that older versions of gcc would not generate such apps.

Submitted by: Aurelien Jarno <aurelien aurel32 net>
PR: 121422
Reviewed by: jhb
MFC after: 2 weeks


# 6617724c 12-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


# 96a2b635 20-Sep-2007 Konstantin Belousov <kib@FreeBSD.org>

Fill in cr2 in the signal context from ksi->ksi_addr.
Together with the sys/i386/i386/trap.c rev. 1.306 it fixes the PR.

Submitted by: rdivacky
Suggested by: jhb
Sponsored by: Google Summer of Code 2007
PR: kern/77710
Approved by: re (kensmith)


# 59d8f3ff 12-Jul-2007 John Baldwin <jhb@FreeBSD.org>

Fix a couple of issues with the stack limit for 32-bit processes on 64-bit
kernels exposed by the recent fixes to resource limits for 32-bit processes
on 64-bit kernels:
- Let ABIs expose their maximum stack size via a new pointer in sysentvec
and use that in preference to maxssiz during exec() rather than always
using maxssiz for all processses.
- Apply the ABI's limit fixup to the previous stack size when adjusting
RLIMIT_STACK to determine if the existing mapping for the stack needs to
be grown or shrunk (as well as how much it should be grown or shrunk).

Approved by: re (kensmith)


# 19059a13 14-May-2007 John Baldwin <jhb@FreeBSD.org>

Rework the support for ABIs to override resource limits (used by 32-bit
processes under 64-bit kernels). Previously, each 32-bit process overwrote
its resource limits at exec() time. The problem with this approach is that
the new limits affect all child processes of the 32-bit process, including
if the child process forks and execs a 64-bit process. To fix this, don't
ovewrite the resource limits during exec(). Instead, sv_fixlimits() is
now replaced with a different function sv_fixlimit() which asks the ABI to
sanitize a single resource limit. We then use this when querying and
setting resource limits. Thus, if a 32-bit process sets a limit, then
that new limit will be inherited by future children. However, if the
32-bit process doesn't change a limit, then a future 64-bit child will
see the "full" 64-bit limit rather than the 32-bit limit.

MFC is tentative since it will break the ABI of old linux.ko modules (no
other modules are affected).

MFC after: 1 week


# 357afa71 02-Apr-2007 Jung-uk Kim <jkim@FreeBSD.org>

MFP4: Turn emul_lock into a mutex.

Submitted by: rdivacky


# 9c5b213e 29-Mar-2007 Jung-uk Kim <jkim@FreeBSD.org>

MFP4: Linux set_thread_area syscall (aka TLS) support for amd64.

Initial version was submitted by Divacky Roman and mostly rewritten by me.

Tested by: emulation


# 786e4fc4 03-Dec-2006 Alexander Leidinger <netchild@FreeBSD.org>

MFP4 (110939):

MFi386: return EOPNOTSUPP for unknown module events.

Submitted by: rdivacky


# d4d2a400 31-Oct-2006 Konstantin Belousov <kib@FreeBSD.org>

Fix a typo resulting in truncated linux32 signal trampoline code copied
to the usermode. Usually, signal handler segfaulted on return.

Reviewed by: jhb
MFC after: 3 days


# bb59e63f 09-Sep-2006 Alexander Leidinger <netchild@FreeBSD.org>

Change futex lock from mutex to sx. Make futex_get atomic (protected by the
futex lock).

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Suggested by: jhb


# 94cb2ecf 17-Aug-2006 Alexander Leidinger <netchild@FreeBSD.org>

Move some stuff into headers where they belong.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Noticed by: jhb, ssouhlal


# c632e9d3 17-Aug-2006 Alexander Leidinger <netchild@FreeBSD.org>

Initialize the emul sx-lock.

Sponsored by: Google SoC 2006
Submitted by: rdivacky


# 7c09e6c0 15-Aug-2006 Alexander Leidinger <netchild@FreeBSD.org>

Initialize the eventhandlers, mutexes and sx locks.

Sponsored by: Google SoC 2006
Submitted by: rdivacky


# 9b44bfc5 14-Aug-2006 Alexander Leidinger <netchild@FreeBSD.org>

Add the linux 2.6.x stuff (not used by default!):
- TLS - complete
- pid/tid mangling - complete
- thread area - complete
- futexes - complete with issues
- clone() extension - complete with some possible minor issues
- mq*/timer*/clock* stuff - complete but untested and the mq* stuff is
disabled when not build as part of the kernel with native FreeBSD mq*
support (module support for this will come later)

Tested with:
- linux-firefox - works, tested
- linux-opera - works, tested
- linux-realplay - doesnt work, issue with futexes
- linux-skype - doesnt work, issue with futexes
- linux-rt2-demo - works, tested
- linux-acroread - doesnt work, unknown reason (coredump) and sometimes
issue with futexes
- various unix utilities in linux-base-gentoo3 and linux-base-fc4:
everything tried worked

On amd64 not everything is supported like on i386, the catchup is planned for
later when the remaining bugs in the new functions are fixed.

To test this new stuff, you have to run
sysctl compat.linux.osrelease=2.6.16
to switch back use
sysctl compat.linux.osrelease=2.4.2

Don't switch while running a linux program, strange things may or may not
happen.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Some suggestions/help by: jhb, kib, manu@NetBSD.org, netchild


# 50e422f0 10-Aug-2006 Alexander Leidinger <netchild@FreeBSD.org>

Add some more errno mappings (bsd -> linux) and a comment about the status..

Submitted by: "Intron" <mag@intron.ac>


# 387196bf 06-May-2006 Doug Ambrisko <ambrisko@FreeBSD.org>

Forgot the amd/linux32 part since sys/*/linux didn't match :-(

Pointed out by: Alexander (thanks)


# aefce619 19-Mar-2006 Ruslan Ermilov <ru@FreeBSD.org>

Unbreak COMPAT_LINUX32 option support on amd64.

Broken by: netchild


# c85625bf 18-Mar-2006 Alexander Leidinger <netchild@FreeBSD.org>

regen


# 1f7642e0 18-Mar-2006 Alexander Leidinger <netchild@FreeBSD.org>

regen after COMPAT_43 removal


# 900b28f9 26-Dec-2005 Maxim Sobolev <sobomax@FreeBSD.org>

Remove kern.elf32.can_exec_dyn sysctl. Instead extend Brandinfo structure
with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually
allow executing elf dynamic binaries (aka shared libraries). When it is
requested to execute ET_DYN elf image check if this flag is on after we
know the elf brand allowing execution if so.

PR: kern/87615
Submitted by: Marcin Koziej <creep@desk.pl>


# 410d8579 15-Dec-2005 John Baldwin <jhb@FreeBSD.org>

Remove linux_mib_destroy() (which I actually added in between 5.0 and 5.1)
which existed to cleanup the linux_osname mutex. Now that MTX_SYSINIT()
has grown a SYSUNINIT to destroy mutexes on unload, the extra destroy here
was redundant and resulted in panics in debug kernels.

MFC after: 1 week
Reported by: Goran Gajic ggajic at afrodita dot rcub dot bg dot ac dot yu


# 1471f287 02-Nov-2005 Paul Saab <ps@FreeBSD.org>

Calling setrlimit from 32bit apps could potentially increase certain
limits beyond what should be capiable in a 32bit process, so we
must fixup the limits.

Reviewed by: jhb


# 728ef954 14-Oct-2005 John Baldwin <jhb@FreeBSD.org>

The signal code is now an int rather than a long, so update debug printfs.


# 9104847f 13-Oct-2005 David Xu <davidxu@FreeBSD.org>

1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
sendsig use discrete parameters, now they uses member fields of
ksiginfo_t structure. For sendsig, this change allows us to pass
POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
an extra siginfo list holds additional siginfo_t data for signals.
kernel code uses psignal() still behavior as before, it won't be failed
even under memory pressure, only exception is when deleting a signal,
we should call sigqueue_delete to remove signal from sigqueue but
not SIGDELSET. Current there is no kernel code will deliver a signal
with additional data, so kernel should be as stable as before,
a ksiginfo can carry more information, for example, allow signal to
be delivered but throw away siginfo data if memory is not enough.
SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
not be caught or masked.
The sigqueue() syscall allows user code to queue a signal to target
process, if resource is unavailable, EAGAIN will be returned as
specification said.
Just before thread exits, signal queue memory will be freed by
sigqueue_flush.
Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64


# 2a988f7c 22-Sep-2005 Stephan Uphoff <ups@FreeBSD.org>

Fix the "fpudna: fpcurthread == curthread XXX times" problem.

Tested by: kris@
Reviewed by: peter@
MFC after: 3 days


# 813a5e14 29-Jul-2005 John Baldwin <jhb@FreeBSD.org>

Move MODULE_DEPEND() statements for SYSVIPC dependencies to linux_ipc.c
so that they aren't duplicated 3 times and are also in the same file as
the code that depends on the SYSVIPC modules.


# f2c7668e 23-Mar-2005 David Schultz <das@FreeBSD.org>

Make ps_nargvstr and ps_nenvstr unsigned. This fixes an input
validation error in procfs/linprocfs that can be exploited by local
users to cause a kernel panic. All versions of FreeBSD with the patch
referenced in SA-04:17.procfs have this bug, but versions without that
patch have a more serious bug instead. This problem only affects
systems on which procfs or linprocfs is mounted.

Found by: Coverity Prevent analysis tool
Security: Local DOS


# 1d15fdd9 18-Feb-2005 John Baldwin <jhb@FreeBSD.org>

- Add a custom version of exec_copyin_args() to deal with the 32-bit
pointers in argv and envv in userland and use that together with
kern_execve() and exec_free_args() to implement linux_execve() for the
amd64/linux32 ABI without using the stackgap.
- Implement linux_nanosleep() using the recently added kern_nanosleep().
- Use linux_emul_convpath() instead of linux_emul_find() in
exec_linux_imgact_try().

Tested by: cokane
Silence on: amd64


# 610ecfe0 29-Jan-2005 Maxim Sobolev <sobomax@FreeBSD.org>

o Split out kernel part of execve(2) syscall into two parts: one that
copies arguments into the kernel space and one that operates
completely in the kernel space;

o use kernel-only version of execve(2) to kill another stackgap in
linuxlator/i386.

Obtained from: DragonFlyBSD (partially)
MFC after: 2 weeks


# 6004362e 26-Nov-2004 David Schultz <das@FreeBSD.org>

Don't include sys/user.h merely for its side-effect of recursively
including other headers.


# ce55a234 16-Aug-2004 David E. O'Brien <obrien@FreeBSD.org>

I missed an 'IA32' in the documentation.


# c680f6b1 16-Aug-2004 David E. O'Brien <obrien@FreeBSD.org>

I'm not sure what tjr envisioned for turning on FreeBSD/i386 rt support,
but make it COMPAT_IA32 for now.
Fix the 'DEBUG' argument code to unbreak the amd64 LINT build.


# ea0fabbc 16-Aug-2004 Tim J. Robbins <tjr@FreeBSD.org>

Add preliminary support for running 32-bit Linux binaries on amd64, enabled
with the COMPAT_LINUX32 option. This is largely based on the i386 MD Linux
emulations bits, but also builds on the 32-bit FreeBSD and generic IA-32
binary emulation work.

Some of this is still a little rough around the edges, and will need to be
revisited before 32-bit and 64-bit Linux emulation support can coexist in
the same kernel.