History log of /freebsd-10.3-release/sys/amd64/linux32/linux32_sysvec.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 296373 04-Mar-2016 marius

- Copy stable/10@296371 to releng/10.3 in preparation for 10.3-RC1
builds.
- Update newvers.sh to reflect RC1.
- Update __FreeBSD_version to reflect 10.3.
- Update default pkg(8) configuration to use the quarterly branch.

Approved by: re (implicit)

# 294901 27-Jan-2016 delphij

MFC r294900:

Implement AT_SECURE properly.

AT_SECURE auxv entry has been added to the Linux 2.5 kernel to pass a
boolean flag indicating whether secure mode should be enabled. 1 means
that the program has changes its credentials during the execution.
Being exported AT_SECURE used by glibc issetugid() call.

Submitted by: imp, dchagin
Security: FreeBSD-SA-16:10.linux
Security: CVE-2016-1883


# 294368 20-Jan-2016 jhb

MFC 289769,289822,290143,290144:
Rename remaining linux32 symbols from linux_* to linux32_*.

289769:
Rename remaining linux32 symbols such as linux_sysent[] and
linux_syscallnames[] from linux_* to linux32_* to avoid conflicts with
linux64.ko. While here, add support for linux64 binaries to systrace.
- Update NOPROTO entries in amd64/linux/syscalls.master to match the
main table to fix systrace build.
- Add a special case for union l_semun arguments to the systrace
generation.
- The systrace_linux32 module now only builds the systrace_linux32.ko.
module on amd64.
- Add a new systrace_linux module that builds on both i386 and amd64.
For i386 it builds the existing systrace_linux.ko. For amd64 it
builds a systrace_linux.ko for 64-bit binaries.

289822:
Fix build for the KTR-enabled kernels.

290143:
Fix build with DEBUG defined.

290144:
Update for LINUX32 rename. The assembler didn't complain about undefined
symbols but just used 0 after the rename.


# 294136 16-Jan-2016 dchagin

MFC r293613:

Implement vsyscall hack. Prior to 2.13 glibc uses vsyscall
instead of vdso. An upcoming linux_base-c6 needs it.


# 293609 09-Jan-2016 dchagin

MFC r289055 (by mjg@):

linux: fix handling of out-of-bounds syscall attempts

Due to an off by one the code would read an entry past the table, as
opposed to the last entry which contains the nosys handler.

This fixes my fault.

MFC r289058 (by cem@):

Fix missing semi-colon from r289055.

MFC r289768 (by jhb@):

Merge r289055 to amd64/linux32:

linux: fix handling of out-of-bounds syscall attempts

Due to an off by one the code would read an entry past the table, as
opposed to the last entry which contains the nosys handler.


# 293575 09-Jan-2016 dchagin

MFC r283474:

Rework signal code to allow using it by other modules, like linprocfs:

1. Linux sigset always 64 bit on all platforms. In order to move Linux
sigset code to the linux_common module define it as 64 bit int. Move
Linux sigset manipulation routines to the MI path.

2. Move Linux signal number definitions to the MI path. In general, they
are the same on all platforms except for a few signals.

3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion
tables to avoid conversion errors.

4. Emulate Linux SIGPWR signal via FreeBSD SIGRTMIN signal which is outside
of allowed on Linux signal numbers.

PR: 197216


# 293569 09-Jan-2016 dchagin

MFC r283467:

Call nosys in case when the incorrect syscall number is specified.

Its my fault, fixed by mjg@ at r289055.


# 293540 09-Jan-2016 dchagin

MFC r283436:

Use the BSD_TO_LINUX_SIGNAL() wherever there is no need
to check the ABI as it is known.


# 293535 09-Jan-2016 dchagin

MFC r283431:

Add AT_RANDOM and AT_EXECFN auxiliary vector entries which are used by
glibc. At list since glibc version 2.16 using AT_RANDOM is mandatory.


# 293528 09-Jan-2016 dchagin

MFC r283422:

Refund the proc emuldata struct for future use. For now move flags from
thread emuldata to proc emuldata as it was originally intended.

As we can have both 64 & 32 bit Linuxulator running any eventhandler
can be called twice for us. To prevent this move eventhandlers code
from linux_emul.c to the linux_common.ko module.


# 293527 09-Jan-2016 dchagin

MFC r283421:

Introduce a new module linux_common.ko which is intended for the
following primary purposes:

1. Remove the dependency of linsysfs and linprocfs modules from linux.ko,
which will be architecture specific on amd64.

2. Incorporate into linux_common.ko general code for platforms on which
we'll support two Linuxulator modules (for both instruction set - 32 & 64 bit).

3. Move malloc(9) declaration to linux_common.ko, to enable getting memory
usage statistics properly.

Currently linux_common.ko incorporates a code from linux_mib.c and linux_util.c
and linprocfs, linsysfs and linux kernel modules depend on linux_common.ko.

Temporarily remove dtrace garbage from linux_mib.c and linux_util.c


# 293517 09-Jan-2016 dchagin

MFC r283411:

Remove stale comment about a signal trampoline which
is moved to the shared page at r219609.


# 293516 09-Jan-2016 dchagin

MFC r283410:

Put linux_platform into the vdso to avoid copying it onto the stack at
every exec.


# 293514 09-Jan-2016 dchagin

MFC r283407:

Implement vdso - virtual dynamic shared object. Through vdso Linux
exposes functions from kernel with proper DWARF CFI information so that
it becomes easier to unwind through them.
Using vdso is a mandatory for a thread cancelation && cleanup
on a modern glibc.


# 293495 09-Jan-2016 dchagin

MFC r283385:

Some style(9) && whitespaces fixes. No functional changes.


# 293493 09-Jan-2016 dchagin

MFC r283383:

Switch linuxulator to use the native 1:1 threads.

The reasons:
1. Get rid of the stubs/quirks with process dethreading,
process reparent when the process group leader exits and close
to this problems on wait(), waitpid(), etc.
2. Reuse our kernel code instead of writing excessive thread
managment routines in Linuxulator.

Implementation details:

1. The thread is created via kern_thr_new() in the clone() call with
the CLONE_THREAD parameter. Thus, everything else is a process.
2. The test that the process has a threads is done via P_HADTHREADS
bit p_flag of struct proc.
3. Per thread emulator state data structure is now located in the
struct thread and freed in the thread_dtor() hook.
Mandatory holdig of the p_mtx required when referencing emuldata
from the other threads.
4. PID mangling has changed. Now Linux pid is the native tid
and Linux tgid is the native pid, with the exception of the first
thread in the process where tid and pid are one and the same.

Ugliness:

In case when the Linux thread is the initial thread in the thread
group thread id is equal to the process id. Glibc depends on this
magic (assert in pthread_getattr_np.c). So for system calls that
take thread id as a parameter we should use the special method
to reference struct thread.


# 267561 17-Jun-2014 dchagin

Revert MFC r266925 because it can lead to instant panic at fexecve():

To allow to run interpreter itself add a new ELF branding type.

Pointed out by: kib, mjg


# 266999 03-Jun-2014 dchagin

MFC r266925:

To allow to run the interpreter itself add a new ELF branding type.
Allow Linux ABI to run ELF interpreter.


# 258559 25-Nov-2013 emaste

MFC r258135: x86: Allow users to change PSL_RF via ptrace(PT_SETREGS...)

Debuggers may need to change PSL_RF. Note that tf_eflags is already stored
in the signal context during signal handling and PSL_RF previously could
be modified via sigreturn, so this change should not provide any new
ability to userspace.

For background see the thread at:
http://lists.freebsd.org/pipermail/freebsd-i386/2007-September/005910.html

Reviewed by: jhb, kib

Sponsored by: DARPA, AFRL
Approved by: re (gjb)


# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 250423 09-May-2013 dchagin

Retire write-only PCB_GS32BIT pcb flag on amd64.


# 246085 29-Jan-2013 jhb

Reduce duplication between i386/linux/linux.h and amd64/linux32/linux.h
by moving bits that are MI out into headers in compat/linux.

Reviewed by: Chagin Dmitry dmitry | gmail
MFC after: 2 weeks


# 241394 10-Oct-2012 kevlo

Revert previous commit...

Pointyhat to: kevlo (myself)


# 241370 09-Oct-2012 kevlo

Prefer NULL over 0 for pointers


# 230132 15-Jan-2012 uqs

Convert files to UTF-8


# 227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 220026 26-Mar-2011 dchagin

Export the correct AT_PLATFORM value.
Since signal trampolines are copied to the shared page do not need to
leave place on the stack for it. Forgotten in the previous commit.

MFC after: 1 Week


# 219609 13-Mar-2011 dchagin

Enable shared page use for amd64/linux32 and i386/linux binaries.
Move signal trampoline code from the top of the stack to the shared page.

MFC after: 2 Weeks


# 219405 08-Mar-2011 dchagin

Extend struct sysvec with new method sv_schedtail, which is used for an
explicit process at fork trampoline path instead of eventhadler(schedtail)
invocation for each child process.

Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.

While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.

Discussed with: kib

MFC after: 2 Week


# 218658 13-Feb-2011 dchagin

Sort include files in the alphabetical order.


# 217424 14-Jan-2011 jkim

Remove redundant, bogus, and even harmful uses of setting TS bit in CR0.
It is done from fpstate_drop() when it is really necessary.

Reviewed by: kib
MFC after: 1 week


# 216634 22-Dec-2010 jkim

Improve PCB flags handling and make it more robust. Add two new functions
for manipulating pcb_flags. These inline functions are very similar to
atomic_set_char(9) and atomic_clear_char(9) but without unnecessary LOCK
prefix for SMP. Add comments about the rationale[1]. Use these functions
wherever possible. Although there are some places where it is not strictly
necessary (e.g., a PCB is copied to create a new PCB), it is done across
the board for sake of consistency. Turn pcb_full_iret into a PCB flag as
it is safe now. Move rarely used fields before pcb_flags and reduce size
of pcb_flags to one byte. Fix some style(9) nits in pcb.h while I am in
the neighborhood.

Reviewed by: kib
Submitted by: kib[1]
MFC after: 2 months


# 216255 07-Dec-2010 kib

Update some comments related to use of amd64 full context switch.
In exec_linux_setregs(), use locally cached pointer to pcb to set
pcb_full_iret.
In set_regs(), note that full return is needed when code that sets
segment registers is enabled.

MFC after: 1 week


# 216253 07-Dec-2010 kib

Retire write-only PCB_FULLCTX pcb flag on amd64.

Reminded by: Petr Salinger <Petr.Salinger seznam cz>
Tested by: pho
MFC after: 1 week


# 213716 12-Oct-2010 kib

Add macro DECLARE_MODULE_TIED to denote a module as requiring the
kernel of exactly the same __FreeBSD_version as the headers module was
compiled against.

Mark our in-tree ABI emulators with DECLARE_MODULE_TIED. The modules
use kernel interfaces that the Release Engineering Team feel are not
stable enough to guarantee they will not change during the life cycle
of a STABLE branch. In particular, the layout of struct sysentvec is
declared to be not part of the STABLE KBI.

Discussed with: bz, rwatson
Approved by: re (bz, kensmith)
MFC after: 2 weeks


# 210555 28-Jul-2010 alc

The interpreter name should no longer be treated as a buffer that can be
overwritten. (This change should have been included in r210545.)

Submitted by: kib


# 208453 23-May-2010 kib

Reorganize syscall entry and leave handling.

Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month


# 205642 25-Mar-2010 nwhitehorn

Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by: jhb


# 205014 11-Mar-2010 nwhitehorn

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by: kib, jhb


# 198507 27-Oct-2009 kib

In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by: davidxu
Tested by: pho
MFC after: 1 month


# 196512 24-Aug-2009 bz

Fix handling of .note.ABI-tag section for GNU systems [1].
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.

Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.

Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].

In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].

These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).

PR: kern/135468
Submitted by: dchagin [1] (initial patch)
Suggested by: kib [2]
Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by: kib
MFC after: 3 days


# 195486 09-Jul-2009 kib

Restore the segment registers and segment base MSRs for amd64 syscall
return path only when neither thread was context switched while
executing syscall code nor syscall explicitely modified LDT or MSRs.

Save segment registers in trap handlers before interrupts are enabled,
to not allow context switches to happen before registers are saved.
Use separated byte in pcb for indication of fast/full return, since
pcb_flags are not synchronized with context switches.

The change puts back syscall microbenchmark numbers that were slowed
down after commit of the support for LDT on amd64.

Reviewed by: jeff
Tested (and tested, and tested ...) by: pho
Approved by: re (kensmith)


# 191973 10-May-2009 dchagin

Do not export AT_CLKTCK when emulating Linux kernel prior
to 2.4.0, as it has appeared in the 2.4.0-rc7 first time.
Being exported, AT_CLKTCK is returned by sysconf(_SC_CLK_TCK),
glibc falls back to the hard-coded CLK_TCK value when aux entry
is not present.

Glibc versions prior to 2.2.1 always use hard-coded CLK_TCK value.

For older applications/libc's which depends on hard-coded CLK_TCK
value user should set compat.linux.osrelease less than 2.4.0.

Approved by: kib (mentor)


# 191966 10-May-2009 dchagin

Rework r189362, r191883.
The frequency of the statistics clock is given by stathz.
Use stathz if it is available, otherwise use hz.

Pointed out by: bde

Approved by: kib (mentor)


# 191896 07-May-2009 jamie

Move the per-prison Linux MIB from a private one-off pointer to the new
OSD-based jail extensions. This allows the Linux MIB to accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.

Reviewed by: dchagin, kib
Approved by: bz (mentor)


# 191741 02-May-2009 dchagin

Move extern variable definitions to the header file.

Approved by: kib (mentor)
MFC after: 1 month


# 191719 01-May-2009 dchagin

Reimplement futexes.
Old implemention used Giant to protect the kernel data structures,
but at the same time called malloc(M_WAITOK), that could cause the
calling thread to sleep and lost Giant protection. User-visible
result was the missed wakeup.

New implementation uses one sx lock per futex. The sx protects
the futex structures and allows to sleep while copyin or copyout
are performed.

Unlike linux, we return EINVAL when FUTEX_CMP_REQUEUE operation
is requested and either caller specified futexes are equial or
second futex already exists. This is acceptable since the situation
can only occur from the application error, and glibc falls back to
old FUTEX_WAKE operation when FUTEX_CMP_REQUEUE returns an error.

Approved by: kib (mentor)
MFC after: 1 month


# 190708 05-Apr-2009 dchagin

Fix KBI breakage by r190520 which affects older linux.ko binaries:

1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
is valid.
3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old
modules won't have the flag set, so the new field brand_note would be
ignored.

Suggested by: jhb
Reviewed by: jhb
Approved by: kib (mentor)
MFC after: 6 days


# 190620 01-Apr-2009 kib

Save and restore segment registers on amd64 when entering and leaving
the kernel on amd64. Fill and read segment registers for mcontext and
signals. Handle traps caused by restoration of the
invalidated selectors.

Implement user-mode creation and manipulation of the process-specific
LDT descriptors for amd64, see sysarch(2).

Implement support for TSS i/o port access permission bitmap for amd64.

Context-switch LDT and TSS. Do not save and restore segment registers on
the context switch, that is handled by kernel enter/leave trampolines
now. Remove segment restore code from the signal trampolines for
freebsd/amd64, freebsd/ia32 and linux/i386 for the same reason.

Implement amd64-specific compat shims for sysarch.

Linuxolator (temporary ?) switched to use gsbase for thread_area pointer.

TODO:
Currently, gdb is not adapted to show segment registers from struct reg.
Also, no machine-depended ptrace command is added to set segment
registers for debugged process.

In collaboration with: pho
Discussed with: peter
Reviewed by: jhb
Linuxolator tested by: dchagin


# 189771 13-Mar-2009 dchagin

Implement new way of branding ELF binaries by looking to a
".note.ABI-tag" section.

The search order of a brand is changed, now first of all the
".note.ABI-tag" is looked through.

Move code which fetch osreldate for ELF binary to check_note() handler.

PR: 118473
Approved by: kib (mentor)


# 189423 05-Mar-2009 jhb

A better fix for handling different FPU initial control words for different
ABIs:
- Store the FPU initial control word in the pcb for each thread.
- When first using the FPU, load the initial control word after restoring
the clean state if it is not the standard control word.
- Provide a correct control word for Linux/i386 binaries under
FreeBSD/amd64.
- Adjust the control word returned for fpugetregs()/npxgetregs() when a
thread hasn't used the FPU yet to reflect the real initial control
word for the current ABI.
- The Linux/i386 ABI for FreeBSD/i386 now properly sets the right control
word instead of trashing whatever the current state of the FPU is.

Reviewed by: bde


# 189362 04-Mar-2009 dchagin

Add AT_PLATFORM, AT_HWCAP and AT_CLKTCK auxiliary vector entries which
are used by glibc. This silents the message "2.4+ kernel w/o ELF notes?"
from some programs at start, among them are top and pkill.

Do the assignment of the vector entries in elf_linux_fixup()
as it is done in glibc.

Fix some minor style issues.

Submitted by: Marcin Cieslak <saper at SYSTEM PL>
Approved by: kib (mentor)
MFC after: 1 week


# 187964 31-Jan-2009 obrien

Fix the inconsistent tabbing.

Noticed by: bde


# 187948 31-Jan-2009 obrien

Change some movl's to mov's. Newer GAS no longer accept 'movl' instructions
for moving between a segment register and a 32-bit memory location.

Looked at by: jhb


# 186211 17-Dec-2008 imp

Remove obsolete AT_DEBUG stuff. It never should have been committed
in the first place, let alone migrated to linux emulation.

Reviewed by: peter, rdivacky


# 185169 22-Nov-2008 kib

Add sv_flags field to struct sysentvec with intention to provide description
of the ABI of the currently executing image. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures to determine ABI features.

Discussed with: dchagin, imp, jhb, peter


# 184058 19-Oct-2008 kib

Correctly fill siginfo for the signals delivered by linux tkill/tgkill.
It is required for async cancellation to work.

Fix PROC_LOCK leak in linux_tgkill when signal delivery attempt is made
to not linux process.

Do not call em_find(p, ...) with p unlocked.

Move common code for linux_tkill() and linux_tgkill() into
linux_do_tkill().

Change linux siginfo_t definition to match actual linux one. Extend
uid fields to 4 bytes from 2. The extension does not change structure
layout and is binary compatible with previous definition, because i386
is little endian, and each uid field has 2 byte padding after it.

Reported by: Nicolas Joly <njoly pasteur fr>
Submitted by: dchangin
MFC after: 1 month


# 184026 18-Oct-2008 kib

Set PCB_32BIT and clear PCB_GS32BIT for linux32 binaries.

Tested by: dchagin
MFC after: 3 days


# 183322 24-Sep-2008 kib

Change the static struct sysentvec and struct Elf_Brandinfo initializers
to the C99 style. At least, it is easier to read sysent definitions
that way, and search for the actual instances of sigcode etc.

Explicitely initialize sysentvec.sv_maxssiz that was missed in most
sysvecs.

No objection from: jhb
MFC after: 1 month


# 177997 08-Apr-2008 kib

Implement the linux syscalls
openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat,
renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat.

Submitted by: rdivacky
Sponsored by: Google Summer of Code 2007
Tested by: pho


# 177145 13-Mar-2008 kib

Since version 4.3, gcc changed its behaviour concerning the i386/amd64
ABI and the direction flag, that is it now assumes that the direction
flag is cleared at the entry of a function and it doesn't clear once
more if needed. This new behaviour conforms to the i386/amd64 ABI.

Modify the signal handler frame setup code to clear the DF {e,r}flags
bit on the amd64/i386 for the signal handlers.

jhb@ noted that it might break old apps if they assumed DF == 1 would be
preserved in the signal handlers, but that such apps should be rare and
that older versions of gcc would not generate such apps.

Submitted by: Aurelien Jarno <aurelien aurel32 net>
PR: 121422
Reviewed by: jhb
MFC after: 2 weeks


# 177091 12-Mar-2008 jeff

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


# 172255 20-Sep-2007 kib

Fill in cr2 in the signal context from ksi->ksi_addr.
Together with the sys/i386/i386/trap.c rev. 1.306 it fixes the PR.

Submitted by: rdivacky
Suggested by: jhb
Sponsored by: Google Summer of Code 2007
PR: kern/77710
Approved by: re (kensmith)


# 171410 12-Jul-2007 jhb

Fix a couple of issues with the stack limit for 32-bit processes on 64-bit
kernels exposed by the recent fixes to resource limits for 32-bit processes
on 64-bit kernels:
- Let ABIs expose their maximum stack size via a new pointer in sysentvec
and use that in preference to maxssiz during exec() rather than always
using maxssiz for all processses.
- Apply the ABI's limit fixup to the previous stack size when adjusting
RLIMIT_STACK to determine if the existing mapping for the stack needs to
be grown or shrunk (as well as how much it should be grown or shrunk).

Approved by: re (kensmith)


# 169565 14-May-2007 jhb

Rework the support for ABIs to override resource limits (used by 32-bit
processes under 64-bit kernels). Previously, each 32-bit process overwrote
its resource limits at exec() time. The problem with this approach is that
the new limits affect all child processes of the 32-bit process, including
if the child process forks and execs a 64-bit process. To fix this, don't
ovewrite the resource limits during exec(). Instead, sv_fixlimits() is
now replaced with a different function sv_fixlimit() which asks the ABI to
sanitize a single resource limit. We then use this when querying and
setting resource limits. Thus, if a 32-bit process sets a limit, then
that new limit will be inherited by future children. However, if the
32-bit process doesn't change a limit, then a future 64-bit child will
see the "full" 64-bit limit rather than the 32-bit limit.

MFC is tentative since it will break the ABI of old linux.ko modules (no
other modules are affected).

MFC after: 1 week


# 168275 02-Apr-2007 jkim

MFP4: Turn emul_lock into a mutex.

Submitted by: rdivacky


# 168035 30-Mar-2007 jkim

MFP4: Linux set_thread_area syscall (aka TLS) support for amd64.

Initial version was submitted by Divacky Roman and mostly rewritten by me.

Tested by: emulation


# 164860 03-Dec-2006 netchild

MFP4 (110939):

MFi386: return EOPNOTSUPP for unknown module events.

Submitted by: rdivacky


# 163829 31-Oct-2006 kib

Fix a typo resulting in truncated linux32 signal trampoline code copied
to the usermode. Usually, signal handler segfaulted on return.

Reviewed by: jhb
MFC after: 3 days


# 162182 09-Sep-2006 netchild

Change futex lock from mutex to sx. Make futex_get atomic (protected by the
futex lock).

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Suggested by: jhb


# 161419 17-Aug-2006 netchild

Move some stuff into headers where they belong.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Noticed by: jhb, ssouhlal


# 161400 17-Aug-2006 netchild

Initialize the emul sx-lock.

Sponsored by: Google SoC 2006
Submitted by: rdivacky


# 161315 15-Aug-2006 netchild

Initialize the eventhandlers, mutexes and sx locks.

Sponsored by: Google SoC 2006
Submitted by: rdivacky


# 161310 15-Aug-2006 netchild

Add the linux 2.6.x stuff (not used by default!):
- TLS - complete
- pid/tid mangling - complete
- thread area - complete
- futexes - complete with issues
- clone() extension - complete with some possible minor issues
- mq*/timer*/clock* stuff - complete but untested and the mq* stuff is
disabled when not build as part of the kernel with native FreeBSD mq*
support (module support for this will come later)

Tested with:
- linux-firefox - works, tested
- linux-opera - works, tested
- linux-realplay - doesnt work, issue with futexes
- linux-skype - doesnt work, issue with futexes
- linux-rt2-demo - works, tested
- linux-acroread - doesnt work, unknown reason (coredump) and sometimes
issue with futexes
- various unix utilities in linux-base-gentoo3 and linux-base-fc4:
everything tried worked

On amd64 not everything is supported like on i386, the catchup is planned for
later when the remaining bugs in the new functions are fixed.

To test this new stuff, you have to run
sysctl compat.linux.osrelease=2.6.16
to switch back use
sysctl compat.linux.osrelease=2.4.2

Don't switch while running a linux program, strange things may or may not
happen.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Some suggestions/help by: jhb, kib, manu@NetBSD.org, netchild


# 161204 10-Aug-2006 netchild

Add some more errno mappings (bsd -> linux) and a comment about the status..

Submitted by: "Intron" <mag@intron.ac>


# 158334 06-May-2006 ambrisko

Forgot the amd/linux32 part since sys/*/linux didn't match :-(

Pointed out by: Alexander (thanks)


# 156874 19-Mar-2006 ru

Unbreak COMPAT_LINUX32 option support on amd64.

Broken by: netchild


# 156851 18-Mar-2006 netchild

regen


# 156843 18-Mar-2006 netchild

regen after COMPAT_43 removal


# 153741 26-Dec-2005 sobomax

Remove kern.elf32.can_exec_dyn sysctl. Instead extend Brandinfo structure
with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually
allow executing elf dynamic binaries (aka shared libraries). When it is
requested to execute ET_DYN elf image check if this flag is on after we
know the elf brand allowing execution if so.

PR: kern/87615
Submitted by: Marcin Koziej <creep@desk.pl>


# 153448 15-Dec-2005 jhb

Remove linux_mib_destroy() (which I actually added in between 5.0 and 5.1)
which existed to cleanup the linux_osname mutex. Now that MTX_SYSINIT()
has grown a SYSUNINIT to destroy mutexes on unload, the extra destroy here
was redundant and resulted in panics in debug kernels.

MFC after: 1 week
Reported by: Goran Gajic ggajic at afrodita dot rcub dot bg dot ac dot yu


# 151980 02-Nov-2005 ps

Calling setrlimit from 32bit apps could potentially increase certain
limits beyond what should be capiable in a 32bit process, so we
must fixup the limits.

Reviewed by: jhb


# 151343 14-Oct-2005 jhb

The signal code is now an int rather than a long, so update debug printfs.


# 151316 14-Oct-2005 davidxu

1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
sendsig use discrete parameters, now they uses member fields of
ksiginfo_t structure. For sendsig, this change allows us to pass
POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
an extra siginfo list holds additional siginfo_t data for signals.
kernel code uses psignal() still behavior as before, it won't be failed
even under memory pressure, only exception is when deleting a signal,
we should call sigqueue_delete to remove signal from sigqueue but
not SIGDELSET. Current there is no kernel code will deliver a signal
with additional data, so kernel should be as stable as before,
a ksiginfo can carry more information, for example, allow signal to
be delivered but throw away siginfo data if memory is not enough.
SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
not be caught or masked.
The sigqueue() syscall allows user code to queue a signal to target
process, if resource is unavailable, EAGAIN will be returned as
specification said.
Just before thread exits, signal queue memory will be freed by
sigqueue_flush.
Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64


# 150473 22-Sep-2005 ups

Fix the "fpudna: fpcurthread == curthread XXX times" problem.

Tested by: kris@
Reviewed by: peter@
MFC after: 3 days


# 148540 29-Jul-2005 jhb

Move MODULE_DEPEND() statements for SYSVIPC dependencies to linux_ipc.c
so that they aren't duplicated 3 times and are also in the same file as
the code that depends on the SYSVIPC modules.


# 144011 23-Mar-2005 das

Make ps_nargvstr and ps_nenvstr unsigned. This fixes an input
validation error in procfs/linprocfs that can be exploited by local
users to cause a kernel panic. All versions of FreeBSD with the patch
referenced in SA-04:17.procfs have this bug, but versions without that
patch have a more serious bug instead. This problem only affects
systems on which procfs or linprocfs is mounted.

Found by: Coverity Prevent analysis tool
Security: Local DOS


# 142057 18-Feb-2005 jhb

- Add a custom version of exec_copyin_args() to deal with the 32-bit
pointers in argv and envv in userland and use that together with
kern_execve() and exec_free_args() to implement linux_execve() for the
amd64/linux32 ABI without using the stackgap.
- Implement linux_nanosleep() using the recently added kern_nanosleep().
- Use linux_emul_convpath() instead of linux_emul_find() in
exec_linux_imgact_try().

Tested by: cokane
Silence on: amd64


# 140992 29-Jan-2005 sobomax

o Split out kernel part of execve(2) syscall into two parts: one that
copies arguments into the kernel space and one that operates
completely in the kernel space;

o use kernel-only version of execve(2) to kill another stackgap in
linuxlator/i386.

Obtained from: DragonFlyBSD (partially)
MFC after: 2 weeks


# 138129 27-Nov-2004 das

Don't include sys/user.h merely for its side-effect of recursively
including other headers.


# 133846 16-Aug-2004 obrien

I missed an 'IA32' in the documentation.


# 133844 16-Aug-2004 obrien

I'm not sure what tjr envisioned for turning on FreeBSD/i386 rt support,
but make it COMPAT_IA32 for now.
Fix the 'DEBUG' argument code to unbreak the amd64 LINT build.


# 133819 16-Aug-2004 tjr

Add preliminary support for running 32-bit Linux binaries on amd64, enabled
with the COMPAT_LINUX32 option. This is largely based on the i386 MD Linux
emulations bits, but also builds on the 32-bit FreeBSD and generic IA-32
binary emulation work.

Some of this is still a little rough around the edges, and will need to be
revisited before 32-bit and 64-bit Linux emulation support can coexist in
the same kernel.