History log of /freebsd-9.3-release/sys/amd64/linux32/linux32_machdep.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 267654 19-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 250785 18-May-2013 dchagin

MFC r250423:

Retire write-only PCB_GS32BIT pcb flag on amd64.


# 248532 19-Mar-2013 jkim

MFC: r234352

Implement pipe2 syscall for Linuxulator.


# 225736 22-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


# 225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# 224778 11-Aug-2011 rwatson

Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *. With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by: re (bz)
Submitted by: jonathan
Sponsored by: Google Inc


# 218616 12-Feb-2011 dchagin

Move linux_clone(), linux_fork(), linux_vfork() to a MI path.


# 218613 12-Feb-2011 dchagin

In preparation for moving linux_clone() to a MI path
introduce linux_set_upcall_kse().


# 218612 12-Feb-2011 dchagin

In preparation for moving linux_clone () to a MI path
move the TLS code in a separate function.

Use function parameter instead of direct using register.


# 218100 30-Jan-2011 dchagin

The kern_wait() code already removes the SIGCHLD signal for the waited
process. Removing other SIGCHLD signals is not needed and may cause
problems.

Pointed out by: jilles

MFC after: 1 Month.


# 218059 29-Jan-2011 dchagin

My style(9) bug.

Pointed out by: kib

MFC after: 1 Month.


# 218030 28-Jan-2011 dchagin

Implement a variation of the linux_common_wait() which should
be used by linuxolator itself.

Move linux_wait4() to MD path as it requires native struct
rusage translation to struct l_rusage on linux32/amd64.

MFC after: 1 Month.


# 218028 28-Jan-2011 dchagin

To avoid excessive code duplication move struct rusage translation
to a separate function.

MFC after: 1 Month.


# 217896 26-Jan-2011 dchagin

Add macro to test the sv_flags of any process. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures.

MFC after: 1 month


# 216634 21-Dec-2010 jkim

Improve PCB flags handling and make it more robust. Add two new functions
for manipulating pcb_flags. These inline functions are very similar to
atomic_set_char(9) and atomic_clear_char(9) but without unnecessary LOCK
prefix for SMP. Add comments about the rationale[1]. Use these functions
wherever possible. Although there are some places where it is not strictly
necessary (e.g., a PCB is copied to create a new PCB), it is done across
the board for sake of consistency. Turn pcb_full_iret into a PCB flag as
it is safe now. Move rarely used fields before pcb_flags and reduce size
of pcb_flags to one byte. Fix some style(9) nits in pcb.h while I am in
the neighborhood.

Reviewed by: kib
Submitted by: kib[1]
MFC after: 2 months


# 210501 26-Jul-2010 kib

Remove unneeded includes.

Submitted by: alc
MFC after: 1 week


# 210431 23-Jul-2010 kib

Remove the linux_exec_copyin_args(), freebsd32_exec_copyin_args() may
server as well. COMPAT_FREEBSD32 is a prerequisite for COMPAT_LINUX32.

Reviewed by: alc
MFC after: 3 weeks


# 210429 23-Jul-2010 alc

Eliminate a little bit of duplicated code.


# 208994 10-Jun-2010 kan

Do not require pos parameter to be zero in MAP_ANONYMOUS mmap requests
in Linux emulation layer. Linux seems to only require that pos is
page-aligned, but otherwise ignores it. Default FreeBSD mmap parameter
checking is too strict to allow some Linux binaries to run. tsMuxeR is
one example of such a binary.

Discussed with: jhb
MFC after: 1 week


# 198554 28-Oct-2009 jhb

Fix some problems with effective mmap() offsets > 32 bits. This was
partially fixed on amd64 earlier. Rather than forcing linux_mmap_common()
to use a 32-bit offset, have it accept a 64-bit file offset. This offset
is then passed to the real mmap() call. Rather than inventing a structure
to hold the normal linux_mmap args that has a 64-bit offset, just pass
each of the arguments individually to linux_mmap_common() since that more
closes matches the existing style of various kern_foo() functions.

Submitted by: Christian Zander @ Nvidia
MFC after: 1 week


# 190620 01-Apr-2009 kib

Save and restore segment registers on amd64 when entering and leaving
the kernel on amd64. Fill and read segment registers for mcontext and
signals. Handle traps caused by restoration of the
invalidated selectors.

Implement user-mode creation and manipulation of the process-specific
LDT descriptors for amd64, see sysarch(2).

Implement support for TSS i/o port access permission bitmap for amd64.

Context-switch LDT and TSS. Do not save and restore segment registers on
the context switch, that is handled by kernel enter/leave trampolines
now. Remove segment restore code from the signal trampolines for
freebsd/amd64, freebsd/ia32 and linux/i386 for the same reason.

Implement amd64-specific compat shims for sysarch.

Linuxolator (temporary ?) switched to use gsbase for thread_area pointer.

TODO:
Currently, gdb is not adapted to show segment registers from struct reg.
Also, no machine-depended ptrace command is added to set segment
registers for debugged process.

In collaboration with: pho
Discussed with: peter
Reviewed by: jhb
Linuxolator tested by: dchagin


# 188750 18-Feb-2009 kib

Adapt linux emulation to use cv for vfork wait.

Submitted by: Takahiro Kurosawa <takahiro.kurosawa gmail com>
PR: kern/131506


# 185438 29-Nov-2008 kib

Fix iovec32 for linux32/amd64.

Add a custom version of copyiniov() to deal with the 32-bit iovec
pointers from userland (to be used later).

Adjust prototypes for linux_readv() and linux_writev() to use new
l_iovec32 definition and to match actual linux code. In particular,
use ulong for fd (why ?).

Submitted by: dchagin


# 184849 11-Nov-2008 ed

Several cleanups related to pipe(2).

- Use `fildes[2]' instead of `*fildes' to make more clear that pipe(2)
fills an array with two descriptors.

- Remove EFAULT from the manual page. Because of the current calling
convention, pipe(2) raises a segmentation fault when an invalid
address is passed.

- Introduce kern_pipe() to make it easier for binary emulations to
implement pipe(2).

- Make Linux binary emulation use kern_pipe(), which means we don't have
to recover td_retval after calling the FreeBSD system call.

Approved by: rdivacky
Discussed on: arch


# 182868 08-Sep-2008 kib

The pcb_gs32p should be per-cpu, not per-thread pointer. This is
location in GDT where the segment descriptor from pcb_gs32sd is
copied, and the location is in GDT local to CPU.

Noted and reviewed by: peter
MFC after: 1 week


# 182866 08-Sep-2008 kib

In linux_set_thread_area(), mark pcb as PCB_GS32BIT. This was missed
when r180992 was committed.

Reviewed by: peter
MFC after: 1 week


# 180992 30-Jul-2008 kib

Bring back the save/restore of the %ds, %es, %fs and %gs registers for
the 32bit images on amd64.

Change the semantic of the PCB_32BIT pcb flag to request the context
switch code to operate on the segment registers. Its previous meaning
of saving or restoring the %gs base offset is assigned to the new
PCB_GS32BIT flag.

FreeBSD 32bit image activator sets the PCB_32BIT flag, while Linux 32bit
emulation sets PCB_32BIT | PCB_GS32BIT.

Reviewed by: peter
MFC after: 2 weeks


# 176193 11-Feb-2008 jkim

Fix Linux mmap with MAP_GROWSDOWN flag.

Reported by: Andriy Gapon (avg at icyb dot net dot ua)
Tested by: Andriy Gapon (avg at icyb dot net dot ua)
Pointyhat: me
MFC after: 3 days


# 171216 04-Jul-2007 peter

Don't add the 'pad' argument to the mmap/truncate/etc syscalls.

Submitted by: kensmith
Approved by: re (kensmith)


# 170307 04-Jun-2007 jeff

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 169458 10-May-2007 kan

Do not dereference linux_to_bsd_signal[-1] if userland has
passed zero as exit signal.

GCC 4.2 changes the kernel data segment layout not to have 0
in that memory location. This code ran by luck before and now
the luck has run out.


# 168848 18-Apr-2007 jkim

Fix style(9) and comments.

Submitted by: Scot Hetzel (swhetzel at gmail dot com)


# 168844 18-Apr-2007 jkim

style(9) says sizeof's are not be followed by a space. Fix them.


# 168843 18-Apr-2007 jkim

Implement settimeofday() for Linuxulator/amd64.

Submitted by: Scot Hetzel (swhetzel at gmail dot com)


# 168063 30-Mar-2007 jkim

MFP4: Fix style(9) nits and grammar in comments.


# 168056 30-Mar-2007 jkim

MFP4: 114193, 114194

Dont "return" in linux_clone() after we forked the new process in a case
of problems. Move the copyout of p2->p_pid outside the emul_lock coverage.

Submitted by: Roman Divacky


# 168035 29-Mar-2007 jkim

MFP4: Linux set_thread_area syscall (aka TLS) support for amd64.

Initial version was submitted by Divacky Roman and mostly rewritten by me.

Tested by: emulation


# 167157 01-Mar-2007 jkim

MFP4: 115220, 115222

- Fix style(9) and reduce diff between amd64 and i386.
- Prefix Linuxulator macros with LINUX_ to prevent future collision.


# 167048 27-Feb-2007 jkim

MFP4: 115094

Linux does not check file descriptor when MAP_ANONYMOUS is set.
This should fix recent LTP test regressions.

Reported by: Scot Hetzel (swhetzel at gmail dot com)
netchild


# 166944 24-Feb-2007 netchild

Partial MFp4 of 114977:
Whitespace commit: Fix grammar, spelling and punctuation.

Submitted by: "Scot Hetzel" <swhetzel@gmail.com>


# 166731 14-Feb-2007 jkim

Fix accidental removal of an empty line from the previous commit.


# 166729 14-Feb-2007 jkim

MFP4: 113033

Port iopl(2) from i386. This fixes LTP iopl01 and iopl02 on amd64.


# 166727 14-Feb-2007 jkim

MFP4: 113025, 113146, 113177, 113203, 113500, 113546, 113570

- PROT_READ, PROT_WRITE, or PROT_EXEC implies PROT_READ and PROT_EXEC.
Linux/ia64's i386 emulation layer does this and it complies with Linux
header files. This fixes mmap05 LTP test case on amd64.
- Do not adjust stack size when failure has occurred.
- Synchronize i386 mmap/mprotect with amd64.


# 166395 01-Feb-2007 kib

Fix LOR that occurs because proctree_lock was acquired while holding
emuldata lock by moving the code upwards outside the emul_lock coverage.

Submitted by: rdivacky


# 166394 01-Feb-2007 kib

MFi386: Use LINUX_SIG_VALID macro.

Submitted by: rdivacky


# 166188 23-Jan-2007 jeff

- Remove setrunqueue and replace it with direct calls to sched_add().
setrunqueue() was mostly empty. The few asserts and thread state
setting were moved to the individual schedulers. sched_add() was
chosen to displace it for naming consistency reasons.
- Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be
different on all three schedulers where it was only called in one place
each.
- Remove the long ifdef'd out remrunqueue code.
- Remove the now redundant ts_state. Inspect the thread state directly.
- Don't set TSF_* flags from kern_switch.c, we were only doing this to
support a feature in one scheduler.
- Change sched_choose() to return a thread rather than a td_sched. Also,
rely on the schedulers to return the idlethread. This simplifies the
logic in choosethread(). Aside from the run queue links kern_switch.c
mostly does not care about the contents of td_sched.

Discussed with: julian

- Move the idle thread loop into the per scheduler area. ULE wants to
do something different from the other schedulers.

Suggested by: jhb

Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.


# 166150 20-Jan-2007 netchild

MFp4 (113077, 113083, 113103, 113124, 113097):

Dont expose em->shared to the outside world before its properly
initialized. Might not affect anything but its at least a better
coding style.

Dont expose em via p->p_emuldata until its properly initialized.
This also enables us to get rid of some locking and simplify the
code because we are workin on a local copy.

In linux_fork and linux_vfork create the process in stopped state
to be sure that the new process runs with fully initialized emuldata
structure [1]. Also fix the vfork (both in linux_clone and linux_vfork)
race that could result in never woken up process [2].

Reported by: Scot Hetzel [1]
Suggested by: jhb [2]
Reviewed by: jhb (at least some important parts)
Submitted by: rdivacky
Tested by: Scot Hetzel (on amd64)

Change 2 comments (in the new code) to comply to style(9).

Suggested by: jhb


# 166007 14-Jan-2007 netchild

MFp4 (112893):
Make linux_vfork() actually work. This enables make to work again with 2.6.
It also fixes the LTP vfork tests.

Submitted by: rdivacky


# 165867 07-Jan-2007 netchild

MFp4 (112498):
Rename the locking flags to EMUL_DOLOCK and EMUL_DONTLOCK to prevent confusion.

Submitted by: rdivacky


# 165832 06-Jan-2007 netchild

MFi386 rev 1.56:
Bring the linux mmap code more into line with how linux (2.4.x) behaves.

Tested by: Scot Hetzel <swhetzel@gmail.com> on amd64 without PROT_EXEC

Additionally to the i386 version always use PROT_EXEC in the mapping like the
previous version of the amd64 code did. We need to examinate this further to
decide what the right thing to do is. For now this fixes several problems in
the LTP test runs and should behave regarding PROT_EXEC like before.


# 165408 20-Dec-2006 jkim

MFP4: 109655

- Move linux_nanosleep() from src/sys/amd64/linux32/linux32_machdep.c to
src/sys/compat/linux/linux_time.c.
- Validate timespec ranges before use as Linux kernel does.
- Fix l_timespec structure.
- Clean up style(9) nits.


# 163374 15-Oct-2006 netchild

MFP4 (106538 + 106541):
Implement CLONE_VFORK. This fixes the clone05 LTP test.

Submitted by: rdivacky


# 163373 15-Oct-2006 netchild

Revert my previous commit, I mismerged this to the wrong place.

Pointy hat to: netchild


# 163372 15-Oct-2006 netchild

MFP4 (106541): Fix the clone05 test in the LTP.

Submitted by: rdivacky


# 163371 15-Oct-2006 netchild

MFP4 (107144[1]): Implement CLONE_FS on i386[1] and amd64.

Submitted by: rdivacky [1]


# 162954 02-Oct-2006 phk

First part of a little cleanup in the calendar/timezone/RTC handling.

Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.


# 161696 28-Aug-2006 netchild

MFi386 parts of rev 1.55 (modulo real MD parts):
- implement CLONE_PARENT semantic
- lock proc in the currently disabled part of CLONE_THREAD

Submitted by: rdivacky


# 161611 25-Aug-2006 netchild

Emulate what vfork does instead of using it in linux_vfork. This way
we can do the stuff we need to do with linux processes at fork and
don't panic the kernel at exit of the child.

Submitted by: rdivacky
Tested with: tst-vfork* (glibc regression tests)
Tested by: netchild


# 161474 20-Aug-2006 netchild

Sync the MI parts for amd64 with i386 and remove the corresponding special
handling for amd64 in the common code. The MD parts for amd64 are still
outstanding, but at least this fixes some panics on amd64.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Tested by: bsam


# 161365 16-Aug-2006 netchild

Style fixes to comments.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Noticed by: jhb, ssouhlal


# 161310 15-Aug-2006 netchild

Add the linux 2.6.x stuff (not used by default!):
- TLS - complete
- pid/tid mangling - complete
- thread area - complete
- futexes - complete with issues
- clone() extension - complete with some possible minor issues
- mq*/timer*/clock* stuff - complete but untested and the mq* stuff is
disabled when not build as part of the kernel with native FreeBSD mq*
support (module support for this will come later)

Tested with:
- linux-firefox - works, tested
- linux-opera - works, tested
- linux-realplay - doesnt work, issue with futexes
- linux-skype - doesnt work, issue with futexes
- linux-rt2-demo - works, tested
- linux-acroread - doesnt work, unknown reason (coredump) and sometimes
issue with futexes
- various unix utilities in linux-base-gentoo3 and linux-base-fc4:
everything tried worked

On amd64 not everything is supported like on i386, the catchup is planned for
later when the remaining bugs in the new functions are fixed.

To test this new stuff, you have to run
sysctl compat.linux.osrelease=2.6.16
to switch back use
sysctl compat.linux.osrelease=2.4.2

Don't switch while running a linux program, strange things may or may not
happen.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Some suggestions/help by: jhb, kib, manu@NetBSD.org, netchild


# 156440 08-Mar-2006 ups

Fix exec_map resource leaks.

Tested by: kris@


# 155402 06-Feb-2006 jhb

- Always call exec_free_args() in kern_execve() instead of doing it in all
the callers if the exec either succeeds or fails early.
- Move the code to call exit1() if the exec fails after the vmspace is
gone to the bottom of kern_execve() to cut down on some code duplication.


# 147588 24-Jun-2005 jhb

Correct the amount of data to allocate in these local copies of
exec_copyin_strings() to catch up to rev 1.266 of kern_exec.c. This fixes
panics on amd64 with compat binaries since exec_free_args() was freeing
more memory than these functions were allocating and the mismatch could
cause memory to be freed out from under other concurrent execs.

Approved by: re (scottl)


# 144670 05-Apr-2005 jhb

Fix a change in a debug printf I missed in an earlier commit.


# 144449 31-Mar-2005 jhb

- Use a custom version of copyinuio() to implement readv/writev using
kern_readv/writev.
- Use kern_sched_rr_get_interval() rather than the stackgap.


# 144441 31-Mar-2005 jhb

- Fix some sign extension problems with implicit 32 to 64 bit conversions.
- Fix the mmap2() wrapper to not truncate high addresses.

Submitted by: Christian Zander


# 142057 18-Feb-2005 jhb

- Add a custom version of exec_copyin_args() to deal with the 32-bit
pointers in argv and envv in userland and use that together with
kern_execve() and exec_free_args() to implement linux_execve() for the
amd64/linux32 ABI without using the stackgap.
- Implement linux_nanosleep() using the recently added kern_nanosleep().
- Use linux_emul_convpath() instead of linux_emul_find() in
exec_linux_imgact_try().

Tested by: cokane
Silence on: amd64


# 136152 05-Oct-2004 jhb

Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
times it needs rather than calling getrusage() twice with associated
stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
for user, system, and interrupt time as well as a bintime of the total
runtime. A new p_rux field in struct proc replaces the same inline fields
from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux
field in struct proc contains the "raw" child time usage statistics.
ruadd() has been changed to handle adding the associated rusage_ext
structures as well as the values in rusage. Effectively, the values in
rusage_ext replace the ru_utime and ru_stime values in struct rusage. These
two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
calculates appropriate timevals for user and system time as well as updating
the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a
copy of the process' p_rux structure to compute the timevals after updating
the runtime appropriately if any of the threads in that process are
currently executing. It also now only locks sched_lock internally while
doing the rux_runtime fixup. calcru() now only requires the caller to
hold the proc lock and calcru1() only requires the proc lock internally.
calcru() also no longer allows callers to ask for an interrupt timeval
since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
calling calcru1() on p_crux. Note that this means that any code that wants
child times must now call this function rather than reading from p_cru
directly. This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
proctree lock is used to protect the process group rather than the process
group lock. By holding this lock until the end of the function we now
ensure that the process/thread that we pick to dump info about will no
longer vanish while we are trying to output its info to the console.

Submitted by: bde (mostly)
MFC after: 1 month


# 134586 01-Sep-2004 julian

Give setrunqueue() and sched_add() more of a clue as to
where they are coming from and what is expected from them.

MFC after: 2 days


# 134269 24-Aug-2004 jhb

Correct the arguments to kern_sigaltstack() as they were reversed.

PR: kern/68079
Submitted by: Georg-W. Koltermann gwk at rahn-koltermann dot de


# 133843 16-Aug-2004 obrien

Fix the 'DEBUG' argument code to unbreak the amd64 LINT build.


# 133819 16-Aug-2004 tjr

Add preliminary support for running 32-bit Linux binaries on amd64, enabled
with the COMPAT_LINUX32 option. This is largely based on the i386 MD Linux
emulations bits, but also builds on the 32-bit FreeBSD and generic IA-32
binary emulation work.

Some of this is still a little rough around the edges, and will need to be
revisited before 32-bit and 64-bit Linux emulation support can coexist in
the same kernel.