History log of /freebsd-9.3-release/sys/compat/freebsd32/
Revision Date Author Comments
267654 20-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


263771 26-Mar-2014 kib

MFC r263349:
Make the array pointed to by AT_PAGESIZES auxv properly aligned.


261561 06-Feb-2014 kib

MFC r261080:
The posix_fallocate(2) syscall should return error number on error,
without modifying errno.

MFC r261290:
The posix_madvise(3) and posix_fadvise(2) should return error on
failure, same as posix_fallocate(2).
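
The return convention described above can be sketched from the caller's side. This is a minimal illustration, not code from the commit: the helper name try_fallocate is hypothetical, and the only assumption is the POSIX rule that posix_fallocate() returns the error number directly instead of returning -1 and setting errno.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper: call posix_fallocate() and hand back its status.
 * The status is an errno-style number returned directly (0 on success),
 * so the caller inspects the return value, not the global errno.
 */
static int
try_fallocate(int fd, off_t offset, off_t len)
{
	int error;

	error = posix_fallocate(fd, offset, len);
	/* A negative value would mean the classic "-1 and errno" convention. */
	assert(error >= 0);
	return (error);
}
```

A caller would typically write `if ((error = try_fallocate(fd, 0, len)) != 0)` and report `strerror(error)`, rather than calling perror().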


260221 03-Jan-2014 pluknet

Regen.


260208 02-Jan-2014 jhb

MFC 255708,255711,255731:
Extend the support for exempting processes from being killed when swap is
exhausted.
- Add a new protect(1) command that can be used to set or revoke protection
from arbitrary processes. Similar to ktrace it can apply a change to all
existing descendants of a process as well as future descendants.
- Add a new procctl(2) system call that provides a generic interface for
control operations on processes (as opposed to the debugger-specific
operations provided by ptrace(2)). procctl(2) uses a combination of
idtype_t and an id to identify the set of processes on which to operate
similar to wait6().
- Add a PROC_SPROTECT control operation to manage the protection status
of a set of processes. MADV_PROTECT still works for backwards
compatibility.
- Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc)
the first bit of which is used to track if P_PROTECT should be inherited
by new child processes.


258888 03-Dec-2013 kib

MFC r258661:
Add sysctl KERN_PROC_SIGTRAMP to retrieve signal trampoline location for the
given process.


258784 30-Nov-2013 peter

MFC: r258718: fix emulated jail_v0 byte order


256305 11-Oct-2013 kib

MFC r256061:
Add padding to match the compat32 struct stat32 definition to the real
struct stat on 32bit architectures.


254665 22-Aug-2013 kib

Regenerate.


254664 22-Aug-2013 kib

MFC r253494:
id_t is 64bit, provide the compat32 wrapper for clock_getcpuclockid2(2).

Reminded by: Petr Salinger <Petr.Salinger@seznam.cz>


254399 16-Aug-2013 davidxu

Regen.


254398 16-Aug-2013 davidxu

MFC r239347, 240295, 240296 and 253325:

r239347 | davidxu | 2012-08-17 10:26:31 +0800 (Fri, 17 Aug 2012) | 7 lines

Implement syscall clock_getcpuclockid2, so we can get a clock id
for process, thread or others we want to support.
Use the syscall to implement the POSIX APIs clock_getcpuclockid and
pthread_getcpuclockid.

PR: 168417

------------------------------------------------------------------------
r240295 | davidxu | 2012-09-10 13:00:29 +0800 (Mon, 10 Sep 2012) | 2 lines

Add missing prototype for clock_getcpuclockid.

------------------------------------------------------------------------
r240296 | davidxu | 2012-09-10 13:09:39 +0800 (Mon, 10 Sep 2012) | 2 lines

Process CPU-Time Clocks option is supported, define _POSIX_CPUTIME.

------------------------------------------------------------------------
r253325 | kib | 2013-07-14 03:32:50 +0800 (Sun, 14 Jul 2013) | 6 lines

Allow to call clock_gettime() on the clock id for zombie process.

Reported by: Petr Salinger <Petr.Salinger@seznam.cz>
PR: threads/180496
Sponsored by: The FreeBSD Foundation


254131 09-Aug-2013 kib

Regenerate.


254130 09-Aug-2013 kib

MFC r253530:
Implement compat32 wrappers for the ktimer_* syscalls.


254129 09-Aug-2013 kib

Regenerate.


254128 09-Aug-2013 kib

MFC r253529:
Wrap kmq_notify(2) for compat32 to properly consume struct sigevent32
argument.


254127 09-Aug-2013 kib

Regenerate.


254126 09-Aug-2013 kib

MFC r253528:
The freebsd32_lio_listio() compat syscall takes the struct sigevent32.


254125 09-Aug-2013 kib

MFC r253527:
Move the convert_sigevent32() utility function into freebsd32_misc.c
for consumption outside the vfs_aio.c.

For SIGEV_THREAD_ID and SIGEV_SIGNAL notification delivery methods,
also copy in the sigev_value, since librt event pumping loop compares
note generation number with the value passed through sigev_value.


254029 07-Aug-2013 kib

MFC r253525:
Use the same union name on the left and right sides of the conversion.


251052 28-May-2013 kib

Regenerate.


251051 28-May-2013 kib

MFC r250853:
Fix the wait6(2) on 32bit architectures and for the compat32, by using
the right type for the argument in syscalls.master. Also fix the
posix_fallocate(2) and posix_fadvise(2) compat32 syscalls on the
architectures which require padding of the 64bit argument.


244174 13-Dec-2012 kib

Regenerate.


244172 13-Dec-2012 kib

MFC r242958:
Add the wait6(2) system call. It takes POSIX waitid()-like process
designator to select a process which is waited for. The system call
optionally returns siginfo_t which would be otherwise provided to
SIGCHLD handler, as well as extended structure accounting for child
and cumulative grandchild resource usage.

Allow to get the current rusage information for non-exited processes
as well, similar to Solaris.

The explicit WEXITED flag is required to wait for exited processes,
allowing for more fine-grained control of the events the waiter is
interested in.

Fix the handling of siginfo for WNOWAIT option for all wait*(2)
family, by not removing the queued signal state.

PR: standards/170346

MFC r243133:
Style fixes for r242958.

MFC r243134:
Alphabetically reorder the forward-declarations of the structures.
Add the declaration for enum idtype, to be used later.

MFC r243135:
Move the definition of the idtype_t from sys/types.h to sys/wait.h.
Fix the bug, use #if __BSD_VISIBLE instead of #if defined(__BSD_VISIBLE),
since __BSD_VISIBLE is always defined.
Reformat the comments from the Solaris style to KNF.
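
The difference between the two preprocessor guards in r243135 can be shown in isolation. The sketch below uses a hypothetical macro, EXAMPLE_VISIBLE, that is always defined but set to 0; it is not the real __BSD_VISIBLE logic from the system headers.

```c
#include <assert.h>

/* Hypothetical stand-in for a visibility macro that is always defined, here to 0. */
#define EXAMPLE_VISIBLE 0

/* Returns 1 because the macro exists, regardless of its value. */
static int
guarded_by_defined(void)
{
#if defined(EXAMPLE_VISIBLE)
	return (1);
#else
	return (0);
#endif
}

/* Returns 0 because the macro's value is tested, and that value is 0. */
static int
guarded_by_value(void)
{
#if EXAMPLE_VISIBLE
	return (1);
#else
	return (0);
#endif
}

/* The two guards disagree whenever the macro is defined to 0. */
static void
show_difference(void)
{
	assert(guarded_by_defined() != guarded_by_value());
}
```

With an always-defined macro, the defined() form exposes the guarded declarations unconditionally, which is the bug the commit message describes.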

MFC r243136:
Restore the proper handling of the pid 0 for waitpid(2).
Fix the style around.


239581 22-Aug-2012 kib

Regen.


239580 22-Aug-2012 kib

MFC r239296:
Provide 32bit compat for old truncate(2) and ftruncate(2).


239576 22-Aug-2012 kib

Regen.


239575 22-Aug-2012 kib

MFC r239248:
Implement the old mmap syscall for compat32, when COMPAT_43 option is
enabled. The syscall is used by FreeBSD 1.1.5.1 dynamic linker.


237134 15-Jun-2012 kib

MFC r226342 (by marcel):
In elf32_trans_prot() and when compiling for amd64 or ia64, add
PROT_EXEC when PROT_READ is needed. By default i386 allows
execution when reading is allowed and JDK 1.4.x depends on that.

MFC r226343 (by marcel):
In sys_obreak() and when compiling for amd64 or ia64, when the process
is ILP32 (i.e. i386) grant execute permissions by default. The JDK 1.4.x
depends on being able to execute from the heap on i386.

MFC r226347 (by marcel):
In freebsd32_mmap() and when compiling for amd64 or ia64, also
ask for execute permissions when read permissions are wanted.
This is needed for JDK 1.4.x on i386.

MFC r226348 (by marcel):
Wrap mprotect(2).

MFC r226349 (by marcel):
Wrap mprotect(2) so that we can add execute permissions when read
permissions are requested. This is needed on amd64 and ia64 for
JDK 1.4.x.

MFC r226353 (by marcel):
Use PTRIN().

MFC r226388:
Control the execution permission of the readable segments for
i386 binaries on the amd64 and ia64 with the sysctl, instead of
unconditionally enabling it.

MFC note: the syscall tables were regenerated in r226349 and committed
together with changes to non-generated files. The merge includes
syscall tables regenerated after the merge, for stable/9.
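
The protection fix-up that these merges describe might look roughly like the following. It is a sketch under stated assumptions: compat32_fixup_prot and its exec_on_read parameter are hypothetical names, with exec_on_read standing in for the sysctl-controlled setting from r226388; the real kernel functions are not reproduced here.

```c
#include <assert.h>
#include <sys/mman.h>

/*
 * Hypothetical sketch: adjust the protection bits requested by a 32-bit
 * i386 binary. exec_on_read stands in for the sysctl-controlled knob;
 * it is a plain parameter here, not a real kernel variable.
 */
static int
compat32_fixup_prot(int prot, int exec_on_read)
{
	/* Only the three basic protection bits are handled in this sketch. */
	assert((prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC)) == 0);
	if (exec_on_read && (prot & PROT_READ) != 0)
		prot |= PROT_EXEC;
	return (prot);
}
```

Passing the knob as a parameter keeps the policy (whether readable implies executable) separate from the mechanism, which mirrors the move from unconditional behavior to a sysctl.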


236292 30-May-2012 kib

MFC r235850:
Calculate the count of per-process cow faults. Export the count to
userspace using the obscure spare int field in struct kinfo_proc.

MFC r236136:
Fix ki_cow for compat32 binaries.


232290 29-Feb-2012 davidxu

MFC 230857:

If multiple threads call kevent() to get AIO events on the same kqueue fd,
a single AIO event may be reported to multiple threads. This is not
thread friendly, and the existing API cannot control the behavior.
Allocate a kevent flags field, sigev_notify_kevent_flags, for AIO event
notification in sigevent, and allow the user to pass EV_CLEAR, EV_DISPATCH
or EV_ONESHOT to the AIO kernel code, so the user can control whether the
event is cleared once it is retrieved by a thread. This change should
be compatible with existing applications, because the field should have
already been zero-filled, in which case the kernel takes no additional
action.

PR: kern/156567

MFC 231006:

Add 32-bit compat code for AIO kevent flags introduced in revision 230857.

MFC 231724:

Add notes about sigev_notify_kevent_flags introduced in revision 230857
which enables thread-friendly polling on same fd for AIO events.

Reviewed by: delphij

MFC 231777:

Bump .Dd date for previous revision.


230725 29-Jan-2012 mckusick

MFC r230249:

Make sure all intermediate variables holding mount flags (mnt_flag)
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.

MFC r230250:

There are several bugs/hangs when trying to take a snapshot on a UFS/FFS
filesystem running with journaled soft updates. Until these problems
have been tracked down, return ENOTSUPP when an attempt is made to
take a snapshot on a filesystem running with journaled soft updates.


229724 06-Jan-2012 jhb

Regen.


229723 06-Jan-2012 jhb

MFC 227070,227341,227502:
Add the posix_fadvise(2) system call. It is somewhat similar to
madvise(2) except that it operates on a file descriptor instead of a
memory region. It is currently only supported on regular files.

Note that this adds a new VOP, so all filesystem modules must be
recompiled.

Approved by: re (kib)


229513 04-Jan-2012 jhb

Regen.

Reminded by: kib


229500 04-Jan-2012 jhb

MFC 226364:
Use PAIR32TO64() for the offset and length parameters to
freebsd32_posix_fallocate() to properly handle big-endian platforms.
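
The idea behind joining a split 64-bit argument can be sketched as a plain function. This is an illustration only: pair32to64 and its big_endian flag are hypothetical, whereas the kernel's PAIR32TO64() is a macro that selects the ordering at compile time.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch of joining two 32-bit argument slots into one
 * 64-bit value. On a little-endian ABI the first slot holds the low
 * half; on a big-endian ABI the first slot holds the high half.
 */
static uint64_t
pair32to64(uint32_t first, uint32_t second, int big_endian)
{
	assert(big_endian == 0 || big_endian == 1);
	if (big_endian)
		return (((uint64_t)first << 32) | second);
	return (((uint64_t)second << 32) | first);
}
```

Hardcoding the little-endian order would silently swap the halves of an offset or length on a big-endian platform, which is the class of bug this merge addresses.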


229487 04-Jan-2012 pluknet

MFC r227447:

struct timespec32: change types of tv_sec and tv_nsec fields to signed
to match native struct timespec ABI on __LP32__.
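
Why the signedness matters can be seen in a small conversion sketch. The structure and function names below are hypothetical, and the range check is an addition for illustration; the kernel's own copy routines are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical 32-bit layout with signed fields, as the change describes. */
struct example_timespec32 {
	int32_t	tv_sec;
	int32_t	tv_nsec;
};

/*
 * Copy a native timespec into the 32-bit layout.
 * Returns 0 on success, or -1 if tv_sec does not fit in 32 signed bits.
 */
static int
timespec_to_32(const struct timespec *ts, struct example_timespec32 *ts32)
{
	assert(ts != NULL && ts32 != NULL);
	if (ts->tv_sec < INT32_MIN || ts->tv_sec > INT32_MAX)
		return (-1);
	ts32->tv_sec = (int32_t)ts->tv_sec;
	ts32->tv_nsec = (int32_t)ts->tv_nsec;
	return (0);
}
```

With unsigned fields, a tv_sec of -1 (one second before the epoch) would read back as a very large positive value in the 32-bit process.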


225736 23-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


225618 16-Sep-2011 kmacy

Auto-generated code from sys_ prefixing makesyscalls.sh change

Approved by: re(bz)


225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


224778 11-Aug-2011 rwatson

Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *. With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by: re (bz)
Submitted by: jonathan
Sponsored by: Google Inc


224199 18-Jul-2011 bz

Rename ki_ocomm to ki_tdname and OCOMMLEN to TDNAMLEN.
Provide backward compatibility defines under BURN_BRIDGES.

Suggested by: jhb
Reviewed by: emaste
Sponsored by: Sandvine Incorporated
Approved by: re (kib)


224140 17-Jul-2011 marck

Correct small typo in a do{}while(0) define

Approved by: kib
MFC after: 2 weeks


224067 15-Jul-2011 jonathan

Auto-generated system call code with cap_new(), cap_getrights().

Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc


224066 15-Jul-2011 jonathan

Add cap_new() and cap_getrights() system calls.

Implement two previously-reserved Capsicum system calls:
- cap_new() creates a capability to wrap an existing file descriptor
- cap_getrights() queries the rights mask of a capability.

Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc


223167 16-Jun-2011 kib

Regen.


223166 16-Jun-2011 kib

Implement compat32 for old lseek, for the a.out binaries on amd64.


220792 18-Apr-2011 mdf

Regen.


220791 18-Apr-2011 mdf

Add the posix_fallocate(2) syscall. The default implementation in
vop_stdallocate() is filesystem agnostic and will run as slow as a
read/write loop in userspace; however, it serves to correctly
implement the functionality for filesystems that do not implement a
VOP_ALLOCATE.

Note that __FreeBSD_version was already bumped today to 900036 for any
ports which would like to use this function.

Also reserve space in the syscall table for posix_fadvise(2).

Reviewed by: -arch (previous version)


220281 02-Apr-2011 kib

Implement compat32 shims for PCIOCGETCONF.

There is a generic problem with the shims for ioctls that receive
pointers to the usermode data areas in the data argument. We either have
to modify the handler to accept UIO_USERSPACE/UIO_SYSSPACE indicator, or
allocate and fill a usermode memory for data buffer in the host format.
The change goes the second route, in particular because we do not need
to modify the handler.

Submitted by: John Wehle <john feith com>
MFC after: 2 weeks


220280 02-Apr-2011 kib

Provide the structures and ioctl number definition for handling
PCIOCGETCONF compat32.

Submitted by: John Wehle <john feith com>
MFC after: 2 weeks


220239 01-Apr-2011 kib

Regen


220238 01-Apr-2011 kib

Add support for executing the FreeBSD 1/i386 a.out binaries on amd64.

In particular:
- implement compat shims for old stat(2) variants and ogetdirentries(2);
- implement delivery of signals with ancient stack frame layout and
corresponding sigreturn(2);
- implement old getpagesize(2);
- provide a user-mode trampoline and LDT call gate for lcall $7,$0;
- port a.out image activator and connect it to the build as a module
on amd64.

The changes are hidden under COMPAT_43.

MFC after: 1 month


220164 30-Mar-2011 trasz

Regenerate.


220163 30-Mar-2011 trasz

Add rctl. It's used by racct to take user-configurable actions based
on the set of rules it maintains and the current resource usage. It also
provides a userland API to manage that ruleset.

Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)


220159 30-Mar-2011 kib

Regen.


220158 30-Mar-2011 kib

Provide compat32 shims for kldstat(2).

Requested and tested by: jpaetzel
MFC after: 1 week


219989 25-Mar-2011 kib

Implement compat32 MEMRANGE_GET and MEMRANGE_SET. This is needed to
run 32bit Xorg server with VESA driver.

Submitted by: John Wehle <john feith com>
MFC after: 1 week


219988 25-Mar-2011 kib

Fully emulate MDIOCLIST for compat32.

MFC after: 1 week


219987 25-Mar-2011 kib

Remove unnecessary panics that can be easily triggered by a user.
The copyin() function handles NULL as well as any other pointer.

MFC after: 3 days


219986 25-Mar-2011 kib

Fix file leakage in the freebsd32_ioctl routines.

Code inspection shows freebsd32_ioctl calls fget for a fd and calls
a subroutine to handle each specific ioctl. It is expected that the
subroutine will call fdrop when done. However many of the subroutines
will exit out early if copyin encounters an error resulting in fdrop
never being called.

Submitted by: John Wehle <john feith com>
MFC after: 3 days


219560 12-Mar-2011 avg

add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls

Regenerate system call and systrace support files.

PR: kern/152822
Submitted by: Artem Belevich <fbsdlist@src.cx>
Reviewed by: jhb (earlier version)
MFC after: 3 weeks


219559 12-Mar-2011 avg

add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls

This commit makes the necessary changes in the syscall/sysent generation
infrastructure.

PR: kern/152822
Submitted by: Artem Belevich <fbsdlist@src.cx>
Reviewed by: jhb (earlier version)
MFC after: 3 weeks


219307 05-Mar-2011 trasz

Export login class information via kinfo and make it possible to view
it using "ps -o class".


219305 05-Mar-2011 trasz

Regenerate.


219304 05-Mar-2011 trasz

Add two new system calls, setloginclass(2) and getloginclass(2). This makes
it possible for the kernel to track login class the process is assigned to,
which is required for RCTL. This change also makes setusercontext(3) call
setloginclass(2) and makes it possible to retrieve current login class using
id(1).

Reviewed by: kib (as part of a larger patch)


219132 01-Mar-2011 rwatson

Regenerate system call files following addition of cap_enter(2),
cap_getmode(2), and capabilities.conf.

Reviewed by: anderson
Discussed with: benl, kris, pjd
Obtained from: Capsicum Project
Sponsored by: Google, Inc.
MFC after: 3 months


219129 01-Mar-2011 rwatson

Add initial support for Capsicum's Capability Mode to the FreeBSD kernel,
compiled conditionally on options CAPABILITIES:

Add a new credential flag, CRED_FLAG_CAPMODE, which indicates that a
subject (typically a process) is in capability mode.

Add two new system calls, cap_enter(2) and cap_getmode(2), which allow
setting and querying (but never clearing) the flag.

Export the capability mode flag via process information sysctls.

Sponsored by: Google, Inc.
Reviewed by: anderson
Discussed with: benl, kris, pjd
Obtained from: Capsicum Project
MFC after: 3 months


217151 08-Jan-2011 kib

Create shared (readonly) page. Each ABI may specify the use of page by
setting SV_SHP flag and providing pointer to the vm object and mapping
address. Provide simple allocator to carve space in the page, tailored
to put the code with alignment restrictions.

Enable shared page use for amd64, both native and 32bit FreeBSD
binaries. Page is private mapped at the top of the user address
space, moving a start of the stack one page down. Move signal
trampoline code from the top of the stack to the shared page.

Reviewed by: alc


216572 19-Dec-2010 kib

Restore the ABI of struct kinfo_proc32 after r213536.

MFC after: 3 days


215747 23-Nov-2010 pluknet

Update MNT_ROOTFS comments after changes in the root mount logic.

Reported by: arundel
Suggested by: marcel (phrasing)
Approved by: kib (mentor)


215679 22-Nov-2010 attilio

Add the ability for GDB to print out the thread name along with other
thread-specific information.

In order to do that, and in order to avoid KBI breakage with existing
infrastructure, the following semantics are implemented:
- For live programs, a new member is added to PT_LWPINFO (pl_tdname)
- For cores, a new ELF note is added (NT_THRMISC) that can be used for
storing miscellaneous thread-specific information. Right now it is
just populated with a thread name.

GDB then retrieves the correct information from the corefile via the
BFD interface, as it groks the ELF notes and creates appropriate
pseudo-sections.

Sponsored by: Sandvine Incorporated
Tested by: gianni
Discussed with: dim, kan, kib
MFC after: 2 weeks


211412 17-Aug-2010 kib

Supply some useful information to the started image using ELF aux vectors.
In particular, provide pagesize and pagesizes array, the canary value
for SSP use, number of host CPUs and osreldate.

Tested by: marius (sparc64)
MFC after: 1 month


211006 07-Aug-2010 kib

Prefer struct sysentvec sv_psstrings to hardcoding FREEBSD32_PS_STRINGS
in the compat32 code. Use sv_usrstack instead of FREEBSD32_USRSTACK as well.

MFC after: 1 week


211005 07-Aug-2010 kib

Add compat32 definition for (old) struct ostat.

MFC after: 1 week


210848 04-Aug-2010 kib

Copy inode birthtime to the struct stat32.

MFC after: 1 week


210847 04-Aug-2010 kib

Fix style.

MFC after: 1 week


210796 03-Aug-2010 kib

When compat32 recvmsg(2) does not need to copy out control messages, set
msg_controllen to 0.

PR: kern/149227
Submitted by: Stef Walter <stef memberwebs com>
MFC after: 1 week


210545 27-Jul-2010 alc

Introduce exec_alloc_args(). The objective being to encapsulate the
details of the string buffer allocation in one place.

Eliminate the portion of the string buffer that was dedicated to storing
the interpreter name. The pointer to the interpreter name can simply be
made to point to the appropriate argument string.

Reviewed by: kib


210498 26-Jul-2010 kib

Revert r210451, and the similar part of the r210431. The forward-declaration
for the enum tag when enum definition is not complete is not allowed by
C99, and is gcc extension.

Requested by: stefanf
MFC after: 28 days


210475 25-Jul-2010 alc

Change the order in which the file name, arguments, environment, and
shell command are stored in exec*()'s demand-paged string buffer. For
a "buildworld" on an 8GB amd64 multiprocessor, the new order reduces
the number of global TLB shootdowns by 31%. It also eliminates about
330k page faults on the kernel address space.

Change exec_shell_imgact() to use "args->begin_argv" consistently as
the start of the argument and environment strings. Previously, it
would sometimes use "args->buf", which is the start of the overall
buffer, but no longer the start of the argument and environment
strings. While I'm here, eliminate unnecessary passing of "&length"
to copystr(), where we don't actually care about the length of the
copied string.

Clean up the initialization of the exec map. In particular, use the
correct size for an entry, and express that size in the same way that
is used when an entry is allocated. The old size was one page too
large. (This discrepancy originated in 2004 when I rewrote
exec_map_first_page() to use sf_buf_alloc() instead of the exec map
for mapping the first page of the executable.)

Reviewed by: kib


210431 23-Jul-2010 kib

Remove linux_exec_copyin_args(); freebsd32_exec_copyin_args() may
serve as well. COMPAT_FREEBSD32 is a prerequisite for COMPAT_LINUX32.

Reviewed by: alc
MFC after: 3 weeks


210429 23-Jul-2010 alc

Eliminate a little bit of duplicated code.


209687 04-Jul-2010 kib

Constify source argument for siginfo_to_siginfo32().

MFC after: 1 week


209581 28-Jun-2010 kib

Regenerate


209579 28-Jun-2010 kib

Count number of threads that enter and leave dynamically registered
syscalls. On the dynamic syscall deregistration, wait until all
threads leave the syscall code. This somewhat increases the safety
of the loadable modules unloading.

Reviewed by: jhb
Tested by: pho
MFC after: 1 month


207008 21-Apr-2010 kib

Provide compat32 shims for kinfo_proc sysctl. This allows 32bit ps(1) to
mostly work on 64bit host.

The work is based on an original patch submitted by emaste, obtained
from Sandvine's source tree.

Reviewed by: jhb
MFC after: 1 week


207007 21-Apr-2010 kib

Extract the code to copy-out struct rusage32 from struct rusage
into the new function.

Reviewed by: jhb
MFC after: 1 week


205792 28-Mar-2010 ed

Rename st_*timespec fields to st_*tim for POSIX 2008 compliance.

A nice thing about POSIX 2008 is that it finally standardizes a way to
obtain file access/modification/change times in sub-second precision,
namely using struct timespec, which we already have for a very long
time. Unfortunately POSIX uses different names.

This commit adds compatibility macros, so existing code should still
build properly. Also change all source code in the kernel to work
without any of the compatibility macros. This makes it all a bit less
ambiguous.

I am also renaming st_birthtime to st_birthtim, even though it was a
local extension anyway. It seems Cygwin also has a st_birthtim.


205328 19-Mar-2010 kib

Regen


205327 19-Mar-2010 kib

Remove empty line.

MFC after: 2 weeks


205325 19-Mar-2010 kib

Implement compat32 shims for mqueuefs.

Reviewed by: jhb
MFC after: 2 weeks


205324 19-Mar-2010 kib

Implement compat32 shims for ksem syscalls.

Reviewed by: jhb
MFC after: 2 weeks


205323 19-Mar-2010 kib

Move SysV IPC freebsd32 compat shims from freebsd32_misc.c to corresponding
sysv_{msg,sem,shm}.c files.

Mark SysV IPC freebsd32 syscalls as NOSTD and add required
SYSCALL_INIT_HELPER/SYSCALL32_INIT_HELPERs to provide auto
register/unregister on module load.

This makes COMPAT_FREEBSD32 functional with SysV IPC compiled and loaded
as modules.

Reviewed by: jhb
MFC after: 2 weeks


205322 19-Mar-2010 kib

Move SysV IPC freebsd32 compat shims helpers from freebsd32_misc.c to
sysv_ipc.c.

Reviewed by: jhb
MFC after: 2 weeks


205321 19-Mar-2010 kib

Introduce SYSCALL_INIT_HELPER and SYSCALL32_INIT_HELPER macros and
necessary support functions to allow registering dynamically loaded
syscalls from the MOD_LOAD handlers. Helpers handle registration
failures semi-automatically.

Reviewed by: jhb
MFC after: 2 weeks


205320 19-Mar-2010 kib

For SYSCALL_MODULE_HELPER, use "sys/<syscallname>" module name.
For SYSCALL32_MODULE_HELPER, use "sys32/<syscallname>" module name.
This avoids modules name conflict when compat32 syscall does not
need shims.

Note that SYSCALL_MODULE_HELPER is going to be unused in the tree by
several next commits.

Suggested by: jhb
MFC after: 2 weeks


205319 19-Mar-2010 kib

Make freebsd32_copyiniov() available outside of freebsd32_misc.

MFC after: 2 weeks


205016 11-Mar-2010 nwhitehorn

Regen after big endian compatibility import.


205014 11-Mar-2010 nwhitehorn

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by: kib, jhb


203660 08-Feb-2010 ed

Remove unused LIBCOMPAT keyword from syscalls.master.


200619 16-Dec-2009 imp

Revert 200606.


200606 16-Dec-2009 imp

Fix compiling FREEBSD_COMPAT[4,5,6] without FREEBSD_COMPAT7.

Note: Not sure this is the right way to do compat, but it makes the
headers consistent with the implementations.


200112 04-Dec-2009 kib

Regenerate.


200111 04-Dec-2009 kib

Add several syscall compat32 entries for acl manipulation.
They do not require translation of the arguments.

Tested by: bsam
MFC after: 1 week


198512 27-Oct-2009 kib

Regenerate


198508 27-Oct-2009 kib

Current pselect(3) is implemented in usermode and thus vulnerable to a
well-known race condition, whose elimination was the reason the function
was introduced in the first place. If the sigmask supplied as an argument
to pselect() enables a signal, the signal might be delivered before the
thread calls select(2), causing a lost wakeup. Reimplement pselect() in
the kernel, making the change of sigmask and the sleep atomic.

Since the signal shall be delivered to usermode, but the sigmask restored,
set TDP_OLDMASK and save the old mask in td_oldsigmask. The TDP_OLDMASK
should be cleared by ast() in case the signal was not delivered during
syscall execution.

Reviewed by: davidxu
Tested by: pho
MFC after: 1 month


198507 27-Oct-2009 kib

In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and pthread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by: davidxu
Tested by: pho
MFC after: 1 month


198506 27-Oct-2009 kib

In kern_sigsuspend(), better manipulate thread signal mask using
kern_sigprocmask() to properly notify other possible candidate threads
for signal delivery.

Since sigsuspend() shall only return to usermode after a signal was
delivered, do cursig/postsig loop immediately after waiting for
signal, repeating the wait if wakeup was spurious due to race with
other thread fetching signal from the process queue before us. Add
thread_suspend_check() call to allow the thread to be stopped or killed
while in loop.

Modify last argument of kern_sigprocmask() from boolean to flags,
allowing the function to be called with locked proc. Conversion of the
callers that supplied 1 to the old argument will be done in the next
commit, and because the SIGPROCMASK_OLD value is equal to 1, the code is formally
correct in between.

Reviewed by: davidxu
Tested by: pho
MFC after: 1 month


197637 30-Sep-2009 rwatson

Regenerate system call files following r197636.


197636 30-Sep-2009 rwatson

Reserve system call numbers for Capsicum security framework capabilities,
capability mode, and process descriptors: cap_new, cap_getrights, cap_enter,
cap_getmode, pdfork, pdkill, pdgetpid, and pdwait.

Obtained from: TrustedBSD Project
Sponsored by: Google
MFC after: 3 weeks


197057 10-Sep-2009 des

If a certain feature that was present in FreeBSD 7 was removed or changed in
FreeBSD 8, the compatibility shims should be built not just when FreeBSD 7
compatibility is requested, but also when compatibility with any older
FreeBSD version where that feature was present is requested.

Without this patch, a kernel config that sets COMPAT_FREEBSD6 but not *7
would fail to build due to inconsistencies between the declaration of the
compatibility shims and their use in the SysV code.

There are similar errors in other *proto.h headers in the tree.

MFC after: 3 weeks


197049 09-Sep-2009 kib

kern_select(9) copies fd_set in and out of userspace in quantities of
longs. Since longs in 32bit processes are 4 bytes, a 64bit kernel may copy
in or out 4 bytes more than the process expected.

Calculate the amount of bytes to copy taking into account size of fd_set
for the current process ABI.

Diagnosed and tested by: Peter Jeremy <peterjeremy acm org>
Reviewed by: jhb
MFC after: 1 week
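
The size calculation described in this entry can be sketched as follows. The helper fdset_copy_bytes is hypothetical and is not the kernel code; it only shows how rounding to the word size of the process ABI changes the byte count.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: bytes of fd_set to copy for nd descriptors when
 * the process ABI stores the set in words of long_bits bits (32 for an
 * ILP32 process, 64 for an LP64 one).
 */
static size_t
fdset_copy_bytes(unsigned int nd, unsigned int long_bits)
{
	size_t words;

	assert(long_bits == 32 || long_bits == 64);
	/* Round the descriptor count up to a whole number of words. */
	words = ((size_t)nd + long_bits - 1) / long_bits;
	return (words * (long_bits / 8));
}
```

For a single descriptor the 32-bit ABI expects 4 bytes while a 64-bit word size yields 8, which is the 4-byte overrun the commit fixes; for 33 descriptors both happen to agree on 8 bytes.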


195911 27-Jul-2009 jhb

Fix the freebsd32 versions of semsys(), shmsys(), and msgsys() to use the
old ABI versions of the relevant control system call (e.g.
freebsd7_freebsd32_msgctl() instead of freebsd32_msgctl() for msgsys()).

Approved by: re (kib)


195469 08-Jul-2009 trasz

Regen the freebsd32 parts.

Approved by: re (kib)


195468 08-Jul-2009 trasz

Fix freebsd32 version of lpathconf(2).

Approved by: re (kib)


195458 08-Jul-2009 trasz

There is an optimization in chmod(1) that makes it skip the chmod(2) call
if the new file mode is the same as it was before; however, this
optimization must be disabled for filesystems that support NFSv4 ACLs.
Chmod uses pathconf(2) to determine whether this is the case - however,
pathconf(2) always follows symbolic links, while the 'chmod -h' doesn't.

This change adds lpathconf(3) to make it possible to solve that problem
in a clean way.

Reviewed by: rwatson (earlier version)
Approved by: re (kib)


195104 27-Jun-2009 rwatson

Replace the AUDIT_ARG() variable argument macro with a set of more
specific macros for each audit argument type. This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by: brooks
Approved by: re (kib)
Obtained from: TrustedBSD Project
MFC after: 1 week


194919 24-Jun-2009 jhb

Regen.


194910 24-Jun-2009 jhb

Change the ABI of some of the structures used by the SYSV IPC API:
- The uid/cuid members of struct ipc_perm are now uid_t instead of unsigned
short.
- The gid/cgid members of struct ipc_perm are now gid_t instead of unsigned
short.
- The mode member of struct ipc_perm is now mode_t instead of unsigned short
(this is merely a style bug).
- The rather dubious padding fields for ABI compat with SV/I386 have been
removed from struct msqid_ds and struct semid_ds.
- The shm_segsz member of struct shmid_ds is now a size_t instead of an
int. This removes the need for the shm_bsegsz member in struct
shmid_kernel and should allow for complete support of SYSV SHM regions
>= 2GB.
- The shm_nattch member of struct shmid_ds is now an int instead of a
short.
- The shm_internal member of struct shmid_ds is now gone. The internal
VM object pointer for SHM regions has been moved into struct
shmid_kernel.
- The existing __semctl(), msgctl(), and shmctl() system call entries are
now marked COMPAT7 and new versions of those system calls which support
the new ABI are now present.
- The new system calls are assigned to the FBSD-1.1 version in libc. The
FBSD-1.0 symbols in libc now refer to the old COMPAT7 system calls.
- A simplistic framework for tagging system calls with compatibility
symbol versions has been added to libc. Version tags are added to
system calls by adding an appropriate __sym_compat() entry to
src/lib/libc/include/compat.h. [1]

PR: kern/16195 kern/113218 bin/129855
Reviewed by: arch@, rwatson
Discussed with: kan, kib [1]
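
To illustrate why widening the struct ipc_perm members matters, here is a rough sketch of the old and new layouts and of what happens when a large ID is squeezed back into the old one. The struct and function names are made up for this example, and fixed-width types stand in for unsigned short, uid_t, gid_t and mode_t; the real definitions live in the system headers.

```c
#include <stdint.h>

/* Old-style layout: IDs and mode held in 16-bit fields (stand-in for unsigned short). */
struct ipc_perm_old_sketch {
	uint16_t	cuid, cgid, uid, gid, mode, seq;
	long		key;
};

/* New-style layout: 32-bit stand-ins for uid_t/gid_t, 16-bit stand-in for mode_t. */
struct ipc_perm_new_sketch {
	uint32_t	cuid, cgid, uid, gid;
	uint16_t	mode;
	uint16_t	seq;
	long		key;
};

/* Convert new to old the way a compat shim has to: IDs above 65535 are truncated. */
static void
ipc_perm_new2old(const struct ipc_perm_new_sketch *n,
    struct ipc_perm_old_sketch *o)
{
	o->cuid = (uint16_t)n->cuid;
	o->cgid = (uint16_t)n->cgid;
	o->uid = (uint16_t)n->uid;
	o->gid = (uint16_t)n->gid;
	o->mode = n->mode;
	o->seq = n->seq;
	o->key = n->key;
}
```

A uid of 70000 cannot be represented in the old layout and comes out as 4464 (70000 modulo 65536), which is the kind of loss the new ABI avoids.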


194833 24-Jun-2009 jhb

Add a new COMPAT7 flag for FreeBSD 7.x compatibility system calls.


194647 22-Jun-2009 jhb

Regen.


194645 22-Jun-2009 jhb

Fix a typo in a comment.


194392 17-Jun-2009 jhb

Regen.


194390 17-Jun-2009 jhb

- Add the ability to mix multiple flags separated by pipe ('|') characters
in the type field of system call tables. Specifically, one can now use
the 'NO*' types as flags in addition to the 'COMPAT*' types. For example,
to tag 'COMPAT*' system calls as living in a KLD via NOSTD. The COMPAT*
type is required to be listed first in this case.
- Add new functions 'type()' and 'flag()' to the embedded awk script in
makesyscalls.sh that return true if a requested flag is found in the
type field ($3). The flag() function checks all of the flags in the
field, but type() only checks the first flag. type() is meant to be
used in the top-level "switch" statement and flag() should be used
otherwise.
- Retire the CPT_NOA type, it is now replaced with "COMPAT|NOARGS" using
the flags approach.
- Tweak the comment descriptions of COMPAT[46] system calls so that they
say "freebsd[46] foo" rather than "old foo".
- Document the COMPAT6 type.
- Sync comments in compat32 syscall table with the master table.
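
The difference between type() and flag() described above can be sketched in C (the real helpers are awk functions inside makesyscalls.sh; sc_type() and sc_flag() are hypothetical names used only here):

```c
#include <stdbool.h>
#include <string.h>

/* True if name matches any of the '|'-separated flags in field. */
static bool
sc_flag(const char *field, const char *name)
{
	size_t n = strlen(name);
	const char *p = field;

	while (*p != '\0') {
		size_t len = strcspn(p, "|");	/* length of the current flag */

		if (len == n && strncmp(p, name, n) == 0)
			return (true);
		p += len;
		if (*p == '|')
			p++;			/* skip the separator */
	}
	return (false);
}

/* True only if name is the first flag in field. */
static bool
sc_type(const char *field, const char *name)
{
	size_t len = strcspn(field, "|");

	return (len == strlen(name) && strncmp(field, name, len) == 0);
}
```

For the field "COMPAT|NOARGS", sc_type() matches only "COMPAT", while sc_flag() matches both "COMPAT" and "NOARGS".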


194263 15-Jun-2009 jhb

Regen.


194262 15-Jun-2009 jhb

Add a new 'void closefrom(int lowfd)' system call. When called, it closes
any open file descriptors >= 'lowfd'. It is largely identical to the same
function on other operating systems such as Solaris, DFly, NetBSD, and
OpenBSD. One difference from other *BSD is that this closefrom() does not
fail with any errors. In practice, while the manpages for NetBSD and
OpenBSD claim that they return EINTR, they ignore internal errors from
close() and never return EINTR. DFly does return EINTR, but for the common
use case (closing fd's prior to execve()), the caller really wants all
fd's closed and returning EINTR just forces callers to call closefrom() in
a loop until it stops failing.

Note that this implementation of closefrom(2) does not make any effort to
resolve userland races with open(2) in other threads. As such, it is not
multithread safe.

Submitted by: rwatson (initial version)
Reviewed by: rwatson
MFC after: 2 weeks


193917 10-Jun-2009 kib

Regenerate


193916 10-Jun-2009 kib

Add several syscall compat32 entries for extattr manipulation syscalls,
that do not require translation of the arguments.

Requested by: kientzle
Reviewed by: jhb (previous wrong version)
MFC after: 1 week


193235 01-Jun-2009 rwatson

Regenerate generated syscall files following changes to struct sysent in
r193234.


192895 27-May-2009 jamie

Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails. Child jails may be restricted more than their parents,
but never less. Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system. Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings. The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by: bz (mentor)


191675 29-Apr-2009 jamie

Regen for new jail system calls in r191673.

Approved by: bz (mentor)


191673 29-Apr-2009 jamie

Introduce the extensible jail framework, using the same "name=value"
interface as nmount(2). Three new system calls are added:
* jail_set, to create jails and change the parameters of existing jails.
This replaces jail(2).
* jail_get, to read the parameters of existing jails. This replaces the
security.jail.list sysctl.
* jail_remove to kill off a jail's processes and remove the jail.
Most jail parameters may now be changed after creation, and jails may be
set to exist without any attached processes. The current jail(2) system
call still exists, though it is now a stub to jail_set(2).

Approved by: bz (mentor)
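
A "name=value" parameter list of the kind mentioned above is typically passed as pairs of iovec entries, one for the name and one for the value. The helper below sketches how such an array might be filled; add_param() is a hypothetical name, the parameter names in the usage note are only examples, and no jail system call is actually invoked here.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Append one name/value pair to iov starting at index n and return the
 * new entry count. The name is passed with its terminating NUL; the
 * value is an opaque buffer of len bytes. The caller must size iov.
 */
static int
add_param(struct iovec *iov, int n, const char *name, const void *val,
    size_t len)
{
	iov[n].iov_base = (void *)name;
	iov[n].iov_len = strlen(name) + 1;
	iov[n + 1].iov_base = (void *)val;
	iov[n + 1].iov_len = len;
	return (n + 2);
}
```

Two parameters, for example a name and a path, occupy four iovec entries; the resulting array and count are what a call in the style of nmount(2) would receive.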


190622 01-Apr-2009 kib

Regen


190621 01-Apr-2009 kib

Rename implementation function for freebsd32 sysarch(2) to allow for
the arguments translations. Provide ABI-compatible definition of the
struct i386_ldt_args for freebsd32 compat layer.

In collaboration with: pho
Reviewed by: jhb


190529 29-Mar-2009 ed

Emulate the FIODGNAME ioctl in our 32-bit emulator.

It's quite strange that nobody reported this issue before. It turns out
functions like ttyname(), ptsname() and fdevname() don't work in
compat32. This means it's not even possible to run applications like
script(1) inside a 32-bit FreeBSD jail.

Fix this by converting 32-bit fiodgname_arg structures to their 64-bit
equivalent.

Reported by: kris
Tested by: kris
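
The conversion described above amounts to widening a 32-bit user pointer into a native one. A minimal sketch, assuming a structure with a length and a buffer pointer (the names with a _sketch suffix are made up for this example and are not the kernel's definitions):

```c
#include <stdint.h>

/* Layout as a 32-bit process sees it: the pointer is a 32-bit integer. */
struct fiodgname_arg32_sketch {
	int32_t		len;
	uint32_t	buf;
};

/* Layout as the native (64-bit) kernel expects it. */
struct fiodgname_arg_sketch {
	int		len;
	void		*buf;
};

/* Widen the 32-bit argument structure into its native equivalent. */
static void
fgn_from32(const struct fiodgname_arg32_sketch *in,
    struct fiodgname_arg_sketch *out)
{
	out->len = in->len;
	out->buf = (void *)(uintptr_t)in->buf;
}
```

Without such a step the kernel would read the 32-bit structure with the native layout and pick up a bogus buffer pointer.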


190466 27-Mar-2009 jamie

Whitespace/spelling fixes in advance of upcoming functional changes.

Approved by: bz (mentor)


189290 02-Mar-2009 jamie

Extend the "vfsopt" mount options for more general use. Make struct
vfsopt and the vfs_buildopts function public, and add some new fields
to struct vfsopt (pos and seen), and new functions vfs_getopt_pos and
vfs_opterror.

Further extend the interface to allow reading options from the kernel
in addition to sending them to the kernel, with vfs_setopt and related
functions.

While this allows the "name=value" option interface to be used for more
than just FS mounts (planned use is for jails), it retains the current
"vfsopt" name and <sys/mount.h> requirement.

Approved by: bz (mentor)


186564 29-Dec-2008 ed

Push down Giant inside sysctl. Also add some more assertions to the code.

In the existing code we didn't really enforce that callers hold Giant
before calling userland_sysctl(), even though there is no guarantee it
is safe. Fix this by just placing Giant locks around the call to the oid
handler. This also means we only pick up Giant for a very short period
of time. Maybe we should add MPSAFE flags to sysctl or phase it out
altogether.

I've also added SYSCTL_LOCK_ASSERT(). We have to make sure sysctl_root()
and name2oid() are called with the sysctl lock held.

Reviewed by: Jille Timmermans <jille quis cx>


185898 11-Dec-2008 bz

Add 32-bit compat support for AIO.

jhb probably forgot to commit this file with r185878 and will want to
review this. It unbreaks the build here.

Obtained from: p4 //depot/user/jhb/lock/compat/freebsd32/freebsd32_signal.h#2


185879 10-Dec-2008 jhb

Regen.


185878 10-Dec-2008 jhb

- Add 32-bit compat system calls for VFS_AIO. The system calls live in the
aio code and are registered via the recently added SYSCALL32_*() helpers.
- Since the aio code likes to invoke fuword and suword a lot down in the
"bowels" of system calls, add a structure holding a set of operations for
things like storing errors, copying in the aiocb structure, storing
status, etc. The 32-bit system calls use a separate operations vector to
handle fuword32 vs fuword, etc. Also, the oldsigevent handling is now
done by having separate operation vectors with different aiocb copyin
routines.
- Split out kern_foo() functions for the various AIO system calls so the
32-bit front ends can manage things like copying in and converting
timespec structures, etc.
- For both the native and 32-bit aio_suspend() and lio_listio() calls,
just use copyin() to read the array of aiocb pointers instead of using
a for loop that iterated over fuword/fuword32. The error handling in
the old case was incomplete (lio_listio() just ignored any aiocb's that
it got an EFAULT trying to read rather than reporting an error), and
possibly slower.

MFC after: 1 month
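
The "operations vector" idea from the entry above can be illustrated with a small table of function pointers, one instance per ABI. This is a simplified sketch: the structure, its two members and all function names are hypothetical, and memcpy() stands in for the kernel's user-memory accessors.

```c
#include <stdint.h>
#include <string.h>

/* Per-ABI accessors for a status word stored in the caller's memory. */
struct aio_ops_sketch {
	void	(*store_status)(void *ustatus, long status);
	long	(*fetch_status)(const void *ustatus);
};

/* Native ABI: the status word is a full long. */
static void
store_native(void *u, long s)
{
	memcpy(u, &s, sizeof(s));
}

static long
fetch_native(const void *u)
{
	long s;

	memcpy(&s, u, sizeof(s));
	return (s);
}

/* 32-bit ABI: the status word is exactly 4 bytes. */
static void
store32(void *u, long s)
{
	int32_t v = (int32_t)s;

	memcpy(u, &v, sizeof(v));
}

static long
fetch32(const void *u)
{
	int32_t v;

	memcpy(&v, u, sizeof(v));
	return (v);
}

static const struct aio_ops_sketch aio_ops_native = { store_native, fetch_native };
static const struct aio_ops_sketch aio_ops_32 = { store32, fetch32 };

/* Shared code path: it only ever goes through the ops vector. */
static long
roundtrip_status(const struct aio_ops_sketch *ops, void *ustatus, long status)
{
	ops->store_status(ustatus, status);
	return (ops->fetch_status(ustatus));
}
```

The shared code never needs to know which ABI it is serving; the 32-bit vector touches only 4 bytes of the caller's buffer.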


185589 03-Dec-2008 jhb

When unloading a 32-bit system call module, restore the sysent vector in
the 32-bit system call table instead of the main system call table.


185436 29-Nov-2008 bz

Regen after jail support was added in r185435.


185435 29-Nov-2008 bz

MFp4:
Bring in updated jail support from bz_jail branch.

This enhances the current jail implementation to permit multiple
addresses per jail. In addition to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with
restricted process view, no networking, etc.

SCTP support was updated and supports IPv6 in jails as well.

Cpuset support permits jails to be bound to specific processor
sets after creation.

Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.

DDB 'show jails' command was added to aid debugging.

Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.

Both jail as well as prison version were updated for the new features.
A gap was intentionally left as the intermediate versions had been
used by various patches floating around the last years.

Bump __FreeBSD_version for the aforementioned and in-kernel changes.

Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as early adopter testing changes
on cluster machines as well as all the testers and people
who provided feedback the last months on freebsd-jail and
other channels.
- My employer, CK Software GmbH, for the support so I could work on this.

Reviewed by: (see above)
MFC after: 3 months (this is just so that I get the mail)
X-MFC Before: 7.2-RELEASE if possible


184829 10-Nov-2008 peter

Sigh. Fix a pointer/int compile error.


184828 10-Nov-2008 peter

Fix a signal emulation bug introduced in r163018 (and present in 7.x).
This prevents 32 bit signal handlers from finding out what the faulting
address is. Both the secret 4th argument and siginfo->si_addr are zero.


184790 09-Nov-2008 ed

Regenerate system call tables for r184789.


184789 09-Nov-2008 ed

Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4.

Looking at our source code history, it seems the uname(),
getdomainname() and setdomainname() system calls got deprecated
somewhere after FreeBSD 1.1, but they have never been phased out
properly. Because we don't have a COMPAT_FREEBSD1, just use
COMPAT_FREEBSD4.

Also fix the Linuxolator to build without the setdomainname() routine by
just making it call userland_sysctl on kern.domainname. Also replace the
setdomainname()'s implementation to use this approach, because we're
duplicating code with sysctl_domainname().

I wasn't able to keep these three routines working in our
COMPAT_FREEBSD32, because that would require yet another keyword for
syscalls.master (COMPAT4+NOPROTO). Because this routine is probably
unused already, this won't be a problem in practice. If it turns out to
be a problem, we'll just restore this functionality.

Reviewed by: rdivacky, kib


184589 03-Nov-2008 dfr

Regen.


184588 03-Nov-2008 dfr

Implement support for RPCSEC_GSS authentication to both the NFS client
and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager. I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.

The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.

To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.

As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.

Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.

The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.

Sponsored by: Isilon Systems
MFC after: 1 month


184184 22-Oct-2008 jhb

Regen for freebsd32_getdirentries().


184183 22-Oct-2008 jhb

Split the copyout of *base at the end of getdirentries() out leaving the
rest in kern_getdirentries(). Use kern_getdirentries() to implement
freebsd32_getdirentries(). This fixes a bug where calls to getdirentries()
in 32-bit binaries would trash the 4 bytes after the 'long base' in
userland.

Submitted by: ups
MFC after: 1 week


183365 25-Sep-2008 jhb

Add support for installing 32-bit system calls from kernel modules. This
includes syscall32_{de,}register() routines as well as a module handler
and wrapper macros similar to the support for native syscalls in
<sys/sysent.h>.

MFC after: 1 month


183363 25-Sep-2008 jhb

Sort includes and add multiple include guards.


183362 25-Sep-2008 jhb

Regen.


183361 25-Sep-2008 jhb

Tidy up a few things with syscall generation:
- Instead of using a syscall slot (370) just to get a function prototype
for lkmressys(), add an explicit function prototype to <sys/sysent.h>.
This also removes unused special case checks for 'lkmressys' from
makesyscalls.sh.
- Instead of having magic logic in makesyscalls.sh to only generate a
function prototype the first time 'lkmnosys' is seen, make 'NODEF'
always not generate a function prototype and include an explicit
prototype for 'lkmnosys' in <sys/sysent.h>.
- As a result of the fix in (2), update the LKM syscall entries in
the freebsd32 syscall table to use 'lkmnosys' rather than 'nosys'.
- Use NOPROTO for the __syscall() entry (198) in the native ABI. This
avoids the need for magic logic in makesyscalls.sh to only generate
a function prototype the first time 'nosys' is encountered.


183273 22-Sep-2008 obrien

Add freebsd32 compat shims for ioctl(2)
CDIOREADTOCHEADER and CDIOREADTOCENTRYS requests.


183271 22-Sep-2008 obrien

Regenerate for r183270.


183270 22-Sep-2008 obrien

Add freebsd32 compat shims for ioctl(2)
MDIOCATTACH, MDIOCDETACH, MDIOCQUERY, and MDIOCLIST requests.


183189 19-Sep-2008 obrien

Regenerate for r183188.


183188 19-Sep-2008 obrien

Add freebsd32 compat shim for nmount(2).
(and quiet some compiler warnings for vfs_donmount)


183044 15-Sep-2008 obrien

style(9)


183043 15-Sep-2008 obrien

Regenerate for r183042.


183042 15-Sep-2008 obrien

Fix bug in r100384 (rev 1.2) in which the 32-bit swapon(2) was made
"obsolete, not included in system", whereas the system call does exist.


182124 24-Aug-2008 rwatson

Regenerate following r182123.


182123 24-Aug-2008 rwatson

When MPSAFE ttys were merged, a new BSM audit event identifier was
allocated for posix_openpt(2). Unfortunately, that identifier
conflicts with other events already allocated to other systems in
OpenBSM. Assign a new globally unique identifier and conform
better to the AUE_ event naming scheme.

This is a stopgap until a new OpenBSM import is done with the
correct identifier, so we'll maintain this as a local diff in svn
until then.

Discussed with: ed
Obtained from: TrustedBSD Project


181972 21-Aug-2008 obrien

Add comments on NOARGS, NODEF, and NOPROTO.


181906 20-Aug-2008 ed

Update system call tables.

The previous commit also included changes to all the system call lists,
but it is a tradition to update these lists in a second commit, so rerun
make sysent to update the $FreeBSD$ tags inside these files to refer to
the latest version of syscalls.master.

Requested by: rwatson


181905 20-Aug-2008 ed

Integrate the new MPSAFE TTY layer to the FreeBSD operating system.

The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.

If a PPP implementation would be built on top of the new TTY layer
(still needs a hooks layer, though), it would allow the PPP
implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

With the old TTY layer, it isn't entirely safe to destroy TTY's from
the system. This implementation has a two-step destructing design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc).

The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is both
used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan


180436 10-Jul-2008 brooks

style(9): put parentheses around return values.


180434 10-Jul-2008 brooks

Regen


180433 10-Jul-2008 brooks

id_t is a 64-bit integer and thus is passed as two arguments like off_t is.
As a result, those arguments must be recombined before calling the real
syscall implementation. This change fixes 32-bit compatibility for
cpuset_getid(), cpuset_setid(), cpuset_getaffinity(), and
cpuset_setaffinity().
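
Recombining a 64-bit value that arrives as two 32-bit arguments can be sketched as below. pair_to_u64() is a hypothetical helper for illustration; which half arrives first depends on the byte order of the 32-bit ABI, so the two halves are named explicitly here.

```c
#include <stdint.h>

/* Rebuild a 64-bit value from its low and high 32-bit halves. */
static uint64_t
pair_to_u64(uint32_t lo, uint32_t hi)
{
	return (((uint64_t)hi << 32) | lo);
}
```

A wrapper for one of the calls listed above would apply this to the split id argument before handing the full value to the native implementation.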


177790 31-Mar-2008 kib

Regen


177789 31-Mar-2008 kib

Add the freebsd32 compatibility shims for the *at() syscalls.

Reviewed by: rwatson, rdivacky
Tested by: pho


177634 26-Mar-2008 dfr

Regen.


177633 26-Mar-2008 dfr

Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.

* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.

Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks


177613 25-Mar-2008 jhb

Regen.


177612 25-Mar-2008 jhb

Add entries for the cpuset-related system calls. The existing system calls
can be used on little endian systems.

Pointy hat to: jeff


177091 12-Mar-2008 jeff

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


176216 12-Feb-2008 ru

Regenerate for readlink(2).


176215 12-Feb-2008 ru

Change readlink(2)'s return type and type of the last argument
to match POSIX.

Prodded by: Alexey Lyashkov


175518 20-Jan-2008 rwatson

Regenerate.


175517 20-Jan-2008 rwatson

Use audit events AUE_SHMOPEN and AUE_SHMUNLINK with new system calls
shm_open() and shm_unlink(). More auditing will need to be done for
these calls to capture arguments properly.


175165 08-Jan-2008 jhb

Regen for shm_open(2) and shm_unlink(2).


175164 08-Jan-2008 jhb

Add a new file descriptor type for IPC shared memory objects and use it to
implement shm_open(2) and shm_unlink(2) in the kernel:
- Each shared memory file descriptor is associated with a swap-backed vm
object which provides the backing store. Each descriptor starts off with
a size of zero, but the size can be altered via ftruncate(2). The shared
memory file descriptors also support fstat(2). read(2), write(2),
ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared
memory file descriptors.
- shm_open(2) and shm_unlink(2) are now implemented as system calls that
manage shared memory file descriptors. The virtual namespace that maps
pathnames to shared memory file descriptors is implemented as a hash
table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash
of the pathname.
- As an extension, the constant 'SHM_ANON' may be specified in place of the
path argument to shm_open(2). In this case, an unnamed shared memory
file descriptor will be created similar to the IPC_PRIVATE key for
shmget(2). Note that the shared memory object can still be shared among
processes by sharing the file descriptor via fork(2) or sendmsg(2), but
it is unnamed. This effectively serves to implement the getmemfd() idea
bandied about the lists several times over the years.
- The backing store for shared memory file descriptors are garbage
collected when they are not referenced by any open file descriptors or
the shm_open(2) virtual namespace.

Submitted by: dillon, peter (previous versions)
Submitted by: rwatson (I based this on his version)
Reviewed by: alc (suggested converting getmemfd() to shm_open())
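
As an illustration of hashing a pathname into a table slot, here is a sketch using the widely published 32-bit FNV-1a variant. The entry above only says "32-bit Fowler/Noll/Vo hash"; the exact variant and constants used by the kernel are not stated in this log, so treat the choice of FNV-1a, and the names fnv1a_32() and shm_bucket(), as assumptions.

```c
#include <stdint.h>

/* 32-bit FNV-1a over a NUL-terminated string. */
static uint32_t
fnv1a_32(const char *s)
{
	uint32_t h = 2166136261u;		/* FNV offset basis */

	while (*s != '\0') {
		h ^= (unsigned char)*s++;
		h *= 16777619u;			/* FNV prime */
	}
	return (h);
}

/* Map a path to a bucket; nbuckets must be a power of two. */
static uint32_t
shm_bucket(const char *path, uint32_t nbuckets)
{
	return (fnv1a_32(path) & (nbuckets - 1));
}
```

Lookups hash the requested path, walk the chain in that bucket and compare the stored pathnames, so two paths that collide still resolve to distinct objects.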


174526 10-Dec-2007 jhb

Bah, remove last vestiges of some statfs conversion fixes that aren't quite
ready for CVS yet that snuck into 1.68.

Pointy hat to: jhb


174430 08-Dec-2007 scottl

Grrr, remove an unused variable missed in the last commit.


174424 07-Dec-2007 scottl

Don't expect a return value from statfs_scale_blocks().


174383 06-Dec-2007 jhb

Regen.


174382 06-Dec-2007 jhb

Add freebsd32 compat wrappers for msgctl() and __semctl() using
kern_msgctl() and kern_semctl().

MFC after: 1 week


174381 06-Dec-2007 jhb

Add freebsd32 compat wrappers for msgctl() and __semctl() using
kern_msgctl() and kern_semctl().

MFC after: 1 week


174380 06-Dec-2007 jhb

Move 32-bit SYSV IPC structure definitions into freebsd32_ipc.h.

MFC after: 1 week


174377 06-Dec-2007 jhb

Move several data structure definitions out of freebsd32_misc.c and into
freebsd32.h instead.

MFC after: 1 week


174268 04-Dec-2007 jkim

Remove redundant checks for msgsnd(3) and msgrcv(3).
COMPAT_IA32 (implicitly) requires SYSVSEM, SYSVSHM and SYSVMSG in kernel.

Pointed out by: jhb


172003 28-Aug-2007 jhb

Rework the routines to convert a 5.x+ statfs structure (with fixed-size
64-bit counters) to a 4.x statfs structure (with long-sized counters).
- For block counters, we scale up the block size sufficiently large so
that the resulting block counts fit into the long-sized (long for the
ABI, so 32-bit in freebsd32) counters. In 4.x the NFS client's statfs
VOP did this already. This can lie about the block size to 4.x binaries,
but it presents a more accurate picture of the ratios of free and
available space.
- For non-block counters, fix the freebsd32 stats converter to cap the
values at INT32_MAX rather than losing the upper 32-bits to match the
behavior of the 4.x statfs conversion routine in vfs_syscalls.c

Approved by: re (kensmith)
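
The two conversions described above can be sketched as follows. Both helpers are hypothetical simplifications: scale_blocks_sketch() handles a single counter, whereas a real conversion has to apply one common shift to every block counter, and INT32_MAX is used as the limit for a 32-bit long.

```c
#include <stdint.h>

/*
 * Double the reported block size (and halve the count) until the count
 * fits in a signed 32-bit long. The product stays roughly constant;
 * low-order bits of the count are rounded away.
 */
static void
scale_blocks_sketch(uint64_t *bsize, uint64_t *blocks)
{
	while (*blocks > (uint64_t)INT32_MAX) {
		*blocks >>= 1;
		*bsize <<= 1;
	}
}

/* Cap a non-block counter instead of dropping its upper 32 bits. */
static int32_t
cap_int32(int64_t v)
{
	/* Counters are assumed non-negative, so only the upper bound is checked. */
	return (v > INT32_MAX ? INT32_MAX : (int32_t)v);
}
```

For example, 2^33 blocks of 512 bytes are reported as 2^30 blocks of 4096 bytes, which describes the same total size with a block size the old binary did not expect.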


171861 16-Aug-2007 davidxu

Regenerate.

Approved by: re(kensmith)


171860 16-Aug-2007 davidxu

Add thr_kill2 compat32 syscall.

Submitted by: Tijl Coosemans tijl at ulyssis dot org
Approved by: re (kensmith)


171215 04-Jul-2007 peter

Add compat6 wrapper code for mmap/lseek/pread/pwrite/truncate/ftruncate.

Approved by: re (kensmith)


171214 04-Jul-2007 peter

Regenerate after mmap/lseek/etc syscall changes

Approved by: re (kensmith)


171213 04-Jul-2007 peter

Add i386 emulation wrappers for mmap/lseek/etc. These use COMPAT6, so
you must use the already existing, already in generic, COMPAT_FREEBSD6
kernel option for running old 32 bit binaries.

Approved by: re (kensmith)


170870 17-Jun-2007 mjacob

Try a cheap way to get around gcc4.2 believing that user arguments
to system calls can change across intervening functions.


170795 15-Jun-2007 emaste

Remove stale 'XXX implement' comments for syscalls which have since been
implemented.


169901 23-May-2007 cognet

Remove duplicate includes.

Submitted by: Cyril Nguyen Huu <cyril ci0 org>


169181 01-May-2007 alc

Eliminate the use of Giant from ia64-specific code in freebsd32_mmap().


165406 20-Dec-2006 jkim

Regen.


165405 20-Dec-2006 jkim

MFP4: (part of) 110058

Fix 32-bit msgsnd(3) and msgrcv(3) emulations for amd64.


164199 11-Nov-2006 ru

Regen.

Forgotten by: trhodes


163961 03-Nov-2006 ru

Regen.


163960 03-Nov-2006 ru

Fix build breakage introduced in previous commit (redeclaration
of sctp functions).


163956 03-Nov-2006 rrs

This commits the remake in kern/ make sysent to get
the correct syscalls.master's $FreeBSD$ tag record and
a make sysent in sys/compat/freebsd32. Thanks Ruslan
for pointing out the steps I missed :-0
Approved by: gnn


163953 03-Nov-2006 rrs

Ok, here it is, we finally add SCTP to current. Note that this
work is not just mine, but it is also the works of Peter Lei
and Michael Tuexen. They both are my two key other developers
working on the project.. and they need ata-boy's too:
****
peterlei@cisco.com
tuexen@fh-muenster.de
****
I did do a make sysent which updated the
syscall's and sysproto.. I hope that is correct... without
it you don't build since we have new syscalls for SCTP :-0

So go out and look at the NOTES, add
option SCTP (make sure inet and inet6 are present too)
and play with SCTP.

I will see about committing some test tools I have after I
figure out where I should place them. I also have a
lib (libsctp.a) that adds some of the missing socketapi
functions that I need to put into lib's.. I will talk
to George about this :-)

There may still be some 64 bit issues in here, none of
us have a 64 bit processor to test with yet.. Michael
may have a MAC but that's another beast too..

If you have a mac and want to use SCTP contact Michael
he maintains a web site with a loadable module with
this code :-)

Reviewed by: gnn
Approved by: gnn


163664 24-Oct-2006 sobomax

Regen.


163663 24-Oct-2006 sobomax

Fix kernel breakage introduced in the previous commit (redeclaration
of the audit functions).


163658 24-Oct-2006 rwatson

Regenerate.


163657 24-Oct-2006 rwatson

Hook up audit functions in the freebsd32 compatibility code. It is
believed these likely don't require wrappers.

Reported by: sobomax
MFC after: 3 days


163451 17-Oct-2006 davidxu

Regenerate.


163450 17-Oct-2006 davidxu

Sync with master.


163047 06-Oct-2006 davidxu

Regenerate.


163046 06-Oct-2006 davidxu

Implement 32bit umtx_lock and umtx_unlock system calls, these two system
calls are not used by libthr in RELENG_6 and HEAD, it is only used by
the libthr in RELENG-5, the _umtx_op system call can do more incremental
dirty works than these two system calls without having to introduce new
system calls or throw away old system calls when things are going on.


163020 05-Oct-2006 davidxu

Regenerate.


163019 05-Oct-2006 davidxu

Oops, add the missing file.


163018 05-Oct-2006 davidxu

Move some declaration of 32-bit signal structures into file
freebsd32-signal.h, implement sigtimedwait and sigwaitinfo system calls.


162993 03-Oct-2006 rwatson

Regenerate.


162992 03-Oct-2006 rwatson

Change getpagesize() system call audit event to more clearly indicate
that we don't audit it.

MFC after: 3 days
Obtained from: TrustedBSD Project


162954 02-Oct-2006 phk

First part of a little cleanup in the calendar/timezone/RTC handling.

Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & space-efficient BCD routines.


162566 23-Sep-2006 davidxu

Regenerate.


162565 23-Sep-2006 davidxu

Enable sigwait.


162552 22-Sep-2006 davidxu

Regenerate.


162551 22-Sep-2006 davidxu

Add compatible code to let 32bit libthr work on 64bit kernel.


162537 22-Sep-2006 davidxu

Regenerate.


162536 22-Sep-2006 davidxu

Add umtx support for 32bit process on AMD64 machine.


162502 21-Sep-2006 davidxu

Regenerate.


162501 21-Sep-2006 davidxu

sync with master.


162374 17-Sep-2006 rwatson

Regenerate.


162373 17-Sep-2006 rwatson

AUE_SIGALTSTACK instead of AUE_SIGPENDING for sigaltstack().

Obtained from: TrustedBSD Project
MFC after: 3 days


162167 09-Sep-2006 davidxu

The following functions need not be reimplemented, reuse 64bit
syscalls instead:
sigqueue, thr_set_name, thr_setscheduler, thr_getscheduler,
thr_setschedparam.


161960 03-Sep-2006 rwatson

Regenerate.


161958 03-Sep-2006 rwatson

Set freebsd32 system call event identifiers for:

- old truncate, ftruncate
- old getpeername, gethostid, sethostid, getrlimit, setrlimit, killpg.
- old quota, getsockname, getdirentries.
- lgetfh
- old getdomainname, setdomainname
- sysarch, rtprio, __getcwd, jail, sigtimedwait
- extattrctl, extattr_{get,set,delete,list}_{file,fd,link}
- getresgid, getresuid, kqueue, eaccess, nmount, sendfile
- fhstatfs, kldunloadf

Right identifiers for:

- nfssvc

Remove incorrect identifier for:

- __acl_get_file

Compile tested with help of: sam
Obtained from: TrustedBSD Project


161948 03-Sep-2006 rwatson

Regenerate. Looks like someone missed doing this previously as more than
just the audit event change appears in the diff.


161947 03-Sep-2006 rwatson

Use AUE_NTP_ADJTIME instead of AUE_ADJTIME for ntp_adjtime().

Obtained from: TrustedBSD Project


161425 17-Aug-2006 imp

while (0); -> while (0) in multi-line macros


161367 16-Aug-2006 peter

Grab two syscall numbers. One is used to emulate functionality that linux
has in its procfs (do a readlink of /proc/self/fd/<nn> to find the pathname
that corresponds to a given file descriptor). Valgrind-3.x needs this
functionality. This is a placeholder only at this time.


161343 16-Aug-2006 jkim

Include sys/limits.h for INT_MAX. freebsd32_proto.h 1.58 does not include
sys/umtx.h any more and previously it was included from there.


161330 15-Aug-2006 jhb

Regen to propagate <prefix>_AUE_<mumble> changes as well as the earlier
systrace changes.


161328 15-Aug-2006 jhb

- Remove unused sysvec variables from various syscalls.conf.
- Send the systrace_args files for all the compat ABIs to /dev/null for
now. Right now makesyscalls.sh generates a file with a hardcoded
function name, so it wouldn't work for any of the ABIs anyway. Probably
the function name should be configurable via a 'systracename' variable
and the functions should be stored in a function pointer in the sysvec
structure.


160799 28-Jul-2006 jhb

Regen for MPSAFE flag removal.


160798 28-Jul-2006 jhb

Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to
mark system calls as being MPSAFE:
- Stop conditionally acquiring Giant around system call invocations.
- Remove all of the 'M' prefixes from the master system call files.
- Remove support for the 'M' prefix from the script that generates the
syscall-related files from the master system call files.
- Don't explicitly set SYF_MPSAFE when registering nfssvc.


160797 28-Jul-2006 jhb

Various fixes to comments in the syscall master files including removing
cruft from the audit import and adding mention of COMPAT4 to freebsd32.


160333 14-Jul-2006 davidxu

sync with master.


160249 10-Jul-2006 jhb

- Split out kern_accept(), kern_getpeername(), and kern_getsockname() for
use by ABI emulators.
- Alter the interface of kern_recvit() somewhat. Specifically, go ahead
and hard code UIO_USERSPACE in the uio as that's what all the callers
specify. In place, add a new uioseg to indicate what type of pointer
is in mp->msg_name. Previously it was always a userland address, but
ABI emulators may pass in kernel-side sockaddrs. Also, remove the
namelenp field and instead require the two places that used it to
explicitly copy mp->msg_namelen out to userland.
- Use the patched kern_recvit() to replace svr4_recvit() and the stock
kern_sendit() to replace svr4_sendit().
- Use kern_bind() instead of stackgap use in ti_bind().
- Use kern_getpeername() and kern_getsockname() instead of stackgap in
svr4_stream_ti_ioctl().
- Use kern_connect() instead of stackgap in svr4_do_putmsg().
- Use kern_getpeername() and kern_accept() instead of stackgap in
svr4_do_getmsg().
- Retire the stackgap from SVR4 compat as it is no longer used.


160246 10-Jul-2006 jhb

Unexpand PTRIN() in several places and fix one instance where 0 was being
used instead of NULL.


159983 27-Jun-2006 jhb

Regen.


159982 27-Jun-2006 jhb

- Expand the scope of Giant some in mount(2) to protect the vfsp structure
from going away. mount(2) is now MPSAFE.
- Expand the scope of Giant some in unmount(2) to protect the mp structure
(or rather, to handle concurrent unmount races) from going away.
umount(2) is now MPSAFE, as well as linux_umount() and linux_oldumount().
- nmount(2) and linux_mount() were already MPSAFE.


159961 26-Jun-2006 jhb

Regen.


159958 26-Jun-2006 jhb

- Sync with master: rmdir(), mkdir(), and extattr_*() are all MPSAFE.
- freebsd32_utimes() is MPSAFE.


159412 08-Jun-2006 ps

Do not copy out the iovec in the 32bit recvmsg call since soreceive
calls uiomove directly.

Reviewed by: ups
MFC after: 1 week


157286 30-Mar-2006 ps

regen for 32bit System V shared memory


157285 30-Mar-2006 ps

Properly support FreeBSD 4 32bit System V shared memory.

Submitted by: peter
Obtained from: Yahoo!
MFC after: 3 weeks


156440 08-Mar-2006 ups

Fix exec_map resource leaks.

Tested by: kris@


156266 04-Mar-2006 ps

use strlcpy in cvtstatfs and copy_statfs instead of bcopy to ensure
the copied strings are properly terminated.

bzero the statfs32 struct in copy_statfs.


156115 28-Feb-2006 ps

regen for 32bit sendfile


156114 28-Feb-2006 ps

Fix 32bit sendfile by implementing kern_sendfile so that it takes
the header and trailers as iovec arguments instead of copying them
in inside of sendfile.

Reviewed by: jhb
MFC after: 3 weeks


155402 06-Feb-2006 jhb

- Always call exec_free_args() in kern_execve() instead of doing it in all
the callers if the exec either succeeds or fails early.
- Move the code to call exit1() if the exec fails after the vmspace is
gone to the bottom of kern_execve() to cut down on some code duplication.


155295 04-Feb-2006 rwatson

Regenerate.


155294 04-Feb-2006 rwatson

Audit FreeBSD 32-bit system calls on 64-bit FreeBSD systems.

Obtained from: TrustedBSD Project


154596 20-Jan-2006 ambrisko

Fix the build. When I added the lutimes the futimes definitions
went away in the generated files? This didn't happen on my amd64
test machine but did when I committed it on my other i386 machine.
I need to figure this out since a regen on the amd64 doesn't fix it
now. For now make the build work again. Matt caught this before
my local mirror caught up.


154587 20-Jan-2006 ambrisko

Regen.


154586 20-Jan-2006 ambrisko

Add 32bit version of lutimes so untar doesn't mess up sym-links on amd64.


153692 23-Dec-2005 ru

Regen.


153691 23-Dec-2005 ru

Fix build.


153681 23-Dec-2005 phk

Regenerate sysent with new abort2 system call.

Implement abort2(const char *reason, int narg, void **args);

Submitted by: "Wojciech A. Koszek" <dunstan@freebsd.czest.pl>


153680 23-Dec-2005 phk

Add missing 455-462 syscalls as unimplemented


153679 23-Dec-2005 phk

Add abort2() systemcall.


153248 08-Dec-2005 ambrisko

Regen for futimes.


153247 08-Dec-2005 ambrisko

Add 32bit version of futimes so untar doesn't result in bad dates
(Jan 1, 1970) when run on amd64.

Reviewed by: ps


152134 06-Nov-2005 ps

Copy out the number of iovecs in freebsd32_recvmsg, not the length
of a single iovec.


151909 31-Oct-2005 ps

Reformat socket control messages on input/output for 32bit compatibility
on 64bit systems.

Submitted by: ps, ups
Reviewed by: jhb


151721 26-Oct-2005 peter

Regenerate (with the correct #ifdef COMPAT_43 tests now)


151720 26-Oct-2005 peter

There is no 'freebsd3_' prefix for COMPAT_43 syscalls. Those are all
bundled under MCOMPAT and have an 'o' prefix. Adjust as appropriate.
This re-enables compiling without COMPAT_43 again.


151597 23-Oct-2005 obrien

Add a 'clean' target.


151583 23-Oct-2005 ps

regen


151582 23-Oct-2005 ps

Implement for FreeBSD 3 32bit binaries:
sigaction, sigprocmask, sigpending, sigvec, sigblock, sigsetmask,
sigsuspend, sigstack


151360 15-Oct-2005 ps

regen after recvmsg, recvfrom, sendmsg


151359 15-Oct-2005 ps

Implement the 32bit versions of recvmsg, recvfrom, sendmsg

Partially obtained from: jhb


151358 15-Oct-2005 ps

regen for clock_gettime, clock_settime, clock_getres


151357 15-Oct-2005 ps

Implement 32bit wrappers for clock_gettime, clock_settime, and
clock_getres.


151356 15-Oct-2005 ps

regen


151355 15-Oct-2005 ps

Correct the prototype for freebsd32_nanosleep and use the proper
size when copying struct timespec32 in and out.


150883 03-Oct-2005 jhb

Use the constants for the syscall names from syscall.h rather than
hardcoding the numbers for the SYSVIPC syscalls.


150632 27-Sep-2005 peter

Regenerate


150631 27-Sep-2005 peter

Implement 32 bit getcontext/setcontext/swapcontext on amd64. I've added
stubs for ia64 to keep it compiling. These are used by 32 bit apps such
as gdb.


147975 13-Jul-2005 jhb

Regen.


147974 13-Jul-2005 jhb

Make a pass through all the compat ABIs synchronizing the MP safe flags
with the master syscall table as well as marking several ABI wrapper
functions safe.

MFC after: 1 week


147964 13-Jul-2005 jhb

Wrap the ia64-specific freebsd32_mmap_partial() hack in Giant for now
since it calls into VFS and VM. This makes the freebsd32_mmap() routine
MP safe and the extra Giants here can be revisited later.

Glanced at by: marcel
MFC after: 3 days


147814 07-Jul-2005 jhb

Regenerate.

Approved by: re (scottl)


147813 07-Jul-2005 jhb

- Add two new system calls: preadv() and pwritev() which are like readv()
and writev() except that they take an additional offset argument and do
not change the current file position. In SAT speak:
preadv:readv::pread:read and pwritev:writev::pwrite:write.
- Try to reduce code duplication some by merging most of the old
kern_foov() and dofilefoo() functions into new dofilefoo() functions
that are called by kern_foov() and kern_pfoov(). The non-v functions
now all generate a simple uio on the stack from the passed in arguments
and then call kern_foov(). For example, read() now just builds a uio and
calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev().

PR: kern/80362
Submitted by: Marc Olzheim marcolz at stack dot nl (1)
Approved by: re (scottl)
MFC after: 1 week


147654 29-Jun-2005 jhb

- Change the commented out freebsd32_xxx() example to use kern_xxx() along
with a single copyin() + translate and translate + copyout() rather than
using the stackgap.
- Remove implementation of the stackgap for freebsd32 since it is no longer
used for that compat ABI.

Approved by: re (scottl)


147588 24-Jun-2005 jhb

Correct the amount of data to allocate in these local copies of
exec_copyin_strings() to catch up to rev 1.266 of kern_exec.c. This fixes
panics on amd64 with compat binaries since exec_free_args() was freeing
more memory than these functions were allocating and the mismatch could
cause memory to be freed out from under other concurrent execs.

Approved by: re (scottl)


147302 11-Jun-2005 pjd

Do not allocate memory based on not-checked argument from userland.
It can be used to panic the kernel by giving too big value.
Fix it by moving allocation and size verification into kern_getfsstat().
This even simplifies kern_getfsstat() consumers, but destroys symmetry -
memory is allocated inside kern_getfsstat(), but has to be freed by the
caller.

Found by: FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/
Reported by: Peter Holm <peter@holm.cc>


147178 09-Jun-2005 pjd

Avoid code duplication in several places by introducing universal
kern_getfsstat() function.

Obtained from: jhb


146950 03-Jun-2005 ps

Wrap copyin/copyout for kevent so the 32bit wrapper does not have
to malloc nchanges * sizeof(struct kevent) AND/OR nevents *
sizeof(struct kevent) on every syscall.

Glanced at by: peter, jmg
Obtained from: Yahoo!
MFC after: 2 weeks


146807 30-May-2005 rwatson

Rebuild generated system call definition files following the addition of
the audit event field to the syscalls.master file format.

Submitted by: wsalamon
Obtained from: TrustedBSD Project


146806 30-May-2005 rwatson

Introduce a new field in the syscalls.master file format to hold the
audit event identifier associated with each system call, which will
be stored by makesyscalls.sh in the sy_auevent field of struct sysent.
For now, default the audit identifier on all system calls to AUE_NULL,
but in the near future, other BSM event identifiers will be used. The
mapping of system calls to event identifiers is many:one due to
multiple system calls that map to the same end functionality across
compatibility wrappers, ABI wrappers, etc.

Submitted by: wsalamon
Obtained from: TrustedBSD Project


146583 24-May-2005 ps

Copyout to userland if kern_sigaction succeeds


144450 31-Mar-2005 jhb

- Use a custom version of copyinuio() to implement readv/writev using
kern_readv/writev.
- Use kern_settimeofday() and kern_adjtime() rather than stackgapping it.


142934 01-Mar-2005 ps

Use kern_kevent instead of the stackgap for 32bit syscall wrapping.

Submitted by: jhb
Tested on: amd64


142918 01-Mar-2005 ps

Ooops. I will compile test before committing. The stackgap version
of kevent32 will be going away shortly, so this is temporary until
I commit the non-stackgap version.


142874 01-Mar-2005 ps

Correct the freebsd32_kevent prototype.


142390 24-Feb-2005 jhb

Regen.


142389 24-Feb-2005 jhb

Use msync() to implement msync() for freebsd32 emulation. This isn't quite
right for certain MAP_FIXED mappings on ia64 but it will work fine for all
other mappings and works fine on amd64.

Requested by: ps, Christian Zander
MFC after: 1 week


142059 18-Feb-2005 jhb

- Add a custom version of exec_copyin_args() to deal with the 32-bit
pointers in argv and envv in userland and use that together with
kern_execve() and exec_free_args() to implement freebsd32_execve()
without using the stackgap.
- Fix freebsd32_adjtime() to call adjtime() rather than utimes(). Still
uses stackgap for now.
- Use kern_setitimer(), kern_getitimer(), kern_select(), kern_utimes(),
kern_statfs(), kern_fstatfs(), kern_fhstatfs(), kern_stat(),
kern_fstat(), and kern_lstat().

Tested by: cokane (amd64)
Silence on: amd64, ia64


140482 19-Jan-2005 ps

Add a 32bit syscall wrapper for modstat

Obtained from: Yahoo!


140481 19-Jan-2005 ps

- rename nanosleep1 to kern_nanosleep
- Add a 32bit syscall entry for nanosleep

Reviewed by: peter
Obtained from: Yahoo!


139682 04-Jan-2005 jhb

Regenerate.


139681 04-Jan-2005 jhb

Partial sync up to the master syscalls.master file:
- Mark mount, unmount and nmount MPSAFE.
- Add a stub for _umtx_op().
- Mark open(), link(), unlink(), and freebsd32_sigaction() MPSAFE.

Pointy hats to: several


138129 27-Nov-2004 das

Don't include sys/user.h merely for its side-effect of recursively
including other headers.


137877 18-Nov-2004 marks

Rebuild from compat/freebsd32/syscalls.master:1.43

Reviewed by: imp, phk, njl, peter
Approved by: njl


137876 18-Nov-2004 marks

32-bit FreeBSD ABI compatibility stubs from syscalls.master:1.179

Reviewed by: imp, phk, njl, peter
Approved by: njl


136834 23-Oct-2004 rwatson

Rebuild from FreeBSD32 syscalls.master:1.42.


136833 23-Oct-2004 rwatson

32-bit FreeBSD ABI compatibility stubs from syscalls.master:1.178.


136404 11-Oct-2004 peter

Put on my peril sensitive sunglasses and add a flags field to the internal
sysctl routines and state. Add some code to use it for signalling the need
to downconvert a data structure to 32 bits on a 64 bit OS when requested by
a 32 bit app.

I tried to do this in a generic abi wrapper that intercepted the sysctl
oid's, or looked up the format string etc, but it was a real can of worms
that turned into a fragile mess before I even got it partially working.

With this, we can now run 'sysctl -a' on a 32 bit sysctl binary and have
it not abort. Things like netstat, ps, etc have a long way to go.

This also fixes a bug in the kern.ps_strings and kern.usrstack hacks.
These do matter very much because they are used by libc_r and other things.


136192 06-Oct-2004 mtm

Close a race between a thread exiting and the freeing of its stack.
After some discussion the best option seems to be to signal the thread's
death from within the kernel. This requires that thr_exit() take an
argument.

Discussed with: davidxu, deischen, marcel
MFC after: 3 days


136152 05-Oct-2004 jhb

Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
times it needs rather than calling getrusage() twice with associated
stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
for user, system, and interrupt time as well as a bintime of the total
runtime. A new p_rux field in struct proc replaces the same inline fields
from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux
field in struct proc contains the "raw" child time usage statistics.
ruadd() has been changed to handle adding the associated rusage_ext
structures as well as the values in rusage. Effectively, the values in
rusage_ext replace the ru_utime and ru_stime values in struct rusage. These
two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
calculates appropriate timevals for user and system time as well as updating
the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a
copy of the process' p_rux structure to compute the timevals after updating
the runtime appropriately if any of the threads in that process are
currently executing. It also now only locks sched_lock internally while
doing the rux_runtime fixup. calcru() now only requires the caller to
hold the proc lock and calcru1() only requires the proc lock internally.
calcru() also no longer allows callers to ask for an interrupt timeval
since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
calling calcru1() on p_crux. Note that this means that any code that wants
child times must now call this function rather than reading from p_cru
directly. This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
proctree lock is used to protect the process group rather than the process
group lock. By holding this lock until the end of the function we now
ensure that the process/thread that we pick to dump info about will no
longer vanish while we are trying to output its info to the console.

Submitted by: bde (mostly)
MFC after: 1 month


132127 14-Jul-2004 peter

Regen


132126 14-Jul-2004 peter

Unmapped syscalls should be NOPROTO so that we don't get a duplicate
prototype. (kldunloadf in this case)


132117 13-Jul-2004 phk

Give kldunload a -f(orce) argument.

Add a MOD_QUIESCE event for modules. This should return error (EBUSY)
if the module is in use.

MOD_UNLOAD should now only fail if it is impossible (as opposed to
inconvenient) to unload the module. Valid reasons are memory references
into the module which cannot be tracked down and eliminated.

When kldunloading, we abandon if MOD_UNLOAD fails, and if -force is
not given, MOD_QUIESCE failing will also prevent the unload.

For backwards compatibility, we treat EOPNOTSUPP from MOD_QUIESCE as
success.

Document that modules should return EOPNOTSUPP for unknown events.


132116 13-Jul-2004 phk

Add kldunloadf() system call. Stay tuned for following commit messages.


131431 02-Jul-2004 marcel

Change the thread ID (thr_id_t) used for 1:1 threading from being a
pointer to the corresponding struct thread to the thread ID (lwpid_t)
assigned to that thread. The primary reason for this change is that
libthr now internally uses the same ID as the debugger and the kernel
when referencing to a kernel thread. This allows us to implement the
support for debugging without additional translations and/or mappings.

To preserve the ABI, the 1:1 threading syscalls, including the umtx
locking API have not been changed to work on a lwpid_t. Instead the
1:1 threading syscalls operate on long and the umtx locking API has
not been changed except for the contested bit. Previously this was
the least significant bit. Now it's the most significant bit. Since
the contested bit should not be tested by userland, this change is
not expected to be visible. Just to be sure, UMTX_CONTESTED has been
removed from <sys/umtx.h>.

Reviewed by: mtm@
ABI preservation tested on: i386, ia64


131430 02-Jul-2004 marcel

Regen.


130640 17-Jun-2004 phk

Second half of the dev_t cleanup.

The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.


128597 24-Apr-2004 marcel

Fix build for non-COMPAT_FREEBSD4 configurations. Make the FreeBSD 4
statfs functions conditional upon the option.


128261 14-Apr-2004 peter

Regen


128260 14-Apr-2004 peter

Catch up to the not-so-recent statfs(2) changes.


127484 27-Mar-2004 mtm

Regen for libthr thread synchronization syscalls.


127482 27-Mar-2004 mtm

Separate thread synchronization from signals in libthr. Instead
use msleep() and wakeup_one().

Discussed with: jhb, peter, tjr


127140 17-Mar-2004 jhb

- Replace wait1() with a kern_wait() function that accepts the pid,
options, status pointer and rusage pointer as arguments. It is up to
the caller to copyout the status and rusage to userland if needed. This
lets us axe the 'compat' argument and hide all that functionality in
owait(), by the way. This also cleans up some locking in kern_wait()
since it no longer has to drop locks around copyout() since all the
copyout()'s are deferred.
- Convert owait(), wait4(), and the various ABI compat wait() syscalls to
use kern_wait() rather than wait1() or wait4(). This removes a bit
more stackgap usage.

Tested on: i386
Compiled on: i386, alpha, amd64


126093 21-Feb-2004 peter

Regen (FWIW)


126092 21-Feb-2004 peter

Try and make the compat sigreturn prototypes closer to reality.


125371 03-Feb-2004 deischen

Regen.


125370 03-Feb-2004 deischen

Sync with kern/syscalls.master.


125171 28-Jan-2004 peter

Regen


125170 28-Jan-2004 peter

Add getitimer swab stub


123790 24-Dec-2003 peter

GC unused 'syshide' override to /dev/null. This was here to disable
the output of the namespc column. Its functionality was removed some time
ago, but the overrides and the namespc column remained.


123756 23-Dec-2003 peter

Regen (should be a NOP except for rcsid)


123755 23-Dec-2003 peter

GC unused namespc column.


123748 23-Dec-2003 peter

Regen


123747 23-Dec-2003 peter

freebsd32_fstat(2) is now MPSAFE


123746 23-Dec-2003 peter

Rather than screw around with the (unsafe) stackgap, call vn_stat/fo_stat
directly for stat/fstat/lstat syscall emulation. It turns out not only
safer, but the code is smaller this way too.


123745 23-Dec-2003 peter

Regen


123744 23-Dec-2003 peter

Eliminate stackgap usage for the (woefully incomplete) path translations
since it isn't needed here anymore.
Use standard open(2)/access(2) and chflags(2) syscalls now.


123427 11-Dec-2003 peter

regen


123426 11-Dec-2003 peter

Mark freebsd32_gettimeofday() as mpsafe


123425 11-Dec-2003 peter

Just implementing a 32 bit version of gettimeofday() was smaller than
the wrapper code. And it doesn't use the stackgap as a bonus.


123417 10-Dec-2003 peter

Regen


123416 10-Dec-2003 peter

Add missing extattr_list_fd(), extattr_list_file(), extattr_list_link()
and kse_switchin() syscall slots.


123415 10-Dec-2003 peter

The osigpending, oaccept, orecvfrom and ogetdirentries entries were
accidentally being compiled in as standard. These are part of the
unimplemented COMPAT_43 syscall set.


122302 08-Nov-2003 peter

Regen


122301 08-Nov-2003 peter

"implement" vfork(). Add comments next to the other syscalls that need
to be implemented. This is enough to run i386 /bin/tcsh. /bin/sh is still
not happy because of some strange job control problem.


122253 07-Nov-2003 peter

Don't write to the stackgap directly in execve().


122245 07-Nov-2003 jhb

Regen.


122244 07-Nov-2003 jhb

Sync with global syscalls.master by marking ptrace(), dup(), pipe(),
ktrace(), freebsd32_sigaltstack(), sysarch(), issetugid(), utrace(), and
freebsd32_sigaction() as MP safe.


121719 30-Oct-2003 peter

Add CTASSERT()'s to check that the sizes of our replicas of the 32 bit
structures come out the right size.

Fix the ones that broke. stat32 had some missing fields from the end
and statfs32 was broken due to the strange definition of MNAMELEN
(which is dependent on sizeof(long))

I'm not sure if this fixes any actual problems or not.


119336 23-Aug-2003 peter

Switch to using the emulator in the common compat area.
Still work-in-progress.


119333 22-Aug-2003 peter

Initial sweep to de-i386-ify this


119332 22-Aug-2003 peter

Regen


119331 22-Aug-2003 peter

Begin attempting to consolidate the two different i386 emulations
on ia64 and amd64. I'm attempting to keep the generic 32bit-on-64bit
binary support separate from the i386 support and the MD backend support.


119194 21-Aug-2003 peter

Regen


119193 21-Aug-2003 peter

This is too funny for words. Swap syscalls 416 and 417 around. It works
better that way when sigaction() and sigreturn() do the right thing.


118031 25-Jul-2003 obrien

Use __FBSDID().

Brought to you by: a boring talk at Ottawa Linux Symposium


115430 31-May-2003 peter

Regenerate.


115429 31-May-2003 peter

Make this compile with WITNESS enabled. It wants the syscall names.


115252 23-May-2003 peter

Deal with the user VM space expanding. 32 bit applications do not like
having their stack at the 512GB mark. Give 4GB of user VM space for 32
bit apps. Note that this is significantly more than on i386 which gives
only about 2.9GB of user VM to a process (1GB for kernel, plus page
table pages which eat user VM space).

Approved by: re (blanket)


114988 14-May-2003 peter

Regen

Approved by: re (amd64 blanket)


114987 14-May-2003 peter

Add BASIC i386 binary support for the amd64 kernel. This is largely
stolen from the ia64/ia32 code (indeed there was a repocopy), but I've
redone the MD parts and added and fixed a few essential syscalls. It
is sufficient to run i386 binaries like /bin/ls, /usr/bin/id (dynamic)
and p4. The ia64 code has not implemented signal delivery, so I had
to do that.

Before you say it, yes, this does need to go in a common place. But
we're in a freeze at the moment and I didn't want to risk breaking ia64.
I will sort this out after the freeze so that the common code is in a
common place.

On the AMD64 side, this required adding segment selector context switch
support and some other support infrastructure. The %fs/%gs etc code
is hairy because loading %gs will clobber the kernel's current MSR_GSBASE
setting. The segment selectors are not used by the kernel, so they're only
changed at context switch time or when changing modes. This still needs
to be optimized.

Approved by: re (amd64/* blanket)


114017 25-Apr-2003 jhb

Regen.


114016 25-Apr-2003 jhb

Oops, the thr_* and jail_attach() syscall entries should be NOPROTO rather
than STD.


113989 24-Apr-2003 jhb

Regen.


113987 24-Apr-2003 jhb

Fix the thr_create() entry by adding a trailing \. Also, sync up the
MP safe flag for thr_* with the main table.


113859 22-Apr-2003 jhb

- Replace inline implementations of sigprocmask() with calls to
kern_sigprocmask() in the various binary compatibility emulators.
- Replace calls to sigsuspend(), sigaltstack(), sigaction(), and
sigprocmask() that used the stackgap with calls to the corresponding
kern_sig*() functions instead without using the stackgap.


113275 09-Apr-2003 mike

o In struct prison, add an allprison linked list of prisons (protected
by allprison_mtx), a unique prison/jail identifier field, two path
fields (pr_path for reporting and pr_root vnode instance) to store
the chroot() point of each jail.
o Add jail_attach(2) to allow a process to bind to an existing jail.
o Add change_root() to perform the chroot operation on a specified
vnode.
o Generalize change_dir() to accept a vnode, and move namei() calls
to callers of change_dir().
o Add a new sysctl (security.jail.list) which is a group of
struct xprison instances that represent a snapshot of active jails.

Reviewed by: rwatson, tjr


112908 01-Apr-2003 jeff

- Add thr and umtx system calls.


112896 31-Mar-2003 jeff

- Add a placeholder for sigwait


111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


111002 16-Feb-2003 phk

Remove #include <sys/dkstat.h>


109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


108409 29-Dec-2002 rwatson

Synchronize to kern/syscalls.master:1.139.

Obtained from: TrustedBSD Project


107923 16-Dec-2002 marcel

Regen: swapoff


107922 16-Dec-2002 marcel

Change swapoff from MNOPROTO to UNIMPL. The former doesn't work.


107913 15-Dec-2002 dillon

This is David Schultz's swapoff code which I am finally able to commit.
This should be considered highly experimental for the moment.

Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU>
MFC after: 3 weeks


107849 14-Dec-2002 alfred

SCARGS removal take II.


107839 13-Dec-2002 alfred

Back out removal of SCARGS, the code freeze is only "selectively" over.


107838 13-Dec-2002 alfred

Remove SCARGS.

Reviewed by: md5


106993 16-Nov-2002 deischen

Regenerate after adding syscalls.


106989 16-Nov-2002 deischen

Add *context() syscalls to ia64 32-bit compatibility table as requested
in kern/syscalls.master.


106364 02-Nov-2002 rwatson

Sync to src/sys/kern/syscalls.master


105490 19-Oct-2002 peter

Stake a claim on 418 (__xstat), 419 (__xfstat), 420 (__xlstat)


105486 19-Oct-2002 peter

Grab 416/417 real estate before I get burned while testing again.
This is for the not-quite-ready signal/fpu abi stuff. It may not see
the light of day, but I'm certainly not going to be able to validate it
when getting shot in the foot due to syscall number conflicts.


105476 19-Oct-2002 rwatson

Add a placeholder for the execve_mac() system call, similar to SELinux's
execve_secure() system call, which permits a process to pass in a label
for a label change during exec. This permits SELinux to change the
label for the resulting exec without a race following a manual label
change on the process. Because this interface uses our general purpose
MAC label abstraction, we call it execve_mac(), and wrap our port of
SELinux's execve_secure() around it with appropriate sid mappings.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


104741 09-Oct-2002 peter

re-regen. Sigh.


104740 09-Oct-2002 peter

Sigh. Fix fat-fingering of diff. I knew this was going to happen.


104739 09-Oct-2002 peter

regenerate. sendfile stuff and other recently picked up stubs.


104738 09-Oct-2002 peter

Try and deal with the #ifdef COMPAT_FREEBSD4 sendfile stuff. This would
have been a lot easier if do_sendfile() was usable externally.


104736 09-Oct-2002 peter

Try and patch up some tab-to-space spammage.


104735 09-Oct-2002 peter

Add placeholder stubs for nsendfile, mac_syscall, ksem_close, ksem_post,
ksem_wait, ksem_trywait, ksem_init, ksem_open, ksem_unlink, ksem_getvalue,
ksem_destroy, __mac_get_pid, __mac_get_link, __mac_set_link,
extattr_set_link, extattr_get_link, extattr_delete_link.


104379 02-Oct-2002 archie

Let kse_wakeup() take a KSE mailbox pointer argument.

Reviewed by: julian


103972 25-Sep-2002 archie

Make the following name changes to KSE related functions, etc., to better
represent their purpose and minimize namespace conflicts:

kse_fn_t -> kse_func_t
struct thread_mailbox -> struct kse_thr_mailbox
thread_interrupt() -> kse_thr_interrupt()
kse_yield() -> kse_release()
kse_new() -> kse_create()

Add missing declaration of kse_thr_interrupt() to <sys/kse.h>.
Regenerate the various generated syscall files. Minor style fixes.

Reviewed by: julian


100385 20-Jul-2002 peter

Regenerate


100384 20-Jul-2002 peter

Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable
handler in the kernel at the same time. Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment. This is a big help for execing i386 binaries
on ia64. The ELF exec code grows the ability to map partial pages when
there is a page size difference, e.g. emulating 4K pages on 8K or 16K
hardware pages.
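
To illustrate the page size mismatch described above, here is a minimal
sketch in C. It is not the kernel's ELF code: the names split_sketch and
split_range are hypothetical, and the split into head, body, and tail is
only an assumption about how one might reason about the problem. A range
aligned to the smaller emulated page size is divided into the part that
covers whole host pages and the partial pieces at either end, which cannot
be handled by mapping whole host pages alone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: three half-open [start, end) sub-ranges. */
struct split_sketch {
	uint64_t head_start, head_end;	/* partial host page at the front */
	uint64_t body_start, body_end;	/* whole host pages */
	uint64_t tail_start, tail_end;	/* partial host page at the back */
};

static uint64_t
round_down(uint64_t x, uint64_t pagesz)
{
	return (x & ~(pagesz - 1));
}

static uint64_t
round_up(uint64_t x, uint64_t pagesz)
{
	return ((x + pagesz - 1) & ~(pagesz - 1));
}

/*
 * Split [start, end) relative to the host page size. host_pagesz must be
 * a power of two. Empty sub-ranges have start == end.
 */
static struct split_sketch
split_range(uint64_t start, uint64_t end, uint64_t host_pagesz)
{
	struct split_sketch s;
	uint64_t first_whole;

	assert(host_pagesz != 0 && (host_pagesz & (host_pagesz - 1)) == 0);
	assert(start <= end);

	first_whole = round_up(start, host_pagesz);
	if (first_whole >= end) {
		/* The range never reaches a host page boundary from below. */
		s.head_start = start;
		s.head_end = end;
		s.body_start = s.body_end = end;
		s.tail_start = s.tail_end = end;
		return (s);
	}
	s.head_start = start;
	s.head_end = first_whole;
	s.body_start = first_whole;
	s.body_end = round_down(end, host_pagesz);
	s.tail_start = s.body_end;
	s.tail_end = end;
	return (s);
}
```

For example, with 8K host pages, the 4K-aligned range 0x1000..0x7000 has
a partial head 0x1000..0x2000, whole host pages 0x2000..0x6000, and a
partial tail 0x6000..0x7000.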

Flesh out the i386 emulation support for ia64. At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.

Obtained from: dfr (mostly, many tweaks from me).


94380 10-Apr-2002 dfr

Initial support for executing IA-32 binaries. This will not compile
without a few patches for the rest of the kernel to allow the image
activator to override exec_copyout_strings and setregs.

None of the syscall argument translation has been done. Possibly, this
translation layer can be shared with any platform that wants to support
running ILP32 binaries on an LP64 host (e.g. sparc32 binaries?)
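
As a rough illustration of what syscall argument translation between an
ILP32 binary and an LP64 host involves, here is a minimal sketch in C. It
is not taken from this commit, which states that the translation had not
been written yet. The struct and function names (timeval32_sketch,
timeval64_sketch, tv32_to_native, tv_native_to_32, ptr32_to_native) are
hypothetical, and a time value with two integer fields is assumed purely
as an example of a structure whose layout differs between the two ABIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout as an ILP32 binary would pass it: 32-bit fields. */
struct timeval32_sketch {
	int32_t tv_sec;
	int32_t tv_usec;
};

/* Hypothetical layout on the LP64 host: 64-bit fields. */
struct timeval64_sketch {
	int64_t tv_sec;
	int64_t tv_usec;
};

/* Inbound direction: widen each field; signed values are sign-extended. */
static void
tv32_to_native(const struct timeval32_sketch *in,
    struct timeval64_sketch *out)
{
	assert(in != NULL && out != NULL);
	out->tv_sec = in->tv_sec;
	out->tv_usec = in->tv_usec;
}

/*
 * Outbound direction: narrow each field. Returns 0 on success, or -1 if
 * a value does not fit in 32 bits, leaving *out untouched.
 */
static int
tv_native_to_32(const struct timeval64_sketch *in,
    struct timeval32_sketch *out)
{
	assert(in != NULL && out != NULL);
	if (in->tv_sec < INT32_MIN || in->tv_sec > INT32_MAX ||
	    in->tv_usec < INT32_MIN || in->tv_usec > INT32_MAX)
		return (-1);
	out->tv_sec = (int32_t)in->tv_sec;
	out->tv_usec = (int32_t)in->tv_usec;
	return (0);
}

/* A 32-bit user pointer is zero-extended to the host pointer width. */
static void *
ptr32_to_native(uint32_t p)
{
	return ((void *)(uintptr_t)p);
}
```

The same pattern of widening on the way in and checked narrowing on the
way out would apply to any argument structure that contains longs or
pointers.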