History log of /freebsd-11-stable/sys/kern/sysv_shm.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 367601 11-Nov-2020 brooks

MFC r367302:

sysvshm: pass relevant uap members as arguments

Alter shmget_allocate_segment and shmget_existing to take the values
they want from struct shmget_args rather than passing the struct
around. In general, uap structures should only be the interface to
sys_<foo> functions.

This makes one small functional change and records the allocated space
rather than the requested space. If this turns out to be a problem (e.g.
if software tries to find undersized segments by exact size rather than
using keys), we can correct that easily.

Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27077


# 360451 28-Apr-2020 brooks

MFC r359937:

Centralize compatability translation macros.

Copy the CP, PTRIN, etc macros from freebsd32.h into a sys/abi_compat.h
and replace existing definitation with includes where required. This
eliminates duplicate code and allows Linux and FreeBSD compatability
headers to be included in the same files.

Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D24275


# 343426 25-Jan-2019 kib

MFC r343082:
Implement shmat(2) flag SHM_REMAP.


# 331922 03-Apr-2018 kib

MFC r331640:
Fix several leaks of kernel stack data through paddings.


# 329739 21-Feb-2018 brooks

MFC r329525:

Correct/improve the descriptions if kern.ipc.(shmsegs,sema,msqids).

The description of kern.ipc.shmsegs was wrong since 2005. I updated the
others (which were more correct) to match.

PR: 225933
Reviewed by: cem
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14391


# 329177 12-Feb-2018 brooks

MFC r328799:

Add kern.ipc.{msqids,semsegs,sema} sysctls for FreeBSD32.

Stop leaking kernel pointers though theses sysctls and make sure that the
padding in the structures is zeroed on allocation to avoid other leaks.

Reviewed by: gordon, kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D13459


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 298661 26-Apr-2016 cem

osd(9): Change array pointer to array pointer type from void*

This is a minor follow-up to r297422, prompted by a Coverity warning. (It's
not a real defect, just a code smell.) OSD slot array reservations are an
array of pointers (void **) but were cast to void* and back unnecessarily.
Keep the correct type from reservation to use.

osd.9 is updated to match, along with a few trivial igor fixes.

Reported by: Coverity
CID: 1353811
Sponsored by: EMC / Isilon Storage Division


# 298656 26-Apr-2016 jamie

Redo the changes to the SYSV IPC sysctl functions from r298585, so they
don't (mis)use sbufs.

PR: 48471


# 298597 25-Apr-2016 jamie

Fix the logic in r298585: shm_prison_cansee returns an errno, so is
the opposite of a boolean.

PR: 48471


# 298585 25-Apr-2016 jamie

Encapsulate SYSV IPC objects in jails. Define per-module parameters
sysvmsg, sysvsem, and sysvshm, with the following bahavior:

inherit: allow full access to the IPC primitives. This is the same as
the current setup with allow.sysvipc is on. Jails and the base system
can see (and moduly) each other's objects, which is generally considered
a bad thing (though may be useful in some circumstances).

disable: all no access, same as the current setup with allow.sysvipc off.

new: A jail may see use the IPC objects that it has created. It also
gets its own IPC key namespace, so different jails may have their own
objects using the same key value. The parent jail (or base system) can
see the jail's IPC objects, but not its keys.

PR: 48471
Submitted by: based on work by kikuchan98@gmail.com
MFC after: 5 days


# 298433 21-Apr-2016 pfg

sys: use our roundup2/rounddown2() macros when param.h is available.

rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.

This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.


# 289112 10-Oct-2015 trasz

Change the default setting of kern.ipc.shm_allow_removed from 0 to 1.

This removes the need for manually changing this flag for Google Chrome
users. It also improves compatibility with Linux applications running under
Linuxulator compatibility layer, and possibly also helps in porting software
from Linux.

Generally speaking, the flag allows applications to create the shared memory
segment, attach it, remove it, and then continue to use it and to reattach it
later. This means that the kernel will automatically "clean up" after the
application exits.

It could be argued that it's against POSIX. However, SUSv3 says this
about IPC_RMID: "Remove the shared memory identifier specified by shmid from
the system and destroy the shared memory segment and shmid_ds data structure
associated with it." From my reading, we break it in any case by deferring
removal of the segment until it's detached; we won't break it any more
by also deferring removal of the identifier.

This is the behaviour exhibited by Linux since... probably always, and
also by OpenBSD since the following commit:

revision 1.54
date: 2011/10/27 07:56:28; author: robert; state: Exp; lines: +3 -8;
Allow segments to be used even after they were marked for deletion with
the IPC_RMID flag.
This is permitted as an extension beyond the standards and this is similar
to what other operating systems like linux do.

MFC after: 1 month
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3603


# 285057 02-Jul-2015 mjg

sysvshm: fix up some whitespace issues and spurious initialisation


# 285056 02-Jul-2015 mjg

sysvshm: don't lock proc when calculating attach_va

vm_daddr is constant and RLIMIT_DATA can be obtained from thread's copy of
rlimits.


# 285055 02-Jul-2015 mjg

sysvshm: fix shmrealloc

The code was supposed to initialize new segs in newsegs array, but used the old
pointer.


# 284215 10-Jun-2015 mjg

Implement lockless resource limits.

Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.


# 282213 29-Apr-2015 trasz

Add kern.racct.enable tunable and RACCT_DISABLED config option.
The point of this is to be able to add RACCT (with RACCT_DISABLED)
to GENERIC, to avoid having to rebuild the kernel to use rctl(8).

Differential Revision: https://reviews.freebsd.org/D2369
Reviewed by: kib@
MFC after: 1 month
Relnotes: yes
Sponsored by: The FreeBSD Foundation


# 282084 27-Apr-2015 kib

Fix locking for oshmctl() and shmsys().

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 281094 04-Apr-2015 kib

Restore proper error from oshmctl(2), used by COMPAT_43, when the
segment cannot be found. Broken by r280323.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 281071 04-Apr-2015 kib

Remove useless initialization.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 280325 21-Mar-2015 cognet

error is only used if MAC is defined, so make its declaration conditional
as well.


# 280323 21-Mar-2015 kib

Somewhat modernize the SysV shm code:
- Use real locking, replace Giant with global sx protecting the
subsystem. Since the subsystem' lock is no longer dropped during
the sleepsk, remove not needed SHMSEG_WANTED segment flag, and
revert r278963.
- To do proper code simplification possible after the change of the
lock, restructure several functions into _locked body and
originally-named wrapper which calls into _locked variant. This
allows to eliminate the 'goto done2' spread over the code.
- Merge shm_find_segment_by_shmid() and shm_find_segment_by_shmidx().
- Consistently change all function prototypes to ANSI C.

Reviewed by: mjg (who has earlier version of the similar patch to
introduce real locking)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 278963 18-Feb-2015 kib

If malloc() sleeps, Giant is dropped. Recheck for another thread
doing our work.

Remove unneeded check for failed M_WAITOK allocation.

Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 278697 13-Feb-2015 alc

Preset the object's color, or alignment, to maximize superpage usage.

MFC after: 5 days


# 273707 26-Oct-2014 mjg

Avoid dynamic syscall overhead for statically compiled modules.

The kernel tracks syscall users so that modules can safely unregister them.

But if the module is not unloadable or was compiled into the kernel, there is
no need to do this.

Achieve this by adding SY_THR_STATIC_KLD macro which expands to SY_THR_STATIC
during kernel build and 0 otherwise.

Reviewed by: kib (previous version)
MFC after: 2 weeks


# 270883 31-Aug-2014 alc

Automatically prefault a limited number of mappings to resident pages in
shmat(2), just like mmap(2).

MFC after: 5 days
Sponsored by: EMC / Isilon Storage Division


# 267992 28-Jun-2014 hselasky

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 267985 27-Jun-2014 gjb

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 267961 27-Jun-2014 hselasky

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# 258056 12-Nov-2013 alc

Eliminate the gratuitous use of mmap(2) flags from the implementation
of kern_shmat(). Use a simpler approach to determine whether to pass
VMFS_NO_SPACE or VMFS_OPTIMAL_SPACE to vm_map_find().


# 255426 09-Sep-2013 jhb

Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use
an address in the first 2GB of the process's address space. This flag should
have the same semantics as the same flag on Linux.

To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address. While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.

Reviewed by: alc
Approved by: re (kib)


# 253471 19-Jul-2013 jhb

Be more aggressive in using superpages in all mappings of objects:
- Add a new address space allocation method (VMFS_OPTIMAL_SPACE) for
vm_map_find() that will try to alter the alignment of a mapping to match
any existing superpage mappings of the object being mapped. If no
suitable address range is found with the necessary alignment,
vm_map_find() will fall back to using the simple first-fit strategy
(VMFS_ANY_SPACE).
- Change mmap() without MAP_FIXED, shmat(), and the GEM mapping ioctl to
use VMFS_OPTIMAL_SPACE instead of VMFS_ANY_SPACE.

Reviewed by: alc (earlier version)
MFC after: 2 weeks


# 248084 09-Mar-2013 attilio

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


# 231904 18-Feb-2012 alc

Close a race due to dropping of the map lock between creating a map entry
for a shared mapping and marking the entry for inheritance.

Reviewed by: kib
X-MFC after: r231526


# 231195 08-Feb-2012 pjd

Allow to set kern.ipc.shmmax from /boot/loader.conf.

MFC after: 1 week


# 225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# 223825 06-Jul-2011 trasz

All the racct_*() calls need to happen with the proc locked. Fixing this
won't happen before 9.0. This commit adds "#ifdef RACCT" around all the
"PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order
to avoid useless locking/unlocking in kernels built without "options RACCT".


# 220399 06-Apr-2011 trasz

Style fix.

Submitted by: jhb@


# 220398 06-Apr-2011 trasz

Add accounting for SysV-related resources.

Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)


# 220388 06-Apr-2011 trasz

Add ucred pointer to the SysV-related memory structures. This is required
for racct.

Note that after this commit, ipcs(1) needs to be rebuilt. Otherwise, it will
fail with "ipcs: sysctlbyname: kern.ipc.msqids: Cannot allocate memory".

Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)


# 219028 25-Feb-2011 netchild

Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/
PMC/SYSV/...).

No FreeBSD version bump, the userland application to query the features will
be committed last and can serve as an indication of the availablility if
needed.

Sponsored by: Google Summer of Code 2010
Submitted by: kibab
Reviewed by: arch@ (parts by rwatson, trasz, jhb)
X-MFC after: to be determined in last commit with code from this project


# 217555 18-Jan-2011 mdf

Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need
to rely on the format string.


# 216104 01-Dec-2010 trasz

Remove useless NULL checks for M_WAITOK mallocs.


# 209582 28-Jun-2010 dougb

If i is going to be used in the loop unconditionally the declaration
has to be unconditional as well.

Conical head covering to: kib


# 209580 28-Jun-2010 kib

Despite system call deregistration drains the threads executing System V
shm syscalls, and initial check for the number of allocated segments
in the module deinitialization code, the following might happen:
after the check for active segment, while waiting for threads to
leave some other syscall, shmget(2) is called. Then, we can end
up with the shared segment that cannot be detached since sysvshm
module is unloaded.

Prevent the leak by rechecking and disclaiming a reference to the vm
object owned by sysvshm module, that might have grown during the drain.

Tested by: pho
Reviewed by: jhb
MFC after: 1 month


# 209037 11-Jun-2010 ivoras

In another move to join with the age of the Fruitbat, increase SYSV
shared resources defaults beyond absolute minimums.

The new values are chosen mostly by magic. They are still fairly
small and will need increasing for large installations (especially
SHMMAX). However, they are now enough to e.g. start PostgreSQL
installations with ~~300 users and nearly 512 MB of shared buffers.

Reviewed by: A short discussion on hackers@


# 205323 19-Mar-2010 kib

Move SysV IPC freebsd32 compat shims from freebsd32_misc.c to corresponding
sysv_{msg,sem,shm}.c files.

Mark SysV IPC freebsd32 syscalls as NOSTD and add required
SYSCALL_INIT_HELPER/SYSCALL32_INIT_HELPERs to provide auto
register/unregister on module load.

This makes COMPAT_FREEBSD32 functional with SysV IPC compiled and loaded
as modules.

Reviewed by: jhb
MFC after: 2 weeks


# 198449 24-Oct-2009 ru

- Rename tunable kern.ipc.shmmaxpgs to kern.ipc.shmall.
- Explain the fuss when initializing shmmax.

PR: 75542 (mistakenly closed instead of PR 75541)


# 194976 25-Jun-2009 jhb

Use the correct cast for the arguments passed to freebsd_shmctl() in
oshmctl().

Submitted by: kib


# 194959 25-Jun-2009 jhb

Tweak the oshmctl() compile fix: convert the K&R definition to ANSI.


# 194941 25-Jun-2009 rwatson

oshmctl() now requires a sysv_shm.c-local function prototype.


# 194910 24-Jun-2009 jhb

Change the ABI of some of the structures used by the SYSV IPC API:
- The uid/cuid members of struct ipc_perm are now uid_t instead of unsigned
short.
- The gid/cgid members of struct ipc_perm are now gid_t instead of unsigned
short.
- The mode member of struct ipc_perm is now mode_t instead of unsigned short
(this is merely a style bug).
- The rather dubious padding fields for ABI compat with SV/I386 have been
removed from struct msqid_ds and struct semid_ds.
- The shm_segsz member of struct shmid_ds is now a size_t instead of an
int. This removes the need for the shm_bsegsz member in struct
shmid_kernel and should allow for complete support of SYSV SHM regions
>= 2GB.
- The shm_nattch member of struct shmid_ds is now an int instead of a
short.
- The shm_internal member of struct shmid_ds is now gone. The internal
VM object pointer for SHM regions has been moved into struct
shmid_kernel.
- The existing __semctl(), msgctl(), and shmctl() system call entries are
now marked COMPAT7 and new versions of those system calls which support
the new ABI are now present.
- The new system calls are assigned to the FBSD-1.1 version in libc. The
FBSD-1.0 symbols in libc now refer to the old COMPAT7 system calls.
- A simplistic framework for tagging system calls with compatibility
symbol versions has been added to libc. Version tags are added to
system calls by adding an appropriate __sym_compat() entry to
src/lib/libc/incldue/compat.h. [1]

PR: kern/16195 kern/113218 bin/129855
Reviewed by: arch@, rwatson
Discussed with: kan, kib [1]


# 194894 24-Jun-2009 jhb

Deprecate the msgsys(), semsys(), and shmsys() system calls by moving
them under COMPAT_FREEBSD[4567]. Starting with FreeBSD 5.0 the SYSV IPC
API was implemented via direct system calls (e.g. msgctl(), msgget(), etc.)
rather than indirecting through the var-args *sys() system calls. The
shmsys() system call was already effectively deprecated for all but
COMPAT_FREEBSD4 already as its implementation for the !COMPAT_FREEBSD4 case
was to simply invoke nosys().


# 194832 24-Jun-2009 jhb

- Move syscall function argument structure types to be just above the
relevenat system call function.
- Whitespace fixes.


# 194766 23-Jun-2009 kib

Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)


# 193845 09-Jun-2009 alc

Eliminate an instance of VM_PROT_READ_IS_EXEC that I overlooked in r190705.


# 193511 05-Jun-2009 rwatson

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


# 192895 27-May-2009 jamie

Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails. Child jails may be restricted more than their parents,
but never less. Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system. Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings. The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by: bz (mentor)


# 189398 05-Mar-2009 kib

Systematically use vm_size_t to specify the size of the segment for VM KPI.
Do not overload the local variable size in kern_shmat() due to vm_size_t
change.
Fix style bug by adding explicit comparision with 0.

Discussed with: bde
MFC after: 1 week


# 189283 02-Mar-2009 kib

Correct types of variables used to track amount of allocated SysV shared
memory from int to size_t. Implement a workaround for current ABI not
allowing to properly save size for and report more then 2Gb sized segment
of shared memory.

This makes it possible to use > 2 Gb shared memory segments on 64bit
architectures. Please note the new BUGS section in shmctl(2) and
UPDATING note for limitations of this temporal solution.

Reviewed by: csjp
Tested by: Nikolay Dzham <i levsha org ua>
MFC after: 2 weeks


# 176221 12-Feb-2008 csjp

Make sure we restrict Linux only IPC calls from being executed
through the FreeBSD ABI. IPC_INFO, SHM_INFO, SHM_STAT were added
specifically for Linux binary support. They are not documented
as being a part of the FreeBSD ABI, also, the structures necessary
for them have been hidden away from the users for a long time.

Also, the Linux ABI layer uses it's own structures to populate the
responses back to the user to ensure that the ABI is consistent.

I think there is a bit more separation work that needs to happen.

Reviewed by: jhb
Discussed with: jhb
Discussed on: freebsd-arch@ (very briefly)
MFC after: 1 month


# 172930 24-Oct-2007 rwatson

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


# 167232 05-Mar-2007 rwatson

Further system call comment cleanup:

- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
"syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.


# 167211 04-Mar-2007 rwatson

Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.


# 166835 19-Feb-2007 rwatson

Remove call to ipcperm() in shmget_existing(). The flags argument is
ignored on other systems I investigated when accessing an existing
memory segment rather than creating a new one. This call to ipcperm()
is the only one to pass in a complete mode flag to the permission
checks rather than a simple access request mask, and caused problems
for the revised ipcperm() based on the priv(9) interface, which can
now be restored.

PR: 106078


# 163606 22-Oct-2006 rwatson

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


# 162468 20-Sep-2006 rwatson

Remove MAC_DEBUG + MPRINTF debugging from System V IPC. This no longer
appears to be serving a useful purpose, as it was used during initial
development of MAC support for System V IPC.

MFC after: 1 month
Obtained from: TrustedBSD Project
Suggested by: Christopher dot Vance at SPARTA dot com


# 159481 10-Jun-2006 rwatson

Move some functions and definitions from uipc_socket2.c to uipc_socket.c:

- Move sonewconn(), which creates new sockets for incoming connections on
listen sockets, so that all socket allocate code is together in
uipc_socket.c.

- Move 'maxsockets' and associated sysctls to uipc_socket.c with the
socket allocation code.

- Move kern.ipc sysctl node to uipc_socket.c, add a SYSCTL_DECL() for it
to sysctl.h and remove lots of scattered implementations in various
IPC modules.

- Sort sodealloc() after soalloc() in uipc_socket.c for dependency order
reasons. Statisticize soalloc() and sodealloc() as they are now
required only in uipc_socket.c, and are internal to the socket
implementation.

After this change, socket allocation and deallocation is entirely
centralized in one file, and uipc_socket2.c consists entirely of socket
buffer manipulation and default protocol switch functions.

MFC after: 1 month


# 157285 30-Mar-2006 ps

Properly support for FreeBSD 4 32bit System V shared memory.

Submitted by: peter
Obtained from: Yahoo!
MFC after: 3 weeks


# 150937 04-Oct-2005 rwatson

Re-order MAC and DAC checks in shmget() in order to give precedence to
the MAC result, as well as avoid losing the DAC check result when MAC
is enabled.

MFC after: 3 days
Reported by: Patrick LeBlanc <Patrick dot LeBlanc at sparta dot com>


# 148782 06-Aug-2005 csjp

Change the data type of the upper shared memory limits from a signed
integer to an unsigned long. This lifts variables like the maximum
number of pages available for shared memory from 2^31 to 2^32 on 32
bit architectures, and from 2^31 to 2^64 on 64 bit architectures.

It should be noted that this changes breaks ABI on 64 bit architectures
because the size of the shmmax, shmmin, shmmni, shmseg and shmall members
of the shminfo structure has changed.

Silence on: current@


# 146164 12-May-2005 jhb

Actually use the iterating variable in the for loop when trying to avoid
overflow.

Reported by: Vladislav Shabanov vs at rambler-co dot ru
MFC after: 1 week
Glanced at: alfred


# 141710 11-Feb-2005 csjp

Add much needed descriptions for a number of the IPC related sysctl OIDs.
This information will be very useful for people who are tuning applications
which have a dependence on IPC mechanisms.

The following OIDs were documented:

Message queues:
kern.ipc.msgmax
kern.ipc.msgmni
kern.ipc.msgmnb
kern.ipc.msgtlq
kern.ipc.msgssz
kern.ipc.msgseg

Semaphores:
kern.ipc.semmap
kern.ipc.semmni
kern.ipc.semmns
kern.ipc.semmnu
kern.ipc.semmsl
kern.ipc.semopm
kern.ipc.semume
kern.ipc.semusz
kern.ipc.semvmx
kern.ipc.semaem

Shared memory:
kern.ipc.shmmax
kern.ipc.shmmin
kern.ipc.shmmni
kern.ipc.shmseg
kern.ipc.shmall
kern.ipc.shm_use_phys
kern.ipc.shm_allow_removed
kern.ipc.shmsegs

These new descriptions can be viewed using sysctl -d

PR: kern/65219
Submitted by: Dan Nelson <dnelson at allantgroup dot com> (modified)
No objections: developers@
Descriptions reviewed by: gnn
MFC after: 1 week


# 140617 22-Jan-2005 rwatson

Invoke label initialization, creation, cleanup, and tear-down MAC
Framework entry points for System V IPC shared memory.

Submitted by: Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Obtained from: TrustedBSD Project
Sponsored by: DARPA, SPAWAR, McAfee Research


# 139804 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


# 137613 12-Nov-2004 rwatson

Second of several commits to allow kernel System V IPC data structures
to be modified and extended without breaking the user space ABI:

Use _kernel variants on _ds structures for System V sempahores, message
queues, and shared memory. When interfacing with userspace, export
only the _ds subsets of the _kernel data structures. A lot of search
and replace.

Define the message structure in the _KERNEL portion of msg.h so that it
can be used by other kernel consumers, but not exposed to user space.

Submitted by: Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Obtained from: TrustedBSD Project
Sponsored by: DARPA, SPAWAR, McAfee Research


# 134675 03-Sep-2004 alc

Push Giant deep into vm_forkproc(), acquiring it only if the process has
mapped System V shared memory segments (see shmfork_myhook()) or requires
the allocation of an ldt (see vm_fault_wire()).


# 132777 28-Jul-2004 kan

Forced commit to note that prevois commit has nothing to do with casts
as lvalues. I meant to replace a bogus function cast with less bogus one.


# 132776 28-Jul-2004 kan

Avoid casts as lvalues.


# 132684 27-Jul-2004 alc

- Use atomic ops for updating the vmspace's refcnt and exitingcnt.
- Push down Giant into shmexit(). (Giant is acquired only if the vmspace
contains shm segments.)
- Eliminate the acquisition of Giant from proc_rwmem().
- Reduce the scope of Giant in exit1(), uncovering the destruction of the
address space.


# 131857 09-Jul-2004 alc

Eliminate struct shm_handle. It is an unnecessary level of indirection to
a vm_object.


# 130730 19-Jun-2004 tjr

When no fixed address is given in a shmat() request, pass a hint address
to vm_map_find() that is less likely to be outside of addressable memory
for 32-bit processes: just past the end of the largest possible heap.
This is the same hint that mmap() uses.


# 129882 30-May-2004 phk

Add missing #include <sys/module.h>


# 125488 05-Feb-2004 nectar

Correct a reference counting bug in shmat(2). If vm_map_find(9)
failed, the reference count for the virtual memory object referenced
by the specified shared memory segment would have been erroneously
incremented.

Reported by: Joost Pol <joost@pine.nl>


# 122201 07-Nov-2003 rwatson

Slight whitespace consistency improvement:
Trim trailing whitespace.
Remove unmatched " " before ")".


# 122088 04-Nov-2003 fjoe

Back out the following revisions:

1.36 +73 -60 src/sys/compat/linux/linux_ipc.c
1.83 +102 -48 src/sys/kern/sysv_shm.c
1.8 +4 -0 src/sys/sys/syscallsubr.h

That change was intended to support vmware3, but
wantrem parameter is useless because vmware3 uses SYSV shared memory
to talk with X server and X server is native application.
The patch worked because check for wantrem was not valid
(wantrem and SHMSEG_REMOVED was never checked for SHMSEG_ALLOCATED segments).

Add kern.ipc.shm_allow_removed (integer, rw) sysctl (default 0) which when set
to 1 allows to return removed segments in
shm_find_segment_by_shmid() and shm_find_segment_by_shmidx().

MFC after: 1 week


# 121307 21-Oct-2003 silby

Change all SYSCTLS which are readonly and have a related TUNABLE
from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide
more useful error messages.


# 118615 07-Aug-2003 nectar

Update some argument-documenting comments to match reality.

Add an explicit range check to those same arguments to reduce risk of
cardiac arrest in future code readers.


# 118607 07-Aug-2003 jhb

Consistently use the BSD u_int and u_short instead of the SYSV uint and
ushort. In most of these files, there was a mixture of both styles and
this change just makes them self-consistent.

Requested by: bde (kern_ktrace.c)


# 116182 10-Jun-2003 obrien

Use __FBSDID().


# 114724 05-May-2003 mbr

Change the semantics of sysv shm emulation to take a additional
argument to the functions shm{at,ctl}1 and shm_find_segment_by_shmid{x}.
The BSD semantics didn't allow the usage of shared segment after
being marked for removal through IPC_RMID.

The patch involves the following functions:
- shmat
- shmctl
- shm_find_segment_by_shmid
- shm_find_segment_by_shmidx
- linux_shmat
- linux_shmctl

Submitted by: Orlando Bassotto <orlando.bassotto@ieo-research.it>
Reviewed by: marcel


# 113448 13-Apr-2003 alc

Lock some manipulations of the vm object's flags.


# 111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 111009 16-Feb-2003 alfred

Fix logic in loop so it actually executes.

Pointed out by: fjoe


# 110982 16-Feb-2003 alfred

prevent overflow in shminfo.shmmax


# 109831 25-Jan-2003 alfred

Bring shm functions closer the the opengroup standards.

PR: 47469
Submitted by: Craig Rodrigues <rodrigc@attbi.com>


# 109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 109205 13-Jan-2003 dillon

It is possible for an active aio to prevent shared memory from being
dereferenced when a process exits due to the vmspace ref-count being
bumped. Change shmexit() and shmexit_myhook() to take a vmspace instead
of a process and call it in vmspace_dofree(). This way if it is missed
in exit1()'s early-resource-free it will still be caught when the zombie is
reaped.

Also fix a potential race in shmexit_myhook() by NULLing out
vmspace->vm_shm prior to calling shm_delete_mapping() and free().

MFC after: 7 days


# 108556 02-Jan-2003 alc

Lock the vm object when performing back-to-back vm_object_clear_flag() and
vm_object_set_flag().


# 101891 15-Aug-2002 alfred

return foo -> return (foo)


# 100512 22-Jul-2002 alfred

Change struct vmspace->vm_shm from void * to struct shmmap_state *, this
removes the need for casts in several cases.


# 100511 22-Jul-2002 alfred

Remove caddr_t.


# 92723 19-Mar-2002 alfred

Remove __P.


# 91703 05-Mar-2002 jhb

- Use td_ucred for jail checks.
- Move jail checks and some other checks involving constants and stack
variables out from under Giant. This isn't perfectly safe atm because
jail_sysvipc_allowed is read w/o a lock meaning that its value could be
stale. This global variable will soon become a per-jail flag, however,
at which time it will either not need a lock or will use the prison lock.


# 91406 27-Feb-2002 jhb

Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.


# 88633 29-Dec-2001 alfred

Make AIO a loadable module.

Remove the explicit call to aio_proc_rundown() from exit1(), instead AIO
will use at_exit(9).

Add functions at_exec(9), rm_at_exec(9) which function nearly the
same as at_exec(9) and rm_at_exec(9), these functions are called
on behalf of modules at the time of execve(2) after the image
activator has run.

Use a modified version of tegge's suggestion via at_exec(9) to close
an exploitable race in AIO.

Fix SYSCALL_MODULE_HELPER such that it's archetecuterally neutral,
the problem was that one had to pass it a paramater indicating the
number of arguments which were actually the number of "int". Fix
it by using an inline version of the AS macro against the syscall
arguments. (AS should be available globally but we'll get to that
later.)

Add a primative system for dynamically adding kqueue ops, it's really
not as sophisticated as it should be, but I'll discuss with jlemon when
he's around.


# 85623 28-Oct-2001 mr

Introduce [IPC|SHM]_[INFO|STAT] to shmctl to make
`/compat/linux/usr/bin/ipcs -m` happy.


# 84783 10-Oct-2001 ps

Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by: peter
MFC after: 2 weeks


# 83413 13-Sep-2001 mr

PR: kern/29698 (part)
Reviewed by: audit
Add tunables for the sem* and shm* syscontrols for tuning on boottime
until they become dynamic.
SAP R/3 doesn't like the compiled in defaults.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 82607 30-Aug-2001 dillon

Giant Pushdown: sysv shm, sem, and msg calls.


# 79224 04-Jul-2001 dillon

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


# 77461 30-May-2001 dd

Export via sysctl:
* all members of msginfo from sysv_msg.c;
* msqids from sysv_msg.c;
* sema from sysv_sem.c; and
* shmsegs from sysv_shm.c;

These will be used by ipcs(1) in non-kvm mode.

Reviewed by: tmm


# 77104 23-May-2001 dd

Correct style bugs with regards to long lines and comments.

Reviewed by: bde


# 76972 22-May-2001 dd

Correct the vm_mtx handling; specifically, don't acquire it in
shm_deallocate_segment because shmexit_myhook calls it, and the latter
should always be called with it already held.

Submitted by: dwmalone, dd
Approved by: alfred


# 76941 21-May-2001 jhb

Sort includes.


# 76908 20-May-2001 alfred

Aquire vm mutex when releasing sysv shm segments.

Obtained from: Dima Dorfman <dima@unixfreak.org>


# 76827 18-May-2001 alfred

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


# 76275 04-May-2001 dillon

Raise the SysV shared memory defaults to more reasonable values.
Mainly increases the shared memory limit from 4M to 32M (approx).
Many more programs these days use SysV shared memory, especially X-related
programs.


# 76166 01-May-2001 markm

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


# 72786 21-Feb-2001 rwatson

o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
pr_free(), invoked by the similarly named credential reference
management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
required to protect the reference count plus some fields in the
structure.

Reviewed by: freebsd-arch
Obtained from: TrustedBSD Project


# 72019 04-Feb-2001 green

It is _DEFINITELY_ not okay to change shmseg on a running system.


# 71038 14-Jan-2001 des

Use predictable internal names for the sysvipc modules, so we have a
chance of getting dependencies working.


# 69449 01-Dec-2000 alfred

sysvipc loadable.

new syscall entry lkmressys - "reserved loadable syscall"

Make syscall_register allow overwriting of such entries (lkmressys).


# 68024 30-Oct-2000 rwatson

o Deny access to System V IPC from within jail by default, as in the
current implementation, jail neither virtualizes the Sys V IPC namespace,
nor provides inter-jail protections on IPC objects.
o Support for System V IPC can be enabled by setting jail.sysvipc_allowed=1
using sysctl.
o This is not the "real fix" which involves virtualizing the System V
IPC namespace, but prevents processes within jail from influencing those
outside of jail when not approved by the administrator.

Reported by: Paulo Fragoso <paulo@nlink.com.br>


# 61081 29-May-2000 dillon

This is a cleanup patch to Peter's new OBJT_PHYS VM object type
and sysv shared memory support for it. It implements a new
PG_UNMANAGED flag that has slightly different characteristics
from PG_FICTICIOUS.

A new sysctl, kern.ipc.shm_use_phys has been added to enable the
use of physically-backed sysv shared memory rather then swap-backed.
Physically backed shm segments are not tracked with PV entries,
allowing programs which use a large shm segment as a rendezvous
point to operate without eating an insane amount of KVM in the
PV entry management. Read: Oracle.

Peter's OBJT_PHYS object will also allow us to eventually implement
page-table sharing and/or 4MB physical page support for such segments.
We're half way there.


# 60758 21-May-2000 peter

Provide a temporary undocumented option: SHM_PHYS_BACKED. This will
become sysctl and/or flags controlled later. It's mainly here for an
easy place to test the physical memory backed objects.


# 58820 30-Mar-2000 peter

Make sysv-style shared memory tuneable params fully runtime adjustable
via sysctl. It's done pretty simply but it should be quite adequate.
Also move SHMMAXPGS from $machine/include/vmparam.h as the comments that
went with it were wrong... we don't allocate KVM space for the pages so
that comment is bogus.. The only practical limit is how much physical
ram you want to lock up as this stuff isn't paged out or swap backed.


# 57884 10-Mar-2000 alc

shmat: If VM_PROT_READ_IS_EXEC is defined and prot includes VM_PROT_READ,
VM_PROT_EXECUTE must be added to prot before calling vm_map_find.

Without this change, an mprotect on a shmat'ed region fails (when
it shouldn't). This bug was reported Feb 28 by Brooks Davis
<brooks@one-eyed-alien.net> on -hackers.

Reviewed by: bde
Approved by: jkh


# 52635 29-Oct-1999 phk

useracc() the prequel:

Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 48039 19-Jun-1999 alc

For consistency with other implementations, check for the existence
of the segment before checking its permissions.

PR: kern/11999
Submitted by: Brooks Davis <brooks@one-eyed-alien.net>


# 46116 27-Apr-1999 phk

Change suser_xxx() to suser() where it applies.


# 42957 21-Jan-1999 dillon

This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.

Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>


# 40286 13-Oct-1998 dg

Fixed two potentially serious classes of bugs:

1) The vnode pager wasn't properly tracking the file size due to
"size" being page rounded in some cases and not in others.
This sometimes resulted in corrupted files. First noticed by
Terry Lambert.
Fixed by changing the "size" pager_alloc parameter to be a 64bit
byte value (as opposed to a 32bit page index) and changing the
pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
caused some 64bit offsets and sizes to be scrambled. Removing
the cast required adding casts at a few dozen callers.
There may be problems with other bogus casts in close-by
macros. A quick check seemed to indicate that those were okay,
however.


# 38517 24-Aug-1998 dfr

Change various syscalls to use size_t arguments instead of u_int.

Add some overflow checks to read/write (from bde).

Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags
and vm_object::paging_in_progress to use operations which are not
interruptable.

Reviewed by: Bruce Evans <bde@zeta.org.au>


# 35694 04-May-1998 dyson

Fix the shm panic. I mistakenly used the shadow_count to keep the object
from being split, and instead added an OBJ_NOSPLIT.


# 35669 04-May-1998 dyson

Work around some VM bugs, the worst being an overly aggressive
swap space free calculation. More complete fixes will be forthcoming,
in a week.


# 34961 30-Mar-1998 phk

Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time. Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by: bde


# 33181 09-Feb-1998 eivind

Staticize.


# 31778 16-Dec-1997 eivind

Make COMPAT_43 and COMPAT_SUNOS new-style options.


# 30994 06-Nov-1997 phk

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.


# 30354 12-Oct-1997 phk

Last major round (Unless Bruce thinks of somthing :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


# 30309 11-Oct-1997 phk

Distribute and statizice a lot of the malloc M_* types.

Substantial input from: bde


# 27845 02-Aug-1997 bde

Removed unused #includes.


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 20821 22-Dec-1996 joerg

Make DFLDSIZ and MAXDSIZ fully-supported options.

"Don't forget to do a ``make depend''" :-)


# 18231 10-Sep-1996 dyson

Fix a problem with child inheritance of sysv shm. Problem brought
to my attention by Brad Lineberger <bil@mpgn.com> and Rob Miracle.


# 18199 09-Sep-1996 dyson

Make sure that the pager is allocated before it is needed. Hangs
can occur if the pager is not allocated in time.


# 18098 07-Sep-1996 dyson

Corrected an error where precious kernel virtual space was being allocated
for entire SYS5 SHM segments. This is totally unnecessary, and so the
correct allocation of VM objects has been substituted. (The vm_mmap
was misused -- vm_object_allocate is more appropriate.)


# 15636 05-May-1996 joerg

uninitialized auto variable shmseg is used in ...

Closes PR #kern/1174

Submitted by: enami@ba2.so-net.or.jp


# 15583 03-May-1996 phk

Another sweep over the pmap/vm macros, this time with more focus on
the usage. I'm not satisfied with the naming, but now at least there is
less bogus stuff around.


# 15543 02-May-1996 phk

removed:
CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei()
ptei() kvtopte() ptetov() ispt() ptetoav() &c &c
new:
NPDEPG

Major macro cleanup.


# 14221 23-Feb-1996 peter

kern_descrip.c: add fdshare()/fdcopy()
kern_fork.c: add the tiny bit of code for rfork operation.
kern/sysv_*: shmfork() takes one less arg, it was never used.
sys/shm.h: drop "isvfork" arg from shmfork() prototype
sys/param.h: declare rfork args.. (this is where OpenBSD put it..)
sys/filedesc.h: protos for fdshare/fdcopy.
vm/vm_mmap.c: add minherit code, add rounding to mmap() type args where
it makes sense.
vm/*: drop unused isvfork arg.

Note: this rfork() implementation copies the address space mappings,
it does not connect the mappings together. ie: once the two processes
have split, the pages may be shared, but the address space is not. If one
does a mmap() etc, it does not appear in the other. This makes it not
useful for pthreads, but it is useful in it's own right for having
light-weight threads in a static shared address space.

Obtained from: Original by Ron Minnich, extended by OpenBSD


# 13255 05-Jan-1996 wollman

Somehow managed to miss these four files when converting the SYSV IPC
options over to the new style.


# 13033 26-Dec-1995 joerg

I report a problem about shmget(). (I'm using FreeBSD-2.1.0R)

int shmget(key_t key, int size, int shmflg);

If the 'key' has already existed in the system and set 'shmflg'
as '(IPC_CREAT|IPC_EXC)', then shmget() must return the error 'EEXIST'.

Submitted by: m_tanaka@pa.yokogawa.co.jp (Mihoko Tanaka)


# 12866 15-Dec-1995 peter

Update sysv_*.c to get their argument definitions from sysproto.h


# 12819 14-Dec-1995 phk

A Major staticize sweep. Generates a couple of warnings that I'll deal
with later.
A number of unused vars removed.
A number of unused procs removed or #ifdefed.


# 12662 07-Dec-1995 dg

Untangled the vm.h include file spaghetti.


# 12613 04-Dec-1995 jkh

Close PR: kern/865
Submitted by: Juergen Lock <nox@jelal.hb.north.de>


# 12546 30-Nov-1995 julian

Submitted by: jb@cimlogic.com.au (John Birrell)
Obtained from: NetBSD as well (He submitted it there too)

make sure that teh shm region is beyond the sum of the text and data segs
as it was big progs could collide with the shm region.


# 11626 21-Oct-1995 bde

Start including <sys/sysproto.h> to get the correct args structs and
prototypes for all syscalls. The args structs are still declared in
comments as in VOP implementation functions. I don't like the
duplication for this, but several more layers of changes are required
to get it right. First we need to catch up with 4.4lite2, which uses
macros to handle struct padding. Then we need to catch up with NetBSD,
which passes the args correctly (as void *). Then we need to handle
varargs functions and struct padding better. I think all the details
can be hidden in machine-generated functions so that the args structs
and verbose macros to reference them don't have to appear in the core
sources.

Add prototypes.

Add bogus casts to hide the evil type puns exposed by the previous
steps. &uap[1] was used to get at the args after the first. This
worked because only the first arg in *uap was declared. This broke
when the machine- genenerated args struct declared all the args
(actually it declares extra args in some cases and depends on the
user stack having some accessible junk after the last arg, not to
mention the user args being on the stack. It isn't possible to
declare a correct args struct for a varargs syscall). The msgsys(),
semsys() and shmsys() syscall interfaces are BAD because they
multiplex several syscalls that have different types of args.
There was no reason to duplicate this sysv braindamage but now
we're stuck with it. NetBSD has reimplemented the syscalls properly
as separate syscalls #220-231.

Declare static functions as static in both their prototype and their
implementation (the latter is optional, and this misfeature was used).

Remove gratuitous #includes.

Continue cleaning up new init stuff.


# 10653 09-Sep-1995 dg

Fixed init functions argument type - caddr_t -> void *. Fixed a couple of
compiler warnings.


# 10428 29-Aug-1995 bde

Fix several sysinit functions that had the wrong type and unnecessarily
external linkage.

Remove useless comments saying that SYSINIT() does system initialization.

shm.c:
Remove nearly useless comment that gave wrong pseudo-prototypes.


# 10358 28-Aug-1995 julian

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadbal modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root of disk still works just fine..
mfs should work but is untested. (tomorrows task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.


# 9759 29-Jul-1995 bde

Eliminate sloppy common-style declarations. There should be none left for
the LINT configuation.


# 8876 30-May-1995 rgrimes

Remove trailing whitespace.


# 6579 20-Feb-1995 dg

Use of vm_allocate() and vm_deallocate() has been deprecated.


# 3308 02-Oct-1994 phk

All of this is cosmetic. prototypes, #includes, printfs and so on. Makes
GCC a lot more silent.


# 2828 16-Sep-1994 dfr

Added code for FreeBSD-1.1.5 backwards compatibility.


# 2729 13-Sep-1994 dfr

Added SYSV ipcs.

Obtained from: NetBSD and FreeBSD-1.1.5