History log of /freebsd-11-stable/sys/compat/linux/linux_socket.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 346816 28-Apr-2019 dchagin

MFC r329794, r329801 (by emaste@):

Correct proper nouns in the Linuxulator

- Capitalize Linux
- Spell FreeBSD out in full
- Address some style(9) on changed lines


# 346812 28-Apr-2019 dchagin

MFC r328890 (by emaste@):

Linuxolator whitespace cleanup

A version of each of the MD files by necessity exists for each CPU
architecture supported by the Linuxolator. Clean these up so that new
architectures do not inherit whitespace issues.

Clean up shared Linuxolator files while here.


# 343294 22-Jan-2019 markj

MFC r342864:
Specify the correct option level when emulating SO_PEERCRED.

PR: 234722


# 340758 22-Nov-2018 tijl

MFC r340674:

Fix another user address dereference in linux_sendmsg syscall.

This was hidden behind the LINUX_CMSG_NXTHDR macro which dereferences its
second argument. Stop using the macro as well as LINUX_CMSG_FIRSTHDR. Use
the size field of the kernel copy of the control message header to obtain
the next control message.

PR: 217901


# 340756 22-Nov-2018 tijl

MFC r340631:

Do proper copyin of control message data in the Linux sendmsg syscall.

Instead of calling m_append with a user address, allocate an mbuf cluster
and copy data into it using copyin. For the SCM_CREDS case, instead of
zeroing a stack variable and appending that to the mbuf, zero part of the
mbuf cluster directly. One mbuf cluster is also the size limit used by
the FreeBSD sendmsg syscall (uipc_syscalls.c:sockargs()).

PR: 217901
Reviewed by: kib


# 338618 12-Sep-2018 markj

MFC r337423:
Improve handling of control message truncation.

PR: 131876


# 330997 15-Mar-2018 emaste

MFC r329370, r330239: Rationalize license text on Linuxolator files

Many licenses on Linuxolator files contained small variations from the
standard FreeBSD license text. To avoid license proliferation switch to
the standard 2-clause FreeBSD license for those files where I have
permission from each of the listed copyright holders.

Approved by: dchagin, kan, marcel, rdivacky, sos
Sponsored by: The FreeBSD Foundation


# 315955 25-Mar-2017 dchagin

MFC r315499:

Remove superflous break statment.


# 315954 25-Mar-2017 dchagin

MFC r315503:

As noted by Roel Bouwman Linux allows a large buffer size than the
struct ucred size. Fix this.

PR: 102956


# 315549 19-Mar-2017 trasz

MFC r312988:

Add kern_listen(), kern_shutdown(), and kern_socket(), and use them
instead of their sys_*() counterparts in various compats. The svr4
is left untouched, because there's no point.

Sponsored by: DARPA, AFRL


# 315313 15-Mar-2017 dchagin

MFC r313913:

Initialize cap_rights before use.

MFC r313914:

Style(9), some XXX comments fix. No functional changes.


# 315312 15-Mar-2017 dchagin

MFC r305093 (by mjg@):

fd: add fdeget_locked and use in kern_descrip

MFC r305756 (by oshogbo@):

fd: add fget_cap and fget_cap_locked primitives.
They can be used to obtain capabilities along with a referenced fp.

MFC r306174 (by oshogbo@):

capsicum: propagate rights on accept(2)

Descriptor returned by accept(2) should inherits capabilities rights from
the listening socket.

PR: 201052

MFC r306184 (by oshogbo@):

fd: simplify fgetvp_rights by using fget_cap_locked.

MFC r306225 (by mjg@):

fd: fix up fgetvp_rights after r306184

fget_cap_locked returns a referenced file, but the fgetvp_rights does
not need it. Instead, due to the filedesc lock being held, it can
ref the vnode after the file was looked up.

Fix up fget_cap_locked to be consistent with other _locked helpers and not
ref the file.

This plugs a leak introduced in r306184.

MFC r306232 (by oshogbo@):

fd: fix up fget_cap

If the kernel is not compiled with the CAPABILITIES kernel options
fget_unlocked doesn't return the sequence number so fd_modify will
always report modification, in that case we got infinity loop.

MFC r311474 (by glebius@):

Use getsock_cap() instead of fgetsock().

MFC r312079 (by glebius@):

Use getsock_cap() instead of deprecated fgetsock().

MFC r312081 (by glebius@):

Use getsock_cap() instead of deprecated fgetsock().

MFC r312087 (by glebius@):

Remove deprecated fgetsock() and fputsock().

Bump __FreeBSD_version as getsock_cap changed and
fgetsock/fputsock pair removed.


# 314107 22-Feb-2017 dchagin

MFC r313284:

Update syscall.master to 4.10-rc6. Also fix comments, a typo,
and wrong numbering for a few unimplemented syscalls.

For 32-bit Linuxulator, socketcall() syscall was historically
the entry point for the sockets API. Starting in Linux 4.3, direct
syscalls are provided for the sockets API. Enable it.

The initial version of patch was provided by trasz@ and extended by me.

MFC r313285:

Regen after r313284.

MFC r313684:

Fix r313284.

Members of the syscall argument structures are padded to a word size. So,
for COMPAT_LINUX32 we should convert user supplied system call arguments
which is 32-bit in that case to the array of register_t.

MFC r313912:

Finish r313684.

Convert linux_recv(), linux_send() and linux_accept() system call arguments
to the register_t type too.


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 302213 26-Jun-2016 dchagin

Fix a bug introduced in r283433.

[1] Remove unneeded sockaddr conversion before kern_recvit() call as the from
argument is used to record result (the source address of the received message) only.

[2] In Linux the type of msg_namelen member of struct msghdr is signed but native
msg_namelen has a unsigned type (socklen_t). So use the proper storage to fetch fromlen
from userspace and than check the user supplied value and return EINVAL if it is less
than 0 as a Linux do.

Reported by: Thomas Mueller <tmueller at sysgo dot com> [1]
Reviewed by: kib@
Approved by: re (gjb, kib)
MFC after: 3 days


# 300431 22-May-2016 dchagin

Convert proto family in both directions. The linux and native values for
local and inet are identical, but for inet6 values differ.

PR: 155040
Reported by: Simon Walton
MFC after: 2 week


# 300416 22-May-2016 dchagin

Add a missing errno translation for SO_ERROR optname.

PR: 135458
Reported by: Stefan Schmidt @ stadtbuch.de
MFC after: 1 week


# 298310 19-Apr-2016 pfg

kernel: use our nitems() macro when it is available through param.h.

No functional change, only trivial cases are done in this sweep,

Discussed in: freebsd-current


# 297313 27-Mar-2016 dchagin

Revert r297310 as the SOL_XXX are equal to the IPPROTO_XX except SOL_SOCKET.

Pointed out by: ae@


# 297310 27-Mar-2016 dchagin

iConvert Linux SOL_IPV6 level.

MFC after: 1 week


# 297309 27-Mar-2016 dchagin

Whitespaces and style(9) fix. No functional changes.

MFC after: 1 week


# 296557 09-Mar-2016 ae

Add support for IPPROTO_IPV6 socket layer for getsockopt/setsockopt calls.
Also add mapping for several options from RFC 3493 and 3542.

Reviewed by: dchagin
Tested by: Joe Love <joe at getsomwhere dot net>
MFC after: 2 weeks


# 296504 08-Mar-2016 dchagin

Does not leak fp. While here remove bogus cast of fp->f_data.

MFC after: 1 week


# 296503 08-Mar-2016 dchagin

Linux accept() system call return EOPNOTSUPP errno instead of EINVAL
for UDP sockets.

MFC after: 1 week


# 294233 17-Jan-2016 dchagin

Prevent double free of control in common sendmsg path as sosend
already freeing it.


# 284166 08-Jun-2015 jkim

Properly initialize flags for accept4(2) not to return spurious EINVAL.
Note this fixes a Linuxulator regression introduced in r283490.

PR: 200662


# 283497 24-May-2015 dchagin

Convert SCM_TIMESTAMP in recvmsg().


# 283494 24-May-2015 dchagin

Fix an mbuf(9) leak in sendmsg() under failure condition and
remove unneeded check for failed M_WAITOK allocation.

Found by: Brainy Code Scanner
Reported by: Maxime Villard


# 283490 24-May-2015 dchagin

Since FreeBSD supports SOCK_CLOEXEC & SOCK_NONBLOCK options
remove its emulation via fcntl call from Linuxulator.


# 283488 24-May-2015 dchagin

Implement recvmmsg() and sendmmsg() system calls.


# 283437 24-May-2015 dchagin

To avoid code duplication move open/fcntl definitions to the MI
header file.

Differential Revision: https://reviews.freebsd.org/D1087
Reviewed by: trasz


# 283433 24-May-2015 dchagin

Rewrite linux_recvfrom. To avoid double conversion of sockaddr use
kern_recvit() directly.
And check fromlen parameter before sockaddr copyin and conversion.

Differential Revision: https://reviews.freebsd.org/D1082


# 283427 24-May-2015 dchagin

Where possible we will use M_LINUX malloc(9) type.
Move M_FUTEX defines to the linux_common.ko.

Differential Revision: https://reviews.freebsd.org/D1077
Reviewed by: emaste


# 283415 24-May-2015 dchagin

Disable i386 call for x86-64 Linux.

Differential Revision: https://reviews.freebsd.org/D1067
Reviewed by: trasz


# 283413 24-May-2015 dchagin

64-bit paltforms, like x86_64, do not use multiplexing on
socketcall system calls.

Differential Revision: https://reviews.freebsd.org/D1065
Reviewed by: trasz


# 276512 01-Jan-2015 dchagin

Fix Clang -Wpointer-sign warnings.

MFC after: 1 week


# 274476 13-Nov-2014 kib

Remove the no-at variants of the kern_xx() syscall helpers. E.g., we
have both kern_open() and kern_openat(); change the callers to use
kern_openat().

This removes one (sometimes two) levels of indirection and
consolidates arguments checks.

Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 263233 16-Mar-2014 rwatson

Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after: 3 weeks


# 257179 26-Oct-2013 glebius

Provide includes that are needed in these files, and before were read
in implicitly via if.h -> if_var.h pollution.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 255219 04-Sep-2013 pjd

Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

struct cap_rights {
uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)

#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);

bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

cap_rights_t rights;

cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

#define cap_rights_set(rights, ...) \
__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by: The FreeBSD Foundation


# 247764 04-Mar-2013 eadler

Remove check for NULL prior to free(9) and m_freem(9).

Approved by: cperciva (mentor)


# 245849 23-Jan-2013 jhb

Don't assume that all Linux TCP-level socket options are identical to
FreeBSD TCP-level socket options (only the first two are). Instead,
using a mapping function and fail unsupported options as we do for other
socket option levels.

MFC after: 2 weeks


# 243882 05-Dec-2012 glebius

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually


# 230132 15-Jan-2012 uqs

Convert files to UTF-8


# 226079 06-Oct-2011 jkim

Use the caculated length instead of maximum length.


# 226078 06-Oct-2011 jkim

Remove a now-defunct variable.


# 226074 06-Oct-2011 jkim

Use uint32_t instead of u_int32_t. Fix style(9) nits.


# 226073 06-Oct-2011 jkim

Make sure to ignore the leading NULL byte from Linux abstract namespace.


# 226072 06-Oct-2011 jkim

Restore the original socket address length if it was not really AF_INET6.


# 226071 06-Oct-2011 jkim

Retern more appropriate errno when Linux path name is too long.


# 226069 06-Oct-2011 jkim

Inline do_sa_get() function and remove an unused return value.


# 226068 06-Oct-2011 jkim

Unroll inlined strnlen(9) and make it easier to read. No functional change.


# 226023 04-Oct-2011 cperciva

Fix a bug in UNIX socket handling in the linux emulator which was
exposed by the security fix in FreeBSD-SA-11:05.unix.

Approved by: so (cperciva)
Approved by: re (kib)
Security: Related to FreeBSD-SA-11:05.unix, but not actually
a security fix.


# 225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# 224778 11-Aug-2011 rwatson

Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *. With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by: re (bz)
Submitted by: jonathan
Sponsored by: Google Inc


# 220186 31-Mar-2011 avg

Revert r220032:linux compat: add SO_PASSCRED option with basic handling

I have not properly thought through the commit. After r220031 (linux
compat: improve and fix sendmsg/recvmsg compatibility) the basic
handling for SO_PASSCRED is not sufficient as it breaks recvmsg
functionality for SCM_CREDS messages because now we would need to handle
sockcred data in addition to cmsgcred. And that is not implemented yet.

Pointyhat to: avg


# 220032 26-Mar-2011 avg

linux compat: add SO_PASSCRED option with basic handling

This seems to have been a part of a bigger patch by dchagin that either
haven't been committed or committed partially.

Submitted by: dchagin, nox
MFC after: 2 weeks


# 220031 26-Mar-2011 avg

linux compat: improve and fix sendmsg/recvmsg compatibility

- implement baseic stubs for capget, capset, prctl PR_GET_KEEPCAPS
and prctl PR_SET_KEEPCAPS.
- add SCM_CREDS support to sendmsg and recvmsg
- modify sendmsg to ignore control messages if not using UNIX
domain sockets

This should allow linux pulse audio daemon and client work on FreeBSD
and interoperate with native counter-parts modulo the differences in
pulseaudio versions.

PR: kern/149168
Submitted by: John Wehle <john@feith.com>
Reviewed by: netchild
MFC after: 2 weeks


# 203728 09-Feb-2010 delphij

- Return EAFNOSUPPORT instead of EINVAL for unsupported address family,
this matches the Linux behavior.
- Check if we have sufficient space allocated for socket structure, which
fixes a buffer overflow when wrong length is being passed into the
emulation layer. [1]

PR: kern/138860
Submitted by: Mateusz Guzik <mjguzik gmail com>
Reported by: Alexander Best [1]
MFC after: 2 weeks


# 198467 25-Oct-2009 bz

Unconditionally call the setsockopt for IPV6_V6ONLY for v6 linux sockets
no matter whether we are compiled as module or if our default of the
net.inet6.ip6.v6only sysctl already matches what we would set.

This avoids unnecessary complications with modules, VIMAGES, INET6 and
the sysctl value, especially considering that most users will use
linux compat as a module.

Discussed with: kib, rwatson (weeks ago)
Reviewed by: rwatson
MFC after: 6 weeks


# 196019 01-Aug-2009 rwatson

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


# 195699 14-Jul-2009 rwatson

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# 193265 01-Jun-2009 dchagin

Add forgotten in previous commit flags argument.

Approved by: kib (mentor)
MFC after: 1 month


# 193264 01-Jun-2009 dchagin

Implement accept4 syscall.

Approved by: kib (mentor)
MFC after: 1 month


# 193263 01-Jun-2009 dchagin

Implement a variation of the accept_common() which takes
a flags argument.

Do not preserve td_retval before kern_fcntl(F_SETFL) as it does not
changed.

Approved by: kib (mentor)
MFC after: 1 month


# 193262 01-Jun-2009 dchagin

Split linux_accept() syscall onto linux_accept_common() which should
be used by linuxulator and linux_accept() itself.

Approved by: kib (mentor)
MFC after: 1 month


# 193168 31-May-2009 dchagin

Implement a variation of the socketpair() syscall which takes a flags
in addition to the type argument.

Approved by: kib (mentor)
MFC after: 1 month


# 193165 31-May-2009 dchagin

Move new socket flags handling into a separate function as Linux
introduced more syscalls which uses these flags.

Approved by: kib (mentor)
MFC after: 1 month


# 193164 31-May-2009 dchagin

Remove empty lines.

Approved by: kib (mentor)
MFC after: 1 month


# 192373 19-May-2009 dchagin

Validate user-supplied arguments values.
Args argument is a pointer to the structure located in user space in
which the socketcall arguments are packed. The structure must be
copied to the kernel instead of direct dereferencing.

Approved by: kib (mentor)
MFC after: 1 week


# 192284 18-May-2009 dchagin

Implement MSG_CMSG_CLOEXEC flag for linux_recvmsg().

Approved by: kib (mentor)
MFC after: 1 month


# 192206 16-May-2009 dchagin

Somewhere between 2.6.23 and 2.6.27, Linux added SOCK_CLOEXEC and
SOCK_NONBLOCK flags, that allow to save fcntl() calls.

Implement a variation of the socket() syscall which takes a flags
in addition to the type argument.

Approved by: kib (mentor)
MFC after: 1 month


# 192205 16-May-2009 dchagin

Return EINVAL in case when the incorrect or unsupported
type argument is specified.

Do not map type argument value as its Linux values are
identical to FreeBSD values.

Approved by: kib (mentor)


# 192204 16-May-2009 dchagin

Use the protocol family constants for the domain argument validation.
Return immediately when the socket() failed.

Approved by: kib (mentor)
MFC after: 1 month


# 192203 16-May-2009 dchagin

Emulate SO_PEERCRED socket option.
Temporarily use 0 for pid member as the FreeBSD does not cache remote
UNIX domain socket peer pid.

PR: kern/102956
Reviewed by: rwatson
Approved by: kib (mentor)
MFC after: 1 month


# 191989 11-May-2009 dchagin

Translate l_timeval arg to native struct timeval in
linux_setsockopt()/linux_getsockopt() for SO_RCVTIMEO,
SO_SNDTIMEO opts as l_timeval has MD members.

Remove bogus __packed attribute from l_timeval struct on __amd64__.

PR: kern/134276
Submitted by: Thomas Mueller <tmueller sysgo com>
Approved by: kib (mentor)
MFC after: 2 weeks


# 191988 11-May-2009 dchagin

Add forgotten linux to bsd flags argument mapping into the linux_recv().

PR: kern/134276
Submitted by: Thomas Mueller <tmueller sysgo com>
Approved by: kib (mentor)
MFC after: 2 weeks


# 191875 07-May-2009 dchagin

Return EAFNOSUPPORT instead of EINVAL in case when the incorrect or
unsupported domain argument is specified.

Approved by: kib (mentor)


# 191871 07-May-2009 dchagin

Rework r191742.
Use the protocol family constants for the domain argument validation.

Return EAFNOSUPPORT in case when the incorrect domain argument
is specified.

Return EPROTONOSUPPORT instead of passing values that are not 0
to the BSD layer.

Suggested by: rwatson

Approved by: kib (mentor)
MFC after: 1 month


# 191742 02-May-2009 dchagin

Linux socketpair() call expects explicit specified protocol for
AF_LOCAL domain unlike FreeBSD which expects 0 in this case.

Approved by: kib (mentor)
MFC after: 1 month


# 191548 26-Apr-2009 zec

In preparation for turning on options VIMAGE in next commits,
rearrange / replace / adjust several INIT_VNET_* initializer
macros, all of which currently resolve to whitespace.

Reviewed by: bz (an older version of the patch)
Approved by: julian (mentor)


# 185571 02-Dec-2008 bz

Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation


# 185442 29-Nov-2008 kib

Make linux_sendmsg() and linux_recvmsg() work on linux32/amd64.
Change types used in the linux' struct msghdr and struct cmsghdr
definitions to the properly-sized architecture-specific types.
Move ancillary data handler from linux_sendit() to linux_sendmsg().

Submitted by: dchagin


# 184205 23-Oct-2008 des

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


# 183550 02-Oct-2008 zec

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 182890 09-Sep-2008 kib

Remove superfluous copyin() of args, structures are already in kernel space.

Submitted by: dchagin
MFC after: 1 week


# 181803 17-Aug-2008 bz

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


# 171744 06-Aug-2007 rwatson

Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
previously conditionally acquired Giant based on debug.mpsafenet. As that
has now been removed, they are no longer required. Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.

While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option. Clean up some related gotos for
consistency.

Reviewed by: bz, csjp
Tested by: kris
Approved by: re (kensmith)


# 168711 14-Apr-2007 rwatson

Some Linux applications (ping) pass a non-NULL msg_control argument to
sendmsg() while using a 0-length msg_controllen. This isn't allowed in
the FreeBSD system call ABI, so detect this case and set msg_control to
NULL. This allows Linux ping to work.

Submitted by: rdivacky


# 166398 01-Feb-2007 kib

Introduce some more SO_ option equivalents from Linux to FreeBSD.

The msg variable in linux_recvmsg() was not initialized.
Copy it from userspace.

Submitted by: rdivacky


# 162585 23-Sep-2006 netchild

MFp4:
- Linux returns ENOPROTOOPT in a case of not supported opt to setsockopt.
- Return EISDIR in pread() when arg is a directory.
- Return EINVAL instead of EFAULT when namelen is not correct in accept().
- Return EINVAL instead of EACCESS if invalid access mode is entered in
access().
- Return EINVAL instead of EADDRNOTAVAIL in a case of bad salen param
to bind().

Submitted by: rdivacky
Tested with: LTP (vfork01 fails now, but it seems to be a race and
not caused by those changes)
MFC after: 1 week


# 160506 19-Jul-2006 jhb

Don't free the sockaddr in kern_bind() and kern_connect() as not all
callers pass a sockaddr allocated via malloc() from M_SONAME anymore.
Instead, free it in the callers when necessary.


# 160190 08-Jul-2006 jhb

Add a kern_close() so that the ABIs can close a file descriptor w/o having
to populate a close_args struct and change some of the places that do.


# 158415 10-May-2006 netchild

Now that we don't have a linuxolator on alpha anymore:
- unifdef __alpha__
- revert rev. 1.66 of linux_socket.c


# 157369 01-Apr-2006 rwatson

Annotate uses of fgetsock() with indications that they should rely
on their existing file descriptor references to sockets, rather than
use fgetsock() to retrieve a direct socket reference.

MFC after: 3 months


# 156976 21-Mar-2006 netchild

Fix the LINT build on alpha:
- rename some file local structure definitions, the names clash with
autogenerated names
- on !alpha add some compatibility defines for those renamed structures
- make some functions globally visible on alpha


# 156874 19-Mar-2006 ru

Unbreak COMPAT_LINUX32 option support on amd64.

Broken by: netchild


# 156850 18-Mar-2006 netchild

Fixup some problems in my previous commit (COMPAT_43).

Pointyhat to: netchild


# 156842 18-Mar-2006 netchild

Get rid of the need of COMPAT_43 in the linuxolator.

Submitted by: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Obtained from: DragonFly (some parts)


# 153744 26-Dec-2005 glebius

Add \n to log() message.

Submitted by: Stanislaw Halik <weirdo tehran.lain.pl>


# 150663 28-Sep-2005 rwatson

Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57,
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:

Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer leads to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.

This aligns this code more with the eventual locking of ttys.

Suggested by: bde


# 150335 19-Sep-2005 rwatson

Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).

Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions. In most cases, this means adding an
acquisition of Giant immediately around the function. In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.

With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.

NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.

NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset. This needs to be fixed, but was a problem before
this change.

NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having
non-MPSAFE tty code.

MFC after: 1 week


# 147853 09-Jul-2005 jhb

Add missing locking to linux_connect() so that it can be marked MP safe:
- Conditionally grab Giant around the EISCONN hack at the end based on
debug.mpsafenet.
- Protect access to so_emuldata via SOCK_LOCK.

Reviewed by: rwatson
Approved by: re (scottl)


# 144012 23-Mar-2005 das

Reject packets larger than IP_MAXPACKET in linux_sendto() for sockets
with the IP_HDRINCL option set. Without this change, a Linux process
with access to a raw socket could cause a kernel panic. Raw sockets
must be created by root, and are generally not consigned to untrusted
applications; hence, the security implications of this bug are
minimal. I believe this only affects 6-CURRENT on or after 2005-01-30.

Found by: Coverity Prevent analysis tool
Security: Local DOS


# 143295 08-Mar-2005 sobomax

Add kernel-only flag MSG_NOSIGNAL to be used in emulation layers to surpress
SIGPIPE signal for the duration of the sento-family syscalls. Use it to
replace previously added hack in Linux layer based on temporarily setting
SO_NOSIGPIPE flag.

Suggested by: alfred


# 143233 07-Mar-2005 sobomax

Handle MSG_NOSIGNAL flag in linux_send() by setting SO_NOSIGPIPE on socket
for the duration of the send() call. Such approach may be less than ideal
in threading environment, when several threads share the same socket and it
might happen that several of them are calling linux_send() at the same time
with and without SO_NOSIGPIPE set.

However, such race condition is very unlikely in practice, therefore this
change provides practical improvement compared to the previous behaviour.

PR: kern/76426
Submitted by: Steven Hartland <killing@multiplay.co.uk>
MFC after: 3 days


# 141029 30-Jan-2005 sobomax

Extend kern_sendit() to take another enum uio_seg argument, which specifies
where the buffer to send lies and use it to eliminate yet another stackgap
in linuxlator.

MFC after: 2 weeks


# 140214 14-Jan-2005 obrien

Match the LINUX32's style with existing style
Submitted by: Jung-uk Kim <jkim@niksun.com>

Use positive, not negative logic.


# 134266 24-Aug-2004 jhb

Fix the ABI wrappers to use kern_fcntl() rather than calling fcntl()
directly. This removes a few more users of the stackgap and also marks
the syscalls using these wrappers MP safe where appropriate.

Tested on: i386 with linux acroread5
Compiled on: i386, alpha LINT


# 134209 23-Aug-2004 des

Don't try to translate the control message unless we're certain it's
valid; otherwise a caller could trick us into changing any 32-bit word
in kernel memory to LINUX_SOL_SOCKET (0x00000001) if its previous value
is SOL_SOCKET (0x0000ffff).

MFC after: 3 days


# 133816 16-Aug-2004 tjr

Changes to MI Linux emulation code necessary to run 32-bit Linux binaries
on AMD64, and the general case where the emulated platform has different
size pointers than we use natively:
- declare certain structure members as l_uintptr_t and use the new PTRIN
and PTROUT macros to convert to and from native pointers.
- declare some structures __packed on amd64 when the layout would differ
from that used on i386.
- include <machine/../linux32/linux.h> instead of <machine/../linux/linux.h>
if compiling with COMPAT_LINUX32. This will need to be revisited before
32-bit and 64-bit Linux emulation support can coexist in the same kernel.
- other small scattered changes.

This should be a no-op on i386 and Alpha.


# 132347 18-Jul-2004 dwmalone

I missed two pieces of the commit to this file. Robert has already
added one, this adds the other.


# 132331 18-Jul-2004 rwatson

Remove 'sg' argument to linux_sendto_hdrincl, which is what I think was
intended. This fixes the build, but might require revision.


# 132313 17-Jul-2004 dwmalone

Add a kern_setsockopt and kern_getsockopt which can read the option
values from either user land or from the kernel. Use them for
[gs]etsockopt and to clean up some calls to [gs]etsockopt in the
Linux emulation code that uses the stackgap.


# 131897 10-Jul-2004 phk

Clean up and wash struct iovec and struct uio handling.

Add copyiniov() which copies a struct iovec array in from userland into
a malloc'ed struct iovec. Caller frees.

Change uiofromiov() to malloc the uio (caller frees) and name it
copyinuio() which is more appropriate.

Add cloneuio() which returns a malloc'ed copy. Caller frees.

Use them throughout.


# 131796 08-Jul-2004 phk

Use a couple of regular kernel entry points, rather than COMPAT_43
entry points.


# 123828 25-Dec-2003 bde

Quick fix for LINT breakage caused by interface changes in accept(2), etc.
The log message for rev.1.160 of kern/uipc_syscalls.c and associated
changes only claimed to add restrict qualifiers (which have no effect in
the kernel so they probably shouldn't be added), but the following
interface changes were also made:
- caddr_t to `void *' and `struct sockaddr_t *'
- `int *' to `socklen_t *'.
These interface changes are not quite null, and this fix is quick (like
the changes in uipc_syscalls 1.160) because it uses bogus casts instead
of complete bounds-checked conversions.

Things should be fixed better when the conversions can be done without
using the stack gap. linux_check_hdrincl() already uses the stack gap
and is fixed completely though the type mismatches in it were not fatal
(there were only fatal type mismatches from unopaquing pointers to
[o]sockaddr't's -- the difference between accept()'s args and oaccept()'s
args is now non-opaque, but this is not reflected in their args structs).


# 122358 09-Nov-2003 dwmalone

Use kern_sendit rather than sendit for the Linux send* syscalls.
This means we can avoid using the stack gap for most send* syscalls
now (it is still used in the IP_HDRINCL case).


# 121008 11-Oct-2003 iwasaki

Fix some problems in linux_sendmsg() and linux_recvmsg().
- Allocate storage for uap->msg always because it is copyin()'ed in
native sendmsg().
- Convert sockopt level from Linux to FreeBSD after native recvmsg() calling.
- Some cleanups.

Tested with: Oracle 9i shared server connection mode.

MFC after: 1 week


# 116173 10-Jun-2003 obrien

Use __FBSDID().


# 114216 29-Apr-2003 kan

Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on: standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>


# 111798 03-Mar-2003 des

Clean up whitespace and remove register keyword.


# 111797 03-Mar-2003 des

More caddr_t removal, in conjunction with copy{in,out}(9) this time.
Also clean up some egregious casts and incorrect use of sizeof.


# 111173 20-Feb-2003 ume

Add M_WAITOK


# 110538 08-Feb-2003 dwmalone

1) Linux_sendto was trashing the BSD sockaddr it put in the stackgap,
so be more careful about calling stackgap_init.

Tested by: Fred Souza <fred@storming.org>

2) Linux_sendmsg was forgetting to fill out the bsd_args struct.

Reviewed by: ume

3) The args to linux_connect have differently named types on alpha and
i386, so add a cast to stop gcc complaining.

Spotted by: peter


# 110376 05-Feb-2003 ume

Avoid undefined symbol error with an IPv4 only kernel.

Reported by: "Sergey A. Osokin" <osa@freebsd.org.ru>


# 110295 03-Feb-2003 ume

Add IPv6 support for Linuxlator.

Reviewed by: dwmalone
MFC after: 10 days


# 103886 24-Sep-2002 mini

Back out last commit. Linux uses the old 4.3BSD sockaddr format.


# 103839 23-Sep-2002 mini

Don't use compatability syscall wrappers in emulation code.
This is needed for the COMPAT_FREEBSD3 option split.

Reviewed by: alfred, jake


# 97748 02-Jun-2002 schweikh

Fix typo in the BSD copyright: s/withough/without/

Spotted and suggested by: des
MFC after: 3 weeks


# 86504 17-Nov-2001 dillon

Fix missing holdsock()->fgetsock()

Submitted by: Hisashi Hiramoto <hiramoto@phys.chs.nihon-u.ac.jp>


# 85569 26-Oct-2001 fenner

Force the length of the sockaddr to be correct for AF_INET and AF_INET6
in bind() and connect(). Linux doesn't care if the length of the
sockaddr matches its address family; FreeBSD does. This fixes the
known issues with the resolver in linux_base-7.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 83221 08-Sep-2001 marcel

Round of cleanups and enhancements. These include (in random order):

o Introduce private types for use in linux syscalls for two reasons:
1. establish type independence for ease in porting and,
2. provide a visual queue as to which syscalls have proper
prototypes to further cleanup the i386/alpha split.
Linuxulator types are prefixed by 'l_'. void and char have not
been "virtualized".

o Provide dummy functions for all syscalls and remove dummy functions
or implementations of truely obsolete syscalls.

o Sanitize the shm*, sem* and msg* syscalls.

o Make a first attempt to implement the linux_sysctl syscall. At this
time it only returns one MIB (KERN_VERSION), but most importantly,
it tells us when we need to add additional sysctls :-)

o Bump the kenel version up to 2.4.2 (this is not the same as the
KERN_VERSION MIB, BTW).

o Implement new syscalls, of which most are specific to i386. Our
syscall table is now up to date with Linux 2.4.2. Some highlights:
- Implement the 32-bit uid_t and gid_t bases syscalls.
- Implement a couple of 64-bit file size/offset bases syscalls.

o Fix or improve numerous syscalls and prototypes.

o Reduce style(9) violations while I'm here. Especially indentation
inconsistencies within the same file are addressed. Re-indenting
did not obfuscate actual changes to the extend that it could not
be combined.

NOTE: I spend some time testing these changes and found that if there
were regressions, they were not caused by these changes AFAICT.
It was observed that installing a RH 7.1 runtime environment
did make matters worse. Hangs and/or reboots have been observed
with and without these changes, so when it failed to make life
better in cases it doesn't look like it made it worse.


# 73353 02-Mar-2001 jlemon

Only pick up so_error the first time through with EISCONN, as advertised.
The sense of the test was reversed, so we were returning EISCONN, then 0.

Pointed out and tested by: Martin Blapp <mb@imp.ch>


# 73288 01-Mar-2001 jlemon

Correctly emulate linux_connect. For nonblocking sockets, the behavior
is to return EINPROGRESS, EALREADY, (so_error ONCE), EISCONN. Certain
linux applications rely on the so_error (normally 0) being returned in
order to operate properly.

Tested by: Thomas Moestl <tmoestl@gmx.net>


# 70178 18-Dec-2000 assar

translate the flags in recvfrom and recvmsg from linux to bsd ones

Approved by: marcel


# 69539 02-Dec-2000 marcel

Don't auto-generate the syscalls.


# 68803 15-Nov-2000 gallatin

Use the linux_connect() on alpha rather than passing directly through
to our native connect(). This is required to deal with the differences
in the way linux handles connects on non-blocking sockets.

This gets the private beta of the Compaq Linux/alpha JDK working
on FreeBSD/alpha

Approved by: marcel


# 68583 10-Nov-2000 marcel

Revert auto-generation. The Alpha port is broken.
Syncing with it is wrong.


# 68519 09-Nov-2000 marcel

Sync with Alpha:
Do not use sysent.c, proto.h and syscall.h in source tree;
use auto-generated versions.


# 68201 01-Nov-2000 obrien

The MI/MD split wasn't perfect and the MI files need hacks for the
AlphaLinux compat bits. This will be better cleaned up soon.

Agreed to what ever was necessary by: marcel


# 65108 26-Aug-2000 marcel

Whitespace change: (near) KNF


# 64913 21-Aug-2000 marcel

Update include directives.


# 57564 28-Feb-2000 marcel

Fix accept(2) behavior in that accepted sockets don't inherit the
parents flags.

Note on the PR:
The PR contains another patch that's not being committed without
further background information. The PR stays open for now.

PR: 16946 (Victor A. Salaman <salaman@teknos.com>)
Prompted by: msmith
Indirect/implicit approval: jkh (shoot me if I'm wrong :-)


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 42509 11-Jan-1999 msmith

Fix linux sendmsg() emulation

Submitted by: Brian Feldman <green@unixhelp.org>


# 42186 30-Dec-1998 sos

Commit patch in

PR: 9232
Submitted by: marcel@scc.nl <Marcel Moolenaar>


# 34924 28-Mar-1998 bde

Moved some #includes from <sys/param.h> nearer to where they are actually
used.


# 33148 07-Feb-1998 msmith

In the words of the submitter:

----
I've worked to enhance the connect() patches.

I've just tested this with the Linux JDK appletviewer on an applet
that does a lot of connects, and it works as well as during my
previous tests.

The connect() patch is now a merge between my older patch and the
OpenBSD stuff. It ensures that any async error is returned by
connect() instead of getsockopt(SOL_SOCKET, SO_ERROR) as reasonnable
systems do.

There are also minor patches to implement IPPROTO_TCP for
get/setsocktopt(). These are also tested (with Linux Apache).
----

I would appreciate any feedback regarding these changes, as they'd
be very useful in 2.2.6.

Submitted by: pb@fasterix.freenix.org (Pierre Beyssac)


# 31778 16-Dec-1997 eivind

Make COMPAT_43 and COMPAT_SUNOS new-style options.


# 31711 14-Dec-1997 msmith

As described by the submitter:

- emulate Linux IP_HDRINCL behaviour in sendto(): byte order fixed
Note that we do an extra getsockopt() on every sendto()
to check if the option is set because we don't keep state
in the emulator code. Is there a better way to implement
this?
- correct a bug (value of "name" not passed) with
getsockopt()

Submitted by: pb@fasterix.freenix.org (Pierre Beyssac)


# 30994 06-Nov-1997 phk

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.


# 27557 20-Jul-1997 bde

Removed unused #includes.


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 20101 03-Dec-1996 fenner

Add IP_OPTIONS and the multicast-related setsockopts to the
list of IP setsockopts the Linux emulator recognizes.

Explicitly disallow IP_HDRINCL since Linux's handling of
raw output is different than BSD's.

Closes PR#kern/2111.

Submitted by: y-nakaga@ccs.mt.nec.co.jp (Yoshihisa NAKAGAWA)


# 14331 02-Mar-1996 peter

Mega-commit for Linux emulator update.. This has been stress tested under
netscape-2.0 for Linux running all the Java stuff. The scrollbars are now
working, at least on my machine. (whew! :-)

I'm uncomfortable with the size of this commit, but it's too
inter-dependant to easily seperate out.

The main changes:

COMPAT_LINUX is *GONE*. Most of the code has been moved out of the i386
machine dependent section into the linux emulator itself. The int 0x80
syscall code was almost identical to the lcall 7,0 code and a minor tweak
allows them to both be used with the same C code. All kernels can now
just modload the lkm and it'll DTRT without having to rebuild the kernel
first. Like IBCS2, you can statically compile it in with "options LINUX".

A pile of new syscalls implemented, including getdents(), llseek(),
readv(), writev(), msync(), personality(). The Linux-ELF libraries want
to use some of these.

linux_select() now obeys Linux semantics, ie: returns the time remaining
of the timeout value rather than leaving it the original value.

Quite a few bugs removed, including incorrect arguments being used in
syscalls.. eg: mixups between passing the sigset as an int, vs passing
it as a pointer and doing a copyin(), missing return values, unhandled
cases, SIOC* ioctls, etc.

The build for the code has changed. i386/conf/files now knows how
to build linux_genassym and generate linux_assym.h on the fly.

Supporting changes elsewhere in the kernel:

The user-mode signal trampoline has moved from the U area to immediately
below the top of the stack (below PS_STRINGS). This allows the different
binary emulations to have their own signal trampoline code (which gets rid
of the hardwired syscall 103 (sigreturn on BSD, syslog on Linux)) and so
that the emulator can provide the exact "struct sigcontext *" argument to
the program's signal handlers.

The sigstack's "ss_flags" now uses SS_DISABLE and SS_ONSTACK flags, which
have the same values as the re-used SA_DISABLE and SA_ONSTACK which are
intended for sigaction only. This enables the support of a SA_RESETHAND
flag to sigaction to implement the gross SYSV and Linux SA_ONESHOT signal
semantics where the signal handler is reset when it's triggered.

makesyscalls.sh no longer appends the struct sysentvec on the end of the
generated init_sysent.c code. It's a lot saner to have it in a seperate
file rather than trying to update the structure inside the awk script. :-)

At exec time, the dozen bytes or so of signal trampoline code are copied
to the top of the user's stack, rather than obtaining the trampoline code
the old way by getting a clone of the parent's user area. This allows
Linux and native binaries to freely exec each other without getting
trampolines mixed up.


# 12858 15-Dec-1995 peter

Clean up some warnings by using the generated structures in <sys/sysproto.h>
for passing to the bsd system calls, rather than inveninting our own
equivalent structures.


# 12458 22-Nov-1995 bde

Completed function declarations and added prototypes.

Removed some unnecessary #includes.

Fixed warnings about nested externs.


# 9313 25-Jun-1995 sos

First incarnation of our Linux emulator or rather compatibility code.
This first shot only incorporaties so much functionality that DOOM
can run (the X version), signal handling is VERY weak, so is many
other things. But it meets my milestone number one (you guessed it
- running DOOM).

Uses /compat/linux as prefix for loading shared libs, so it won't
conflict with our own libs.

Kernel must be compiled with "options COMPAT_LINUX" for this to work.