History log of /netbsd-current/sys/kern/uipc_syscalls.c
Revision Date Author Comments
# 1.211 03-Feb-2024 jdolecek

fix PIPE_SOCKETPAIR variant of pipe1() to correctly apply the 'flags'
passed when called via pipe2(2), fixing repeatable process hang during
compilation with 'gcc -pipe'

refactor fsocreate() to return the new socket and file pointers,
expect the caller to call fd_affix() once initialization is fully complete

use the new fsocreate() to replace the duplicate open-coded 'flags' handling
in makesocket() used for socketpair(2), and in the PIPE_SOCKETPAIR pipe1()

this also fixes lib/libc/sys/t_pipe2 pipe2_cloexec test to succeed
on PIPE_SOCKETPAIR kernel

fixes PR kern/55690


Revision tags: thorpej-ifq-base thorpej-altq-separation-base
# 1.210 02-Nov-2023 martin

Back out the following revisions on behalf of core:

sys/sys/lwp.h: revision 1.228
sys/sys/pipe.h: revision 1.40
sys/kern/uipc_socket.c: revision 1.306
sys/kern/kern_sleepq.c: revision 1.84
sys/rump/librump/rumpkern/locks_up.c: revision 1.13
sys/kern/sys_pipe.c: revision 1.165
usr.bin/fstat/fstat.c: revision 1.119
sys/rump/librump/rumpkern/locks.c: revision 1.87
sys/ddb/db_xxx.c: revision 1.78
sys/ddb/db_command.c: revision 1.187
sys/sys/condvar.h: revision 1.18
sys/ddb/db_interface.h: revision 1.42
sys/sys/socketvar.h: revision 1.166
sys/kern/uipc_syscalls.c: revision 1.209
sys/kern/kern_condvar.c: revision 1.60

Add cv_fdrestart() [...]
Use cv_fdrestart() to implement fo_restart.
Simplify/streamline pipes a little bit [...]

These changes have caused regressions and need to be debugged.
The cv_fdrestart() addition needs more discussion.


# 1.209 13-Oct-2023 ad

Use cv_fdrestart() to implement fo_restart.


# 1.208 04-Oct-2023 ad

kauth_cred_hold(): return cred verbatim so that donating a reference to
another data structure can be done more elegantly.


# 1.207 09-Sep-2023 ad

Fix a ~16 year old perf regression: when accepting a connection, add a
reference to the caller's credentials rather than copying them.


Revision tags: netbsd-10-0-RC3 netbsd-10-0-RC2 netbsd-10-0-RC1 netbsd-10-base bouyer-sunxi-drm-base
# 1.206 01-Jul-2022 riastradh

branches: 1.206.4;
sendto(2), recvfrom(2): Scrub internal struct msghdr on stack.

Otherwise this is kernel stack disclosure via ktrace.

Reported-by: syzbot+1d40303b310063778194@syzkaller.appspotmail.com
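A userland sketch of the scrubbing idea described above, not the kernel change itself: zero the whole struct msghdr before assigning the fields that are actually used, so unassigned members and padding never carry stale stack bytes. The helper name msghdr_init_sketch() is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: prepare a msghdr for a sendto(2)-style call.
 * The memset() runs first so that every field we do not assign below
 * (msg_control, msg_controllen, msg_flags, padding) reads as zero.
 */
static void
msghdr_init_sketch(struct msghdr *msg, void *name, socklen_t namelen,
    struct iovec *iov, int iovcnt)
{
	memset(msg, 0, sizeof(*msg));
	msg->msg_name = name;
	msg->msg_namelen = namelen;
	msg->msg_iov = iov;
	msg->msg_iovlen = iovcnt;
}
```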


# 1.205 29-Jun-2022 riastradh

recvmmsg(2): More timespec validation.

Reported-by: syzbot+004ed2f264534bd27312@syzkaller.appspotmail.com
Reported-by: syzbot+6f9014c842c4e78df7bc@syzkaller.appspotmail.com
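The log does not say which checks were added. As an assumption, a typical validation of a user-supplied timeout rejects negative seconds and nanoseconds outside [0, 1000000000). The function name timespec_valid_sketch() is hypothetical and this is not the kernel code.

```c
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical check: a timeout is accepted only if the seconds are
 * non-negative and the nanoseconds lie in [0, 1000000000).
 */
static bool
timespec_valid_sketch(const struct timespec *ts)
{
	return ts->tv_sec >= 0 &&
	    ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L;
}
```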


# 1.204 28-Jun-2022 riastradh

recvmmsg(2): Avoid arithmetic overflow in timeout calculations.

XXX This is not right -- it doesn't actually do anything to time
out...

Reported-by: syzbot+784209d76a94fcc6417b@syzkaller.appspotmail.com


# 1.203 27-Jun-2022 riastradh

sendmsg(2): Avoid buffer overrun in ktrace of invalid cmsghdr.

Reported-by: syzbot+efded148140b23425f5c@syzkaller.appspotmail.com


# 1.202 02-Oct-2021 thorpej

...and correct my terrible spelling.


# 1.201 02-Oct-2021 thorpej

- Strengthen the poll(2) fifo_inout test to ensure that once the reader
has read enough that exactly PIPE_BUF space is available that the FIFO
becomes writable again.
- When creating a FIFO, ensure that the receive low water mark is 1
(a FIFO must be readable when at least 1 byte is available); this
was already the case implicitly, but this makes it explicit.
- Similarly, set the send low water mark to PIPE_BUF to ensure that
the pipe is writable when at least PIPE_BUF bytes of space are available
in the send buffer. Without this change, the strengthened test case
above does not pass (the default send low water mark is larger than
PIPE_BUF; see soreserve()).
- Make the same low water mark changes to the PIPE_SOCKETPAIR case.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.200 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


Revision tags: netbsd-9-2-RELEASE netbsd-9-1-RELEASE bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.199 12-Nov-2018 hannken

branches: 1.199.4;
sys_recvmmsg: don't defer an error that already gets returned.


# 1.198 07-Nov-2018 hannken

Don't defer errors from sendmmsg(). This matches the linux manpage.

Defer errors from recvmmsg() through so_rerror and
tests and return a deferred error on entry.

Ok: christos@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.197 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
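To illustrate the truncation this rename guards against, here is a small userland sketch. It assumes unsigned int is 32 bits wide; uimin_sketch() and min_u64_sketch() are hypothetical stand-ins, not the libkern definitions.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a min() defined on unsigned int.
 * A 64-bit argument is converted to unsigned int on the way in,
 * so its upper bits are silently dropped before the comparison.
 */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Width-preserving variant, shown only for comparison. */
static uint64_t
min_u64_sketch(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}
```

With a 64-bit value such as 0x100000001, uimin_sketch() compares the truncated value 1 rather than the full value, which is exactly the kind of result the old generic name hid.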


# 1.196 01-Aug-2018 rjs

Add ioctl(2) handler for kernel part of sctp_peeloff().


# 1.195 31-Jul-2018 rjs

Add getsockopt2() syscall.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.194 04-May-2018 christos

branches: 1.194.2;
define MBUFTYPES here.


# 1.193 03-May-2018 christos

Fix COMPAT_NETBSD32 cmsg handling:

1. alignment was wrong for > 1 message
2. macros were doing incorrect pointer comparisons, fortunately ending
the iteration early after the first cmsg instead of crashing.
3. don't output 32 bit ktrace records for cmsg. 32 bit programs running
under emulation on 64 bit systems should produce 64 bit ktrace records
so that the native ktrace can handle the records; remove extra arguments
that are now not needed (the 32 bit msghdr).
4. output the correct type for cmsg trace records.
5. output all the cmsg records in traces instead of just the first one.

Welcome to 8.99.15 because of the argument removal.

XXX: Really all the code should be changed to use the CMSG_{FIRST,NXT}HDR
macros...
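For reference, a userland sketch of walking control messages with the standard CMSG_FIRSTHDR/CMSG_NXTHDR macros mentioned in the XXX note. It builds two int-sized messages in a local buffer and counts them; the helper names are hypothetical, the payload value is arbitrary, and nothing is sent on a socket.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Count control messages by iterating with the standard macros. */
static int
count_cmsgs_sketch(struct msghdr *msg)
{
	int n = 0;

	for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c != NULL;
	    c = CMSG_NXTHDR(msg, c))
		n++;
	return n;
}

/* Build two control messages in a zeroed buffer, then count them. */
static int
demo_two_cmsgs_sketch(void)
{
	union {
		struct cmsghdr align;	/* forces cmsghdr alignment */
		char buf[128];
	} u;
	struct msghdr msg;
	struct cmsghdr *c;
	int value = 42;			/* arbitrary payload */

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_control = u.buf;
	msg.msg_controllen = 2 * CMSG_SPACE(sizeof(int));

	c = CMSG_FIRSTHDR(&msg);
	for (int i = 0; i < 2 && c != NULL; i++) {
		c->cmsg_level = SOL_SOCKET;
		c->cmsg_type = SCM_RIGHTS;
		c->cmsg_len = CMSG_LEN(sizeof(int));
		memcpy(CMSG_DATA(c), &value, sizeof(value));
		c = CMSG_NXTHDR(&msg, c);
	}
	return count_cmsgs_sketch(&msg);
}
```

CMSG_SPACE() accounts for the alignment padding between messages, which is the detail item 1 above describes as having been wrong for more than one message.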


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.192 16-Mar-2018 christos

PR/53103: Timo Buhrmester: linux emulation of sendto(2) broken

The sockargs refactoring broke it, because sockargs only works with a user
address. Added an argument to sockargs to indicate where the address is
coming from. Welcome to 8.99.14.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.191 12-Feb-2018 maxv

branches: 1.191.2;
Add a KASSERT; we expect *from to be a single mbuf (not chained).


# 1.190 04-Jan-2018 christos

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).

(Tom Ivar Helbekkmo)


# 1.189 31-Dec-2017 christos

pass valsize for getsockopt like we do for setsockopt


# 1.188 26-Dec-2017 kamil

Refactor pipe1() and correct a bug in sys_pipe2() (SYS_pipe2)

sys_pipe2() returns two integers (values), the 2nd one is a copy of the 2nd
file descriptor that lands in fildes[2]. This is a side effect of reusing
the code for sys_pipe() (SYS_pipe) and not cleaning it up.

The first returned value is (on success) 0.

Introduced a small refactoring in pipe1() so that it does not operate over
retval[], but on an array int[2]. A user sets retval[] for pipe() when
desired and needed.

This refactoring touches compat code: netbsd32, linux, linux32.

Before the changes on NetBSD/amd64:

$ ktruss -i ./a.out
[...]
15131 1 a.out pipe2(0x7f7fff2e62b8, 0) = 0, 4
[...]

After the changes:

$ ktruss -i ./a.out
[...]
782 1 a.out pipe2(0x7f7fff97e850, 0) = 0
[...]

There should not be a visible change for current users.

Sponsored by <The NetBSD Foundation>


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.187 20-Jun-2017 christos

Change len type to be unsigned int for consistency with the input type.
Don't check for negative; it does not matter since we clamp anyway. This
broke the compat32 getsockname() where an uninitialized socklen_t ended
up randomly negative causing it to fail.


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204
# 1.186 03-Feb-2017 christos

branches: 1.186.6;
expose sendmsg_so and recvmsg_so.


# 1.185 02-Feb-2017 christos

expose copyout_sockname_sb


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204
# 1.184 03-Dec-2016 christos

branches: 1.184.2;
Add missing ktrkuser


Revision tags: pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914
# 1.183 13-Sep-2016 martin

Make the ktrace record written by do_sys_sendmsg/do_sys_recvmsg overridable
by the caller. Use this in compat_netbsd32 to log the 32bit version, so
the 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.182 07-Jul-2016 msaitoh

branches: 1.182.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.181 01-Nov-2015 christos

Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.


Revision tags: nick-nhusb-base-20150921
# 1.180 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.179 22-Jul-2015 maxv

Memory leak. Triggerable from an unprivileged user via COMPAT_43.


Revision tags: nick-nhusb-base-20150606
# 1.178 09-May-2015 rtr

change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16


# 1.177 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.176 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.175 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.174 06-Mar-2015 rtr

Return EINVAL if namelen isn't large enough to encompass the expected
members of sockaddr structures. i.e. sa_len and sa_family.

Discussed with and patch by christos@
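A minimal sketch of the kind of length check described above, under the assumption that "the expected members" are everything that precedes sa_data in struct sockaddr. The function name check_namelen_sketch() is hypothetical and this is not the kernel code.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical check: reject a name length that cannot even cover
 * the header members located before sa_data in struct sockaddr.
 */
static int
check_namelen_sketch(socklen_t namelen)
{
	if (namelen < offsetof(struct sockaddr, sa_data))
		return EINVAL;
	return 0;
}
```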


Revision tags: nick-nhusb-base
# 1.173 05-Sep-2014 matt

branches: 1.173.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


Revision tags: netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.172 09-Aug-2014 rtr

branches: 1.172.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.171 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.170 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


# 1.169 17-May-2014 rmind

- fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle few comments.
- Remove M_SOOPTS while here.


# 1.168 17-May-2014 rmind

makesocket: set SS_NBIO slightly earlier.


# 1.167 17-May-2014 rmind

Remove trailing whitespaces, wrap long lines, minor KNF; no functional changes.


Revision tags: yamt-pagecache-base9
# 1.166 07-Apr-2014 seanb

Fix a case where an erroneous EAGAIN was returned out of recvmmsg.
This occurred when some, but not all of the mmsg array members
were filled with data from a non-blocking socket.
PR kern/48725


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 09-Oct-2013 christos

branches: 1.165.2;
delete extra m_len initialization.


# 1.164 09-Oct-2013 christos

PR/48292: Justin Cormack: paccept creates sockets that cannot be made blocking
Reset the socket flags not just the file flags for non-blocking I/O.
XXX: pullup 6


# 1.163 08-Oct-2013 christos

PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.


# 1.162 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.161 03-Jun-2013 christos

branches: 1.161.2;
use the proper name for kdump pretty-printing.


Revision tags: agc-symver-base
# 1.160 14-Feb-2013 christos

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.
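One way to observe from userland whether SOCK_NONBLOCK took effect is to read from an empty socket and look at the error. The sketch below uses socketpair(2) for convenience rather than the socket(2) path named in the PR; the function name is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a non-blocking socket pair and read from the empty end.
 * Returns the errno from read(2), 0 if the read unexpectedly
 * succeeded, or -1 if the pair could not be created.
 */
static int
nonblock_read_errno_sketch(void)
{
	int sv[2], err;
	char byte;
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, sv) == -1)
		return -1;
	n = read(sv[0], &byte, 1);
	err = (n == -1) ? errno : 0;
	close(sv[0]);
	close(sv[1]);
	return err;
}
```

If the flag is honoured, the read should fail immediately with EAGAIN (or EWOULDBLOCK) instead of blocking.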


# 1.159 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos


Revision tags: yamt-pagecache-base8
# 1.158 29-Dec-2012 mlelstv

The sanity check prevented messages that carry only ancillary data.


# 1.157 29-Dec-2012 mlelstv

If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analogous
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.156 17-Jul-2012 njoly

branches: 1.156.2;
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.


# 1.155 22-Jun-2012 christos

Add {send,recv}mmsg from Linux


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.154 25-Jan-2012 christos

branches: 1.154.2;
revert atomics for so_options since it is a short.


# 1.153 25-Jan-2012 christos

need <sys/atomic.h>


# 1.152 25-Jan-2012 christos

Add locking, requested by yamt. Note that locking is not used everywhere
for these.


# 1.151 25-Jan-2012 christos

As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]


# 1.150 21-Dec-2011 christos

simplify expression


# 1.149 20-Dec-2011 christos

- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SS_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.148 04-Nov-2011 christos

branches: 1.148.4;
Fix error I introduced in previous commit that caused asymmetric connects
when SOCK_NONBLOCK or SOCK_CLOEXEC was specified. Factor out common code
and simplify error return.


Revision tags: yamt-pagecache-base
# 1.147 21-Sep-2011 christos

branches: 1.147.2;
Put the mbuf type in the ktrace record so that we know how to decode it
in userland.


# 1.146 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.145 15-Jul-2011 christos

fail with EINVAL if flags other than O_CLOEXEC|O_NONBLOCK are passed to pipe2(2) and
dup3(2)


# 1.144 26-Jun-2011 christos

* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
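As a userland illustration of the socketpair(2) item in the list above, the following sketch requests both flags at creation time and then checks them with fcntl(2). The function name is hypothetical and the example only uses calls from the standard socket and fcntl interfaces.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a socket pair with SOCK_CLOEXEC and SOCK_NONBLOCK set at
 * creation time. Returns 1 if both descriptors report FD_CLOEXEC
 * and O_NONBLOCK, 0 if any flag is missing, -1 on creation failure.
 */
static int
socketpair_flags_sketch(void)
{
	int sv[2];
	int ok = 1;

	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK,
	    0, sv) == -1)
		return -1;
	for (int i = 0; i < 2; i++) {
		int fdflags = fcntl(sv[i], F_GETFD);
		int flflags = fcntl(sv[i], F_GETFL);

		if (fdflags == -1 || flflags == -1 ||
		    !(fdflags & FD_CLOEXEC) || !(flflags & O_NONBLOCK))
			ok = 0;
	}
	close(sv[0]);
	close(sv[1]);
	return ok;
}
```

Setting the flags in the creating call avoids the window between descriptor creation and a later fcntl(2) during which another thread could fork and exec.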


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.143 24-Apr-2011 rmind

- Replace few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.


# 1.142 10-Apr-2011 christos

- Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1
# 1.141 23-Apr-2010 rmind

branches: 1.141.2;
Replace M_IOV and some malloc(9)s with kmem(9), and while there:
- Fix invalid free (M_TEMP vs M_IOV) in do_sys_recvmsg(), spotted by jakllsch@.
Also, same fix in osf1_sys_sendmsg_xopen().
- Fix attempt to free non-allocated memory in error path in netbsd32___getfh30().
- Plug a memory leak in compat_43_netbsd32_orecvmsg().


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.140 21-Jan-2010 pgoyette

branches: 1.140.2; 1.140.4;
Remove unnecessary call to kauth_cred_free().

This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@


# 1.139 29-Dec-2009 elad

Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!


# 1.138 20-Dec-2009 dsl

If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567


Revision tags: matt-premerge-20091211
# 1.137 09-Dec-2009 dsl

Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.136 04-Apr-2009 ad

Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)


Revision tags: nick-hppapmap-base2
# 1.135 21-Jan-2009 yamt

branches: 1.135.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.


Revision tags: netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 haad-dm-base mjf-devfs2-base
# 1.134 06-Aug-2008 plunky

branches: 1.134.2; 1.134.4;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.133 24-Jun-2008 ad

branches: 1.133.2;
Nothing uses getsock/getvnode any more.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.132 30-May-2008 rmind

branches: 1.132.2;
do_sys_accept: release the reference to sock in few error paths.
Should fix PR/38790, report and test-case by Nicolas Joly.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.131 28-Apr-2008 martin

branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.130 24-Apr-2008 ad

branches: 1.130.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


# 1.129 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.128 21-Mar-2008 ad

branches: 1.128.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.127 06-Feb-2008 ad

branches: 1.127.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.


Revision tags: vmlocking2-base3 bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.126 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.125 20-Dec-2007 dsl

Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.


# 1.124 16-Dec-2007 elad

Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.123 24-Nov-2007 dyoung

branches: 1.123.2; 1.123.6;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to getsockmbuf().


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.122 05-Oct-2007 dyoung

branches: 1.122.4;
Use getsombuf().


Revision tags: yamt-x86pmap-base
# 1.121 19-Sep-2007 christos

branches: 1.121.2;
minor nits; no code change.


# 1.120 19-Sep-2007 dyoung

1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return a pointer to the address part of the "don't care"
sockaddr in the same family as `sa'. Write the length of
the address part at `slenp', if `slenp' is not NULL.


Revision tags: nick-csl-alignment-base5
# 1.119 06-Sep-2007 rmind

do_sys_sendmsg: Plug a possible leak.
From CID: 4535


# 1.118 01-Sep-2007 dsl

Don't error calls to copy socket addresses to userspace when the application
has provided a non-null buffer pointer and a zero length.


# 1.117 27-Aug-2007 dsl

ktrace socket control structures (ie msghdr, address etc) using ktrkuser().


# 1.116 15-Aug-2007 ad

branches: 1.116.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base nick-csl-alignment-base
# 1.115 15-Jul-2007 dsl

branches: 1.115.2; 1.115.6;
Remove non-user flags (especially MSG_IOVUSRSPACE) from mp->msg_flags
before passing to so_receive.
This may (or may not) have any effect...


Revision tags: mjf-ufs-trans-base
# 1.114 01-Jul-2007 dsl

Check for SOL_SOCKET when checking for SCM_RIGHTS.


# 1.113 24-Jun-2007 dsl

Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socket names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, old LKMs will fail to load.


# 1.112 02-Jun-2007 enami

- Fix obvious typos so that sendto(2) works.
- Wrap lines again.


# 1.111 01-Jun-2007 dsl

Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.


Revision tags: yamt-idlelwp-base8
# 1.110 13-May-2007 dsl

Fallout from caddr_t deletion - remove a load of redundant (void *) casts.


# 1.109 18-Apr-2007 yamt

sys_accept: fix usecount botch and double soclose in rev.1.108.


# 1.108 15-Apr-2007 yamt

sys_accept: don't leak a socket on error.


Revision tags: thorpej-atomic-base
# 1.107 04-Mar-2007 christos

branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge
# 1.106 09-Feb-2007 ad

branches: 1.106.2;
Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.105 01-Nov-2006 yamt

branches: 1.105.2;
remove some __unused from function parameters.


# 1.104 23-Oct-2006 elad

PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic

Patch applied, thanks!


Revision tags: yamt-splraiseipl-base2
# 1.103 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.102 22-Aug-2006 seanb

branches: 1.102.2; 1.102.4;
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.101 23-Jul-2006 ad

branches: 1.101.2;
Use the LWP cached credentials where sane.


# 1.100 26-Jun-2006 mrg

version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.

from pavel@ with a little bit of clean up from myself.

XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.99 16-May-2006 christos

branches: 1.99.4;
Don't set mature an fd that has been ffree'd


Revision tags: elad-kernelauth-base
# 1.98 11-May-2006 christos

Add MSG_NOSIGNAL (from FreeBSD)


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.97 01-Mar-2006 yamt

branches: 1.97.2; 1.97.4; 1.97.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.96 26-Dec-2005 perry

branches: 1.96.2; 1.96.4; 1.96.6;
u_intN_t -> uintN_t


# 1.95 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.94 03-Sep-2005 martin

In adjust_rights(), use CMSG_SPACE() to calculate the number of
file descriptors passed in this message - the counterpart in
unp_externalize does this as well.

Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.

This fixes the postfix connection cache for 64bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing the fdrelease() of uninitialized junk.
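
The counting problem described in 1.94 can be illustrated from userland. The
following is only a sketch, not the kernel's adjust_rights(): the helper name
scm_rights_nfds() is hypothetical, and it relies on the assumption that
CMSG_LEN(0) yields the control-message header size including alignment
padding, so that the remaining bytes are descriptor payload.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: return the number of descriptors carried by an
 * SCM_RIGHTS control message, or 0 if the message is of another kind.
 */
static size_t
scm_rights_nfds(const struct cmsghdr *cmsg)
{
	assert(cmsg != NULL);
	/* SCM_RIGHTS is only meaningful at the SOL_SOCKET level. */
	if (cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
		return 0;
	/* Reject a length shorter than the (padded) header itself. */
	if (cmsg->cmsg_len < CMSG_LEN(0))
		return 0;
	/* Subtract the padded header size, not sizeof(struct cmsghdr). */
	return (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);
}
```

Subtracting a header size that omits the alignment padding would count the
padding bytes as descriptors on platforms where the two sizes differ, which
is the kind of overcount the commit message describes.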


# 1.93 03-Sep-2005 martin

minor knf tweak


# 1.92 30-May-2005 martin

branches: 1.92.2;
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.


# 1.91 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.90 26-Feb-2005 perry

branches: 1.90.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.89 30-Nov-2004 christos

branches: 1.89.4; 1.89.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat


# 1.88 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req->r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.87 18-May-2004 ragge

Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.

Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.

As a side effect, the new connect() behaviour conforms to POSIX.


Revision tags: netbsd-2-0-base
# 1.86 29-Nov-2003 matt

branches: 1.86.2;
Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.85 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.84 13-Nov-2003 chs

eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.


# 1.83 04-Sep-2003 matt

Adapt to the new calling conventions of unp_connect2


# 1.82 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.81 29-Jun-2003 fvdl

branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.80 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.79 05-Apr-2003 christos

PR/21030: Naoto Shimazaki: fcntl to accepted socket does not work properly


# 1.78 26-Feb-2003 matt

Remove leftover MBUFTRACE asserts.


# 1.77 26-Feb-2003 drochner

deactivate MBUFTRACE related KASSERT()s in the !MBUFTRACE case


# 1.76 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.75 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.74 26-Nov-2002 christos

si_ -> sel_ to avoid conflicts with siginfo.


# 1.73 25-Nov-2002 itojun

no need for error check after MEXTMALLOC - jdolecek


# 1.72 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge
# 1.71 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.70 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.69 31-May-2002 itojun

support setsockopt() with larger data (up to MCLBYTES).
From: Hitoshi Asaeda <Hitoshi.Asaeda@sophia.inria.fr>


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base ifpoll-base
# 1.68 11-Feb-2002 jdolecek

branches: 1.68.8;
Switch default for pipes to the faster John S. Dyson's implementation.
Old, socketpair-based ones are available with option PIPE_SOCKETPAIR.


Revision tags: thorpej-mips-cache-base
# 1.67 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.66 16-Sep-2001 wiz

branches: 1.66.2;
Spell 'occurred' with two 'r's.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.65 17-Jul-2001 jdolecek

branches: 1.65.2;
Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
has to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(

This code was tested to pass regression tests.


# 1.64 01-Jul-2001 matt

branches: 1.64.2;
Use consistent types for len. Limit sockarg length to reasonable values.


# 1.63 25-Jun-2001 jdolecek

Back off the sendit()/recvit() change, some have problems with it


# 1.62 25-Jun-2001 jdolecek

sys_connect(): fix the call to FILE_UNUSE() so that it's done on return, rather
than immediately after getsock() call


# 1.61 25-Jun-2001 jdolecek

Add 'kernsa' parameter for sendit()/recvit(); if nonzero, msg->msg_name
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.

This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.


# 1.60 16-Jun-2001 jdolecek

Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pageable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.


# 1.59 14-Jun-2001 thorpej

Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.


# 1.58 06-May-2001 manu

implement the recently introduced EMUL_BSD_ASYNCIO_PIPE emulation package
flag.

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.57 27-Feb-2001 lukem

branches: 1.57.2;
convert to ANSI KNF


# 1.56 10-Dec-2000 fvdl

Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).


# 1.55 24-Nov-2000 jdolecek

define COMPAT_OLDSOCK unconditionally - the code is used virtually for all
emulations besides NetBSD, and this way it's LKM-safe


# 1.54 02-Aug-2000 thorpej

MALLOC()/FREE() are not to be used for variable sized allocations.


# 1.53 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.52 27-May-2000 sommerfeld

branches: 1.52.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.51 30-Mar-2000 augustss

Get rid of register declarations.


# 1.50 23-Mar-2000 thorpej

Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() makes sure
the fd_freefiles hints stay in sync.

From OpenBSD.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.49 05-Nov-1999 mycroft

branches: 1.49.2;
Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.


# 1.48 30-Oct-1999 enami

back out unnecessary stylistic changes in recent changes, to keep coding
style closer to KNF.


# 1.47 27-Oct-1999 jdolecek

minor cleanup of previous - avoid goto and code duplication


# 1.46 27-Oct-1999 darrenr

patch from Greg A. Woods to fix panic problems with code that attempts to
recover from failures to accept a socket successfully. Problem suggested
by this:
> It would appear (from two "panic: closef: count < 0" failures in less
> than 12 hours) that Darren's fix to accept(2) for lost file descriptors
> isn't quite correct. His fix inserts a call to closef() to handle one
> of several possible error conditions. However everywhere else in the
> socket code in the same file where falloc() cleanup is necessary the
> function used is ffree().


Revision tags: chs-ubc2-base
# 1.45 01-Jul-1999 itojun

branches: 1.45.2; 1.45.4; 1.45.6;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.44 01-Jul-1999 darrenr

fix sys_accept() to return EOPNOTSUPP for protocols which don't support
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.


# 1.43 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.42 30-Apr-1999 cgd

add checks for COMPAT_OSF1 in the appropriate places


Revision tags: netbsd-1-4-RELEASE netbsd-1-4-base
# 1.41 10-Feb-1999 kleink

branches: 1.41.2; 1.41.4; 1.41.6;
* Due to addition and use of socklen_t, make the socket option and address
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also eliminates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessitates the addition
of a few casts.


# 1.40 18-Dec-1998 drochner

solve the COMPAT_OLDSOCK/MSG_COMPAT problem differently:
The source files which need MSG_COMPAT define COMPAT_OLDSOCK.


Revision tags: kenh-if-detach-base
# 1.39 26-Nov-1998 mycroft

Revert the functional change in rev 1.38; permit a msg_iovlen of 0.
There are two reasons for this:
* We should be able to pass file descriptors without sending any data.
* We could send zero-length iovecs anyway (but we shouldn't have to do this).
Also, msg_iovlen is already a u_int, so delete a bunch of casts.


Revision tags: chs-ubc-base
# 1.38 04-Aug-1998 kleink

Per XNS Issue 5, calling recvmsg(2) or sendmsg(2) with an msg.msg_iovlen less
than or equal to 0 shall fail with EMSGSIZE; the latter condition was not being
checked for. Also, document the msg.msg_iovlen > {IOV_MAX} case.


# 1.37 04-Aug-1998 kleink

UIO_MAXIOV -> IOV_MAX


# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
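
The argument-order changes in the mapping above are easy to get wrong, so here
is a small sketch of each replacement. The struct and function names
(struct rec, rec_copy(), rec_clear(), rec_equal(), shift_left()) are
hypothetical and exist only for illustration; the old calls are shown in
comments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record type, used only for this illustration. */
struct rec {
	char name[16];
	int  value;
};

/* Copy *src into *dst; formerly bcopy(src, dst, sizeof(*dst)). */
static void
rec_copy(struct rec *dst, const struct rec *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, sizeof(*dst));	/* destination comes first */
}

/* Zero a record; formerly bzero(r, sizeof(*r)). */
static void
rec_clear(struct rec *r)
{
	assert(r != NULL);
	memset(r, 0, sizeof(*r));	/* fill byte precedes the length */
}

/* Compare two records bytewise; formerly bcmp(a, b, sizeof(*a)). */
static int
rec_equal(const struct rec *a, const struct rec *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

/*
 * Shift buf left by one byte. Source and destination overlap, so
 * memmove() is needed; formerly ovbcopy(buf + 1, buf, len - 1).
 */
static void
shift_left(char *buf, size_t len)
{
	if (len > 1)
		memmove(buf, buf + 1, len - 1);
}
```

Only bcmp() keeps its argument order; bcopy() and ovbcopy() swap source and
destination, and bzero() gains an explicit fill value.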


# 1.35 03-Aug-1998 kleink

Fix two off-by-one bugs, each present in both recvmsg(2) and sendmsg(2):
* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.
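
The two boundary conditions above can be sketched as follows. This is not the
kernel code: SMALLIOV and MAXIOV are hypothetical stand-ins for UIO_SMALLIOV
and UIO_MAXIOV with assumed values, and iov_storage_for() is an invented
helper that only reports which path a given msg_iovlen would take.

```c
#include <assert.h>

#define SMALLIOV	8	/* assumed stand-in for UIO_SMALLIOV */
#define MAXIOV		1024	/* assumed stand-in for UIO_MAXIOV */

enum iov_storage { IOV_ON_STACK, IOV_ON_HEAP, IOV_TOO_MANY };

/* Decide where the iovec array for a message of n entries would live. */
static enum iov_storage
iov_storage_for(unsigned int n)
{
	/* Exactly MAXIOV entries is still valid: use '>' and not '>='. */
	if (n > MAXIOV)
		return IOV_TOO_MANY;	/* caller would fail with EMSGSIZE */
	/* Exactly SMALLIOV entries still fits the on-stack array. */
	if (n > SMALLIOV)
		return IOV_ON_HEAP;
	return IOV_ON_STACK;
}
```

With '>=' in either comparison, the limit value itself would fall on the
wrong side, which matches the two symptoms listed in the commit message.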


# 1.34 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.33 29-Jul-1998 thorpej

branches: 1.33.2;
When checking for overflow in the residual count, test against SSIZE_MAX.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.

There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).


# 1.32 18-Jul-1998 lukem

use AF_LOCAL instead of AF_UNIX


# 1.31 25-Jun-1998 thorpej

defopt KTRACE


# 1.30 25-Apr-1998 matt

Hook for 0-copy (or other optimized) sends and receives


# 1.29 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.28 06-Feb-1998 thorpej

When copying out multiple control messages, ensure that the next control
message is aligned. From David Borman <dab@bsdi.com>.


# 1.27 07-Jan-1998 thorpej

Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).


# 1.26 07-Jan-1998 thorpej

Fix bug in recvit() that would cause recvmsg() to only receive one
control message, even if there were multiple control messages on
the queue. From Jean-Luc Richier <Jean-Luc.Richier@imag.fr>, in
bug report kern/4700.


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.25 26-Jun-1997 thorpej

branches: 1.25.8;
Use UCHAR_MAX rather than "255" when sanity-checking the length of a
sockaddr in sockargs().


# 1.24 26-Jun-1997 thorpej

In sockargs():
- Add a comment describing my feelings about this interface, in general.
- Remove the COMPAT_OLDSOCK length hack. Instead, if the socket argument
is too long to fit in an mbuf, allocate enough external storage to
hold it.
- If the socket argument is a sockaddr, don't allow the length to be
greater than 255, as that would overflow sa_len.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for his sanity checking.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 22-Dec-1996 cgd

* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.


# 1.22 14-Jun-1996 cgd

avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.


# 1.21 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.20 17-May-1996 pk

branches: 1.20.4;
Don't touch retval[] in socketpair(); manual page says this system call
returns 0 on success (PR#2428).


# 1.19 09-Feb-1996 christos

More proto fixes


# 1.18 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.17 10-Oct-1995 mycroft

Add hooks for COMPAT_FREEBSD, from Noriyuki Soda.


# 1.16 07-Oct-1995 mycroft

Prefix names of system call implementation functions with `sys_'.


# 1.15 19-Sep-1995 thorpej

Make system calls conform to a standard prototype and bring those
prototypes into scope.


# 1.14 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.13 24-Jun-1995 christos

Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).


# 1.12 10-May-1995 christos

tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed


# 1.11 05-Mar-1995 fvdl

Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.


# 1.10 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.9 20-Oct-1994 cgd

update for new syscall args description mechanism


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.8 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.7 04-May-1994 mycroft

Add return types where missing. Simplify some of the compat conditionals.
Include compat code if COMPAT_SUNOS with or without COMPAT_43.


# 1.6 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 17-Jul-1993 mycroft

branches: 1.5.4;
Finish moving struct definitions outside of function declarations.


# 1.4 27-Jun-1993 andrew

* ansifications
* Yuval Yarom's socket recv(2) fixes - access rights problems (see also
uipc_socket.c).


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.210 02-Nov-2023 martin

Back out the following revisions on behalf of core:

sys/sys/lwp.h: revision 1.228
sys/sys/pipe.h: revision 1.40
sys/kern/uipc_socket.c: revision 1.306
sys/kern/kern_sleepq.c: revision 1.84
sys/rump/librump/rumpkern/locks_up.c: revision 1.13
sys/kern/sys_pipe.c: revision 1.165
usr.bin/fstat/fstat.c: revision 1.119
sys/rump/librump/rumpkern/locks.c: revision 1.87
sys/ddb/db_xxx.c: revision 1.78
sys/ddb/db_command.c: revision 1.187
sys/sys/condvar.h: revision 1.18
sys/ddb/db_interface.h: revision 1.42
sys/sys/socketvar.h: revision 1.166
sys/kern/uipc_syscalls.c: revision 1.209
sys/kern/kern_condvar.c: revision 1.60

Add cv_fdrestart() [...]
Use cv_fdrestart() to implement fo_restart.
Simplify/streamline pipes a little bit [...]

These changes have caused regressions and need to be debugged.
The cv_fdrestart() addition needs more discussion.


# 1.209 13-Oct-2023 ad

Use cv_fdrestart() to implement fo_restart.


# 1.208 04-Oct-2023 ad

kauth_cred_hold(): return cred verbatim so that donating a reference to
another data structure can be done more elegantly.


# 1.207 09-Sep-2023 ad

Fix a ~16 year old perf regression: when accepting a connection, add a
reference to the caller's credentials rather than copying them.


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.206 01-Jul-2022 riastradh

sendto(2), recvfrom(2): Scrub internal struct msghdr on stack.

Otherwise this is kernel stack disclosure via ktrace.

Reported-by: syzbot+1d40303b310063778194@syzkaller.appspotmail.com


# 1.205 29-Jun-2022 riastradh

recvmmsg(2): More timespec validation.

Reported-by: syzbot+004ed2f264534bd27312@syzkaller.appspotmail.com
Reported-by: syzbot+6f9014c842c4e78df7bc@syzkaller.appspotmail.com


# 1.204 28-Jun-2022 riastradh

recvmmsg(2): Avoid arithmetic overflow in timeout calculations.

XXX This is not right -- it doesn't actually do anything to time
out...

Reported-by: syzbot+784209d76a94fcc6417b@syzkaller.appspotmail.com


# 1.203 27-Jun-2022 riastradh

sendmsg(2): Avoid buffer overrun in ktrace of invalid cmsghdr.

Reported-by: syzbot+efded148140b23425f5c@syzkaller.appspotmail.com


# 1.202 02-Oct-2021 thorpej

...and correct my terrible spelling.


# 1.201 02-Oct-2021 thorpej

- Strengthen the poll(2) fifo_inout test to ensure that once the reader
has read enough that exactly PIPE_BUF space is available that the FIFO
becomes writable again.
- When creating a FIFO, ensure that the receive low water mark is 1
(a FIFO must be readable when at least 1 byte is available); this
was already the case implicitly, but this makes it explicit.
- Similarly, set the send low water mark to PIPE_BUF to ensure that
the pipe is writable when at least PIPE_BUF bytes of space are available
in the send buffer. Without this change, the strengthened test case
above does not pass (the default send low water mark is larger than
PIPE_BUF; see soreserve()).
- Make the same low water mark changes to the PIPE_SOCKETPAIR case.
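
The effect of the send low water mark described above can be sketched with a simplified model. The struct and function names below are hypothetical and are not the kernel's socket buffer code; the sketch only assumes that a buffer counts as writable once its free space reaches the low water mark.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef PIPE_BUF
#define PIPE_BUF 512	/* POSIX minimum; assumed fallback for this sketch */
#endif

/* Hypothetical, simplified stand-in for a socket send buffer. */
struct sendbuf {
	size_t hiwat;	/* capacity in bytes */
	size_t used;	/* bytes currently queued */
	size_t lowat;	/* send low water mark */
};

/* Writable once the free space reaches the low water mark. */
static bool
sendbuf_writable(const struct sendbuf *sb)
{
	return sb->hiwat - sb->used >= sb->lowat;
}
```

With lowat set to PIPE_BUF the buffer becomes writable as soon as exactly PIPE_BUF bytes are free; with a larger low water mark the same amount of free space is not enough, which is the failure the strengthened test exposes.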


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.200 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


Revision tags: netbsd-9-2-RELEASE netbsd-9-1-RELEASE bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.199 12-Nov-2018 hannken

branches: 1.199.4;
sys_recvmmsg: don't defer an error that already gets returned.


# 1.198 07-Nov-2018 hannken

Don't defer errors from sendmmsg(). This matches the linux manpage.

Defer errors from recvmmsg() through so_rerror and
tests and return a deferred error on entry.

Ok: christos@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.197 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
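
The truncation hazard described above can be reproduced in a few lines. This is a sketch that assumes a platform where unsigned int is 32 bits wide; uimin is written out locally to mirror the unsigned int signature the message describes, and clamp_with_uimin is a hypothetical helper added only for the demonstration.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as described in the commit: operands are unsigned int. */
static unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: no conversion, but arguments may be evaluated twice. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical helper: passes a 64-bit length through uimin(). */
static uint64_t
clamp_with_uimin(uint64_t len, unsigned int limit)
{
	/* len is implicitly converted to unsigned int here. */
	return uimin(len, limit);
}
```

For a length of 0x100000001 and a limit of 4096, the function form sees only the low 32 bits of the length, while the macro form compares the full 64-bit value.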


# 1.196 01-Aug-2018 rjs

Add ioctl(2) handler for kernel part of sctp_peeloff().


# 1.195 31-Jul-2018 rjs

Add getsockopt2() syscall.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.194 04-May-2018 christos

branches: 1.194.2;
define MBUFTYPES here.


# 1.193 03-May-2018 christos

Fix COMPAT_NETBSD32 cmsg handling:

1. alignment was wrong for > 1 message
2. macros were doing incorrect pointer comparisons, fortunately ending
the iteration early after the first cmsg instead of crashing.
3. don't output 32 bit ktrace records for cmsg. 32 bit programs running
under emulation on 64 bit systems should produce 64 bit ktrace records
so that the native ktrace can handle the records; remove extra arguments
that are now not needed (the 32 bit msghdr).
4. output the correct type for cmsg trace records.
5. output all the cmsg records in traces instead of just the first one.

Welcome to 8.99.15 because of the argument removal.

XXX: Really all the code should be changed to use the CMSG_{FIRST,NXT}HDR
macros...


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.192 16-Mar-2018 christos

PR/53103: Timo Buhrmester: linux emulation of sendto(2) broken

The sockargs refactoring broke it, because sockargs only works with a user
address. Added an argument to sockargs to indicate where the address is
coming from. Welcome to 8.99.14.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.191 12-Feb-2018 maxv

branches: 1.191.2;
Add a KASSERT; we expect *from to be a single mbuf (not chained).


# 1.190 04-Jan-2018 christos

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).

(Tom Ivar Helbekkmo)


# 1.189 31-Dec-2017 christos

pass valsize for getsockopt like we do for setsockopt


# 1.188 26-Dec-2017 kamil

Refactor pipe1() and correct a bug in sys_pipe2() (SYS_pipe2)

sys_pipe2() returns two integers (values), the 2nd one is a copy of the 2nd
file descriptor that lands in fildes[2]. This is a side effect of reusing
the code for sys_pipe() (SYS_pipe) and not cleaning it up.

The first returned value is (on success) 0.

Introduced a small refactoring in pipe1() so that it operates on a plain
int[2] array rather than on retval[]. A caller sets retval[] for pipe() when
desired and needed.

This refactoring touches compat code: netbsd32, linux, linux32.

Before the changes on NetBSD/amd64:

$ ktruss -i ./a.out
[...]
15131 1 a.out pipe2(0x7f7fff2e62b8, 0) = 0, 4
[...]

After the changes:

$ ktruss -i ./a.out
[...]
782 1 a.out pipe2(0x7f7fff97e850, 0) = 0
[...]

There should not be a visible change for current users.

Sponsored by <The NetBSD Foundation>


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.187 20-Jun-2017 christos

Change len type to be unsigned int for consistency with the input type.
Don't check for negative; it does not matter, since we clamp anyway. This
broke the compat32 getsockname() where an uninitialized socklen_t ended
up randomly negative causing it to fail.


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204
# 1.186 03-Feb-2017 christos

branches: 1.186.6;
expose sendmsg_so and recvmsg_so.


# 1.185 02-Feb-2017 christos

expose copyout_sockname_sb


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204
# 1.184 03-Dec-2016 christos

branches: 1.184.2;
Add missing ktrkuser


Revision tags: pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914
# 1.183 13-Sep-2016 martin

Make the ktrace record written by do_sys_sendmsg/do_sys_recvmsg overridable
by the caller. Use this in compat_netbsd32 to log the 32bit version, so
the 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.182 07-Jul-2016 msaitoh

branches: 1.182.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.181 01-Nov-2015 christos

Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.


Revision tags: nick-nhusb-base-20150921
# 1.180 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.179 22-Jul-2015 maxv

Memory leak. Triggerable from an unprivileged user via COMPAT_43.


Revision tags: nick-nhusb-base-20150606
# 1.178 09-May-2015 rtr

change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16


# 1.177 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.176 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.175 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.174 06-Mar-2015 rtr

Return EINVAL if namelen isn't large enough to encompass the expected
members of sockaddr structures. i.e. sa_len and sa_family.

Discussed with and patch by christos@
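
The length check described above can be sketched as follows. The struct is an illustrative BSD-style layout defined locally (the real struct sockaddr comes from <sys/socket.h>), and check_namelen is a hypothetical name, not the function changed in this revision.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative BSD-style layout, defined locally for the sketch. */
struct bsd_sockaddr {
	uint8_t	sa_len;		/* total length */
	uint8_t	sa_family;	/* address family */
	char	sa_data[14];	/* address bytes */
};

/* Hypothetical check: namelen must cover sa_len and sa_family. */
static int
check_namelen(size_t namelen)
{
	if (namelen < offsetof(struct bsd_sockaddr, sa_data))
		return EINVAL;
	return 0;
}
```

Using offsetof() of the first member after the header keeps the check tied to the structure layout rather than to a hard-coded byte count.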


Revision tags: nick-nhusb-base
# 1.173 05-Sep-2014 matt

branches: 1.173.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


Revision tags: netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.172 09-Aug-2014 rtr

branches: 1.172.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.171 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.170 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


# 1.169 17-May-2014 rmind

- fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle few comments.
- Remove M_SOOPTS while here.


# 1.168 17-May-2014 rmind

makesocket: set SS_NBIO slightly earlier.


# 1.167 17-May-2014 rmind

Remove trailing whitespaces, wrap long lines, minor KNF; no functional changes.


Revision tags: yamt-pagecache-base9
# 1.166 07-Apr-2014 seanb

Fix a case where an erroneous EAGAIN was returned out of recvmmsg.
This occurred when some, but not all of the mmsg array members
were filled with data from a non-blocking socket.
PR kern/48725


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 09-Oct-2013 christos

branches: 1.165.2;
delete extra m_len initialization.


# 1.164 09-Oct-2013 christos

PR/48292: Justin Cormack: paccept creates sockets that cannot be made blocking
Reset the socket flags not just the file flags for non-blocking I/O.
XXX: pullup 6


# 1.163 08-Oct-2013 christos

PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.


# 1.162 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.161 03-Jun-2013 christos

branches: 1.161.2;
use the proper name for kdump pretty-printing.


Revision tags: agc-symver-base
# 1.160 14-Feb-2013 christos

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.


# 1.159 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
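
The gap between CMSG_LEN and CMSG_SPACE mentioned above is alignment padding after the control message payload. A minimal userland sketch of scrubbing it follows; cmsg_scrub_padding is a hypothetical helper and is not the code in unp_externalize.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: zero the alignment padding that follows the
 * payload of a control message carrying `datalen` bytes of data.
 * `buf` must be at least CMSG_SPACE(datalen) bytes long.
 * Returns the number of padding bytes that were cleared.
 */
static size_t
cmsg_scrub_padding(unsigned char *buf, size_t datalen)
{
	size_t used = CMSG_LEN(datalen);	/* header + payload */
	size_t total = CMSG_SPACE(datalen);	/* header + payload + padding */

	memset(buf + used, 0, total - used);
	return total - used;
}
```

Without the memset, whatever bytes happened to be in the padding would be copied out to the receiving process along with the message.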


Revision tags: yamt-pagecache-base8
# 1.158 29-Dec-2012 mlelstv

The sanity check prevented messages that carry only ancillary data.


# 1.157 29-Dec-2012 mlelstv

If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analogous
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.156 17-Jul-2012 njoly

branches: 1.156.2;
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.


# 1.155 22-Jun-2012 christos

Add {send,recv}mmsg from Linux


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.154 25-Jan-2012 christos

branches: 1.154.2;
revert atomics for so_options since it is a short.


# 1.153 25-Jan-2012 christos

need <sys/atomic.h>


# 1.152 25-Jan-2012 christos

Add locking, requested by yamt. Note that locking is not used everywhere
for these.


# 1.151 25-Jan-2012 christos

As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]


# 1.150 21-Dec-2011 christos

simplify expression


# 1.149 20-Dec-2011 christos

- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.148 04-Nov-2011 christos

branches: 1.148.4;
Fix error I introduced in previous commit that caused asymmetric connects
when SOCK_NONBLOCK or SOCK_CLOEXEC was specified. Factor out common code
and simplify error return.


Revision tags: yamt-pagecache-base
# 1.147 21-Sep-2011 christos

branches: 1.147.2;
Put the mbuf type in the ktrace record so that we know how to decode it
in userland.


# 1.146 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.145 15-Jul-2011 christos

fail with EINVAL if flags other than O_CLOEXEC|O_NONBLOCK are passed to
pipe2(2) and dup3(2)
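
The flag check for this revision can be sketched as a mask test. check_pipe2_flags is a hypothetical name used for illustration and is not the kernel function; it models only the two flags named in this commit message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper: reject any bit outside the permitted set. */
static int
check_pipe2_flags(int flags)
{
	if ((flags & ~(O_CLOEXEC | O_NONBLOCK)) != 0)
		return EINVAL;
	return 0;
}
```

Revision 1.151 later adds O_NOSIGPIPE to the flags accepted by pipe2 and dup3, which this sketch does not model.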


# 1.144 26-Jun-2011 christos

* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
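
From userland, the new socket type flags can be exercised roughly as follows. This is a sketch, assuming a system that defines SOCK_CLOEXEC and SOCK_NONBLOCK in <sys/socket.h>; the two helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a local stream socket with both flags applied at creation time.
 * Returns the descriptor, or -1 with errno set.
 */
static int
open_cloexec_nonblock_socket(void)
{
	return socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, 0);
}

/* Returns 1 if both close-on-exec and non-blocking mode are set on fd. */
static int
has_cloexec_and_nonblock(int fd)
{
	int fdflags = fcntl(fd, F_GETFD);
	int flflags = fcntl(fd, F_GETFL);

	if (fdflags == -1 || flflags == -1)
		return 0;
	return (fdflags & FD_CLOEXEC) != 0 && (flflags & O_NONBLOCK) != 0;
}
```

Setting the flags in the socket(2) call avoids the window between creating the descriptor and a later fcntl(2) call, during which another thread could fork and exec with the descriptor still inheritable.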


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.143 24-Apr-2011 rmind

- Replace few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.


# 1.142 10-Apr-2011 christos

- Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1
# 1.141 23-Apr-2010 rmind

branches: 1.141.2;
Replace M_IOV and some malloc(9)s with kmem(9), and while there:
- Fix invalid free (M_TEMP vs M_IOV) in do_sys_recvmsg(), spotted by jakllsch@.
Also, same fix in osf1_sys_sendmsg_xopen().
- Fix attempt to free non-allocated memory in error path in netbsd32___getfh30().
- Plug a memory leak in compat_43_netbsd32_orecvmsg().


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.140 21-Jan-2010 pgoyette

branches: 1.140.2; 1.140.4;
Remove unnecessary call to kauth_cred_free().

This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@


# 1.139 29-Dec-2009 elad

Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!


# 1.138 20-Dec-2009 dsl

If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567


Revision tags: matt-premerge-20091211
# 1.137 09-Dec-2009 dsl

Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.136 04-Apr-2009 ad

Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)


Revision tags: nick-hppapmap-base2
# 1.135 21-Jan-2009 yamt

branches: 1.135.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.


Revision tags: netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 haad-dm-base mjf-devfs2-base
# 1.134 06-Aug-2008 plunky

branches: 1.134.2; 1.134.4;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.133 24-Jun-2008 ad

branches: 1.133.2;
Nothing uses getsock/getvnode any more.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.132 30-May-2008 rmind

branches: 1.132.2;
do_sys_accept: release the reference to sock in few error paths.
Should fix PR/38790, report and test-case by Nicolas Joly.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.131 28-Apr-2008 martin

branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.130 24-Apr-2008 ad

branches: 1.130.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


# 1.129 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.128 21-Mar-2008 ad

branches: 1.128.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.127 06-Feb-2008 ad

branches: 1.127.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.


Revision tags: vmlocking2-base3 bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.126 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.125 20-Dec-2007 dsl

Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.


# 1.124 16-Dec-2007 elad

Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.123 24-Nov-2007 dyoung

branches: 1.123.2; 1.123.6;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to getsockmbuf().


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.122 05-Oct-2007 dyoung

branches: 1.122.4;
Use getsombuf().


Revision tags: yamt-x86pmap-base
# 1.121 19-Sep-2007 christos

branches: 1.121.2;
minor nits; no code change.


# 1.120 19-Sep-2007 dyoung

1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.119 06-Sep-2007 rmind

do_sys_sendmsg: Plug a possible leak.
From CID: 4535


# 1.118 01-Sep-2007 dsl

Don't error calls to copy socket addresses to userspace when the application
has provided a non-null buffer pointer and a zero length.


# 1.117 27-Aug-2007 dsl

ktrace socket control structures (ie msghdr, address etc) using ktrkuser().


# 1.116 15-Aug-2007 ad

branches: 1.116.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base nick-csl-alignment-base
# 1.115 15-Jul-2007 dsl

branches: 1.115.2; 1.115.6;
Remove non-user flags (especially MSG_IOVUSRSPACE) from mp->msg_flags
before passing to so_receive.
This may (or may not) have any effect...


Revision tags: mjf-ufs-trans-base
# 1.114 01-Jul-2007 dsl

Check for SOL_SOCKET when checking for SCM_RIGHTS.


# 1.113 24-Jun-2007 dsl

Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socket names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, old LKMs will fail to load.


# 1.112 02-Jun-2007 enami

- Fix obvious typos so that sendto(2) works.
- Wrap lines again.


# 1.111 01-Jun-2007 dsl

Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.


Revision tags: yamt-idlelwp-base8
# 1.110 13-May-2007 dsl

Fallout from caddr_t deletion - remove a load of redundant (void *) casts.


# 1.109 18-Apr-2007 yamt

sys_accept: fix usecount botch and double soclose in rev.1.108.


# 1.108 15-Apr-2007 yamt

sys_accept: don't leak a socket on error.


Revision tags: thorpej-atomic-base
# 1.107 04-Mar-2007 christos

branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge
# 1.106 09-Feb-2007 ad

branches: 1.106.2;
Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.105 01-Nov-2006 yamt

branches: 1.105.2;
remove some __unused from function parameters.


# 1.104 23-Oct-2006 elad

PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic

Patch applied, thanks!


Revision tags: yamt-splraiseipl-base2
# 1.103 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.102 22-Aug-2006 seanb

branches: 1.102.2; 1.102.4;
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.101 23-Jul-2006 ad

branches: 1.101.2;
Use the LWP cached credentials where sane.


# 1.100 26-Jun-2006 mrg

version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.

from pavel@ with a little bit of clean up from myself.

XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.99 16-May-2006 christos

branches: 1.99.4;
Don't set mature an fd that has been ffree'd


Revision tags: elad-kernelauth-base
# 1.98 11-May-2006 christos

Add MSG_NOSIGNAL (from FreeBSD)


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.97 01-Mar-2006 yamt

branches: 1.97.2; 1.97.4; 1.97.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.96 26-Dec-2005 perry

branches: 1.96.2; 1.96.4; 1.96.6;
u_intN_t -> uintN_t


# 1.95 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.94 03-Sep-2005 martin

In adjust_rights() use CMSG_SPACE() to calculate the number of
file descriptors passed in this message - the counterpart in
unp_externalize does this as well.

Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.

This fixes the postfix connection cache for 64bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing the fdrelease() of uninitialized junk.


# 1.93 03-Sep-2005 martin

minor knf tweak


# 1.92 30-May-2005 martin

branches: 1.92.2;
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.


# 1.91 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.90 26-Feb-2005 perry

branches: 1.90.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.89 30-Nov-2004 christos

branches: 1.89.4; 1.89.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat


# 1.88 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req->r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.87 18-May-2004 ragge

Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.

Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.

As a side effect, the new connect() behaviour conforms to POSIX.


Revision tags: netbsd-2-0-base
# 1.86 29-Nov-2003 matt

branches: 1.86.2;
Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.85 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.84 13-Nov-2003 chs

eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.


# 1.83 04-Sep-2003 matt

Adapt to the new calling conventions of unp_connect2


# 1.82 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.81 29-Jun-2003 fvdl

branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.80 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.79 05-Apr-2003 christos

PR/21030: Naoto Shimazaki: fcntl to accepted socket does not work properly


# 1.78 26-Feb-2003 matt

Remove leftover MBUFTRACE asserts.


# 1.77 26-Feb-2003 drochner

deactivate MBUFTRACE related KASSERT()s in the !MBUFTRACE case


# 1.76 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.75 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.74 26-Nov-2002 christos

si_ -> sel_ to avoid conflicts with siginfo.


# 1.73 25-Nov-2002 itojun

no need for error check after MEXTMALLOC - jdolecek


# 1.72 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge
# 1.71 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.70 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.69 31-May-2002 itojun

support setsockopt() with larger data (up to MCLBYTES).
From: Hitoshi Asaeda <Hitoshi.Asaeda@sophia.inria.fr>


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base ifpoll-base
# 1.68 11-Feb-2002 jdolecek

branches: 1.68.8;
Switch default for pipes to the faster John S. Dyson's implementation.
Old, socketpair-based ones are available with option PIPE_SOCKETPAIR.


Revision tags: thorpej-mips-cache-base
# 1.67 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.66 16-Sep-2001 wiz

branches: 1.66.2;
Spell 'occurred' with two 'r's.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.65 17-Jul-2001 jdolecek

branches: 1.65.2;
Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
has to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(

This code was tested to pass regression tests.


# 1.64 01-Jul-2001 matt

branches: 1.64.2;
Use consistent types for len. Limit sockarg length to reasonable values.


# 1.63 25-Jun-2001 jdolecek

Back off the sendit()/recvit() change, some have problems with it


# 1.62 25-Jun-2001 jdolecek

sys_connect(): fix the call to FILE_UNUSE() so that it's done on return, rather
than immediately after the getsock() call


# 1.61 25-Jun-2001 jdolecek

Add 'kernsa' parameter for sendit()/recvit(); if nonzero, msg->msg_name
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.

This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.


# 1.60 16-Jun-2001 jdolecek

Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pageable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.


# 1.59 14-Jun-2001 thorpej

Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
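
The "larval" scheme described above can be sketched in user space as
follows. This is only an illustration under stated assumptions: the
sketch_* names, the flag value and the fixed table size are hypothetical,
there is no locking, and it is not the kernel's implementation.

```c
#include <stddef.h>

#define SKETCH_NFILES 4       /* hypothetical table size */
#define SKETCH_LARVAL 0x01    /* hypothetical "still being constructed" flag */

struct sketch_file {
	int in_use;           /* slot is occupied */
	int flags;            /* SKETCH_LARVAL while under construction */
};

static struct sketch_file sketch_table[SKETCH_NFILES];

/* Allocate the lowest empty slot and mark it larval; returns fd or -1. */
static int
sketch_fdalloc(void)
{
	for (int fd = 0; fd < SKETCH_NFILES; fd++) {
		/* A larval slot is still "filled", so it is never reused. */
		if (!sketch_table[fd].in_use) {
			sketch_table[fd].in_use = 1;
			sketch_table[fd].flags = SKETCH_LARVAL;
			return fd;
		}
	}
	return -1;
}

/* Look up a descriptor; larval entries are treated as non-existent. */
static struct sketch_file *
sketch_getfile(int fd)
{
	if (fd < 0 || fd >= SKETCH_NFILES)
		return NULL;
	if (!sketch_table[fd].in_use)
		return NULL;
	if (sketch_table[fd].flags & SKETCH_LARVAL)
		return NULL;
	return &sketch_table[fd];
}

/* Mark a fully constructed descriptor as mature by clearing the flag. */
static void
sketch_set_mature(int fd)
{
	sketch_table[fd].flags &= ~SKETCH_LARVAL;
}
```

In this sketch a lookup on a freshly allocated descriptor fails until
sketch_set_mature() has been called, while a second allocation still
skips the occupied slot.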


# 1.58 06-May-2001 manu

implement the recently introduced EMUL_BSD_ASYNCIO_PIPE emulation package
flag.

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.57 27-Feb-2001 lukem

branches: 1.57.2;
convert to ANSI KNF


# 1.56 10-Dec-2000 fvdl

Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).


# 1.55 24-Nov-2000 jdolecek

define COMPAT_OLDSOCK unconditionally - the code is used virtually for all
emulations besides NetBSD, and this way it's LKM-safe


# 1.54 02-Aug-2000 thorpej

MALLOC()/FREE() are not to be used for variable sized allocations.


# 1.53 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.52 27-May-2000 sommerfeld

branches: 1.52.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.51 30-Mar-2000 augustss

Get rid of register declarations.


# 1.50 23-Mar-2000 thorpej

Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() makes sure
the fd_freefiles hints stay in sync.

From OpenBSD.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.49 05-Nov-1999 mycroft

branches: 1.49.2;
Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.


# 1.48 30-Oct-1999 enami

back out unnecessary stylistic changes in recent changes, to keep coding
style closer to KNF.


# 1.47 27-Oct-1999 jdolecek

minor cleanup of previous - avoid goto and code duplication


# 1.46 27-Oct-1999 darrenr

patch from Greg A. Woods to fix panic problems with code that attempts to
recover from failures to accept a socket successfully. Problem suggested
by this:
> It would appear (from two "panic: closef: count < 0" failures in less
> than 12 hours) that Darren's fix to accept(2) for lost file descriptors
> isn't quite correct. His fix inserts a call to closef() to handle one
> of several possible error conditions. However everywhere else in the
> socket code in the same file where falloc() cleanup is necessary the
> function used is ffree().


Revision tags: chs-ubc2-base
# 1.45 01-Jul-1999 itojun

branches: 1.45.2; 1.45.4; 1.45.6;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.44 01-Jul-1999 darrenr

fix sys_accept() to return EOPNOTSUPP for protocols which don't support
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.


# 1.43 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.42 30-Apr-1999 cgd

add checks for COMPAT_OSF1 in the appropriate places


Revision tags: netbsd-1-4-RELEASE netbsd-1-4-base
# 1.41 10-Feb-1999 kleink

branches: 1.41.2; 1.41.4; 1.41.6;
* Due to addition and use of socklen_t, make the socket option and address
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also eliminates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessitates the addition
of a few casts.


# 1.40 18-Dec-1998 drochner

solve the COMPAT_OLDSOCK/MSG_COMPAT problem differently:
The source files which need MSG_COMPAT define COMPAT_OLDSOCK.


Revision tags: kenh-if-detach-base
# 1.39 26-Nov-1998 mycroft

Revert the functional change in rev 1.38; permit a msg_iovlen of 0.
There are two reasons for this:
* We should be able to pass file descriptors without sending any data.
* We could send zero-length iovecs anyway (but we shouldn't have to do this).
Also, msg_iovlen is already a u_int, so delete a bunch of casts.


Revision tags: chs-ubc-base
# 1.38 04-Aug-1998 kleink

Per XNS Issue 5, calling recvmsg(2) or sendmsg(2) with an msg.msg_iovlen less
than or equal to 0 shall fail with EMSGSIZE; the latter condition was not being
checked for. Also, document the msg.msg_iovlen > {IOV_MAX} case.


# 1.37 04-Aug-1998 kleink

UIO_MAXIOV -> IOV_MAX


# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
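
The mapping above swaps the source and destination operands for the copy
routines. A minimal user-space sketch of the four replacements follows;
the wrapper function names are hypothetical and exist only to label each
case.

```c
#include <string.h>

/* Old: bcopy(src, dst, len);   New: memcpy(dst, src, len); operands swap. */
static void
copy_nonoverlapping(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/* Old: ovbcopy(src, dst, len); New: memmove(dst, src, len); overlap is safe. */
static void
copy_overlapping(void *dst, const void *src, size_t len)
{
	memmove(dst, src, len);
}

/* Old: bcmp(a, b, len);        New: memcmp(a, b, len); nonzero means "differ". */
static int
buffers_differ(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len) != 0;
}

/* Old: bzero(p, len);          New: memset(p, 0, len). */
static void
clear_buffer(void *p, size_t len)
{
	memset(p, 0, len);
}
```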


# 1.35 03-Aug-1998 kleink

Fix two off-by-one bugs, both present in each recvmsg(2) and sendmsg(2):
* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.
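
The two boundary conditions can be illustrated with a small user-space
sketch. The SKETCH_* constants are hypothetical stand-ins for
UIO_SMALLIOV and UIO_MAXIOV, and pick_iov_storage() is an invented helper,
not the kernel code; the point is only that both comparisons use '>'
rather than '>='.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/uio.h>

#define SKETCH_SMALLIOV 8      /* hypothetical stand-in for UIO_SMALLIOV */
#define SKETCH_MAXIOV   1024   /* hypothetical stand-in for UIO_MAXIOV */

/*
 * Choose storage for iovcnt iovecs.  Returns 0 and sets *iovp on success,
 * EMSGSIZE if the count is too large, or ENOMEM if allocation fails.
 * *heap is set to 1 when the caller must free(*iovp).
 */
static int
pick_iov_storage(unsigned int iovcnt, struct iovec *onstack,
    struct iovec **iovp, int *heap)
{
	*heap = 0;
	if (iovcnt > SKETCH_SMALLIOV) {       /* exactly SMALLIOV fits on stack */
		if (iovcnt > SKETCH_MAXIOV)   /* exactly MAXIOV is still valid */
			return EMSGSIZE;
		*iovp = malloc(sizeof(**iovp) * iovcnt);
		if (*iovp == NULL)
			return ENOMEM;
		*heap = 1;
		return 0;
	}
	*iovp = onstack;
	return 0;
}
```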


# 1.34 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.33 29-Jul-1998 thorpej

branches: 1.33.2;
When checking for overflow in the residual count, test against SSIZE_MAX.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.

There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).


# 1.32 18-Jul-1998 lukem

use AF_LOCAL instead of AF_UNIX


# 1.31 25-Jun-1998 thorpej

defopt KTRACE


# 1.30 25-Apr-1998 matt

Hook for 0-copy (or other optimized) sends and receives


# 1.29 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.28 06-Feb-1998 thorpej

When copying out multiple control messages, ensure that the next control
message is aligned. From David Borman <dab@bsdi.com>.


# 1.27 07-Jan-1998 thorpej

Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).


# 1.26 07-Jan-1998 thorpej

Fix bug in recvit() that would cause recvmsg() to only receive one
control message, even if there were multiple control messages on
the queue. From Jean-Luc Richier <Jean-Luc.Richier@imag.fr>, in
bug report kern/4700.


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.25 26-Jun-1997 thorpej

branches: 1.25.8;
Use UCHAR_MAX rather than "255" when sanity-checking the length of a
sockaddr in sockargs().


# 1.24 26-Jun-1997 thorpej

In sockargs():
- Add a comment describing my feelings about this interface, in general.
- Remove the COMPAT_OLDSOCK length hack. Instead, if the socket argument
is too long to fit in an mbuf, allocate enough external storage to
hold it.
- If the socket argument is a sockaddr, don't allow the length to be
greater than 255, as that would overflow sa_len.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for his sanity checking.
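
The sockaddr length limit can be sketched as below. The enum, the helper
name and the EINVAL return value are hypothetical; the only fact taken
from the entry is that a sockaddr longer than 255 bytes would overflow
the one-byte sa_len field.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical argument kinds: a socket name (sockaddr) or anything else. */
enum sketch_argtype { SKETCH_SONAME, SKETCH_OTHER };

/* Reject sockaddr arguments whose length cannot be stored in sa_len. */
static int
sketch_check_arglen(enum sketch_argtype type, size_t buflen)
{
	if (type == SKETCH_SONAME && buflen > UCHAR_MAX)
		return EINVAL;   /* assumed error code */
	return 0;
}
```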


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 22-Dec-1996 cgd

* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.


# 1.22 14-Jun-1996 cgd

avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.


# 1.21 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.20 17-May-1996 pk

branches: 1.20.4;
Don't touch retval[] in socketpair(); manual page says this system call
returns 0 on success (PR#2428).


# 1.19 09-Feb-1996 christos

More proto fixes


# 1.18 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.17 10-Oct-1995 mycroft

Add hooks for COMPAT_FREEBSD, from Noriyuki Soda.


# 1.16 07-Oct-1995 mycroft

Prefix names of system call implementation functions with `sys_'.


# 1.15 19-Sep-1995 thorpej

Make system calls conform to a standard prototype and bring those
prototypes into scope.


# 1.14 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.13 24-Jun-1995 christos

Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).


# 1.12 10-May-1995 christos

tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed


# 1.11 05-Mar-1995 fvdl

Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.


# 1.10 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.9 20-Oct-1994 cgd

update for new syscall args description mechanism


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.8 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.7 04-May-1994 mycroft

Add return types where missing. Simplify some of the compat conditionals.
Include compat code if COMPAT_SUNOS with or without COMPAT_43.


# 1.6 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 17-Jul-1993 mycroft

branches: 1.5.4;
Finish moving struct definitions outside of function declarations.


# 1.4 27-Jun-1993 andrew

* ansifications
* Yuval Yarom's socket recv(2) fixes - access rights problems (see also
uipc_socket.c).


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.209 13-Oct-2023 ad

Use cv_fdrestart() to implement fo_restart.


# 1.208 04-Oct-2023 ad

kauth_cred_hold(): return cred verbatim so that donating a reference to
another data structure can be done more elegantly.


# 1.207 09-Sep-2023 ad

Fix a ~16 year old perf regression: when accepting a connection, add a
reference to the caller's credentials rather than copying them.


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.206 01-Jul-2022 riastradh

sendto(2), recvfrom(2): Scrub internal struct msghdr on stack.

Otherwise this is kernel stack disclosure via ktrace.

Reported-by: syzbot+1d40303b310063778194@syzkaller.appspotmail.com


# 1.205 29-Jun-2022 riastradh

recvmmsg(2): More timespec validation.

Reported-by: syzbot+004ed2f264534bd27312@syzkaller.appspotmail.com
Reported-by: syzbot+6f9014c842c4e78df7bc@syzkaller.appspotmail.com


# 1.204 28-Jun-2022 riastradh

recvmmsg(2): Avoid arithmetic overflow in timeout calculations.

XXX This is not right -- it doesn't actually do anything to time
out...

Reported-by: syzbot+784209d76a94fcc6417b@syzkaller.appspotmail.com


# 1.203 27-Jun-2022 riastradh

sendmsg(2): Avoid buffer overrun in ktrace of invalid cmsghdr.

Reported-by: syzbot+efded148140b23425f5c@syzkaller.appspotmail.com


# 1.202 02-Oct-2021 thorpej

...and correct my terrible spelling.


# 1.201 02-Oct-2021 thorpej

- Strenghen the poll(2) fifo_inout test to ensure that once the reader
has read enough that exactly PIPE_BUF space is available that the FIFO
becomes writable again.
- When creating a FIFO, ensure that the receive low water mark is 1
(a FIFO must be readable when at least 1 byte is available); this
was already the case implicitly, but this makes it explicit.
- Similarly, set the send low water mark to PIPE_BUF to ensure that
the pipe is writable when at least PIPE_BUF bytes of space are available
in the send buffer. Without this change, the strengthened test case
above does not pass (the default send low water mark is larger than
PIPE_BUF; see soreserve()).
- Make the same low water mark changes to the PIPE_SOCKETPAIR case.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.200 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


Revision tags: netbsd-9-2-RELEASE netbsd-9-1-RELEASE bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.199 12-Nov-2018 hannken

branches: 1.199.4;
sys_recvmmsg: don't defer an error that already gets returned.


# 1.198 07-Nov-2018 hannken

Don't defer errors from sendmmsg(). This matches the linux manpage.

Defer errors from recvmmsg() through so_rerror and
tests and return a deferred error on entry.

Ok: christos@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.197 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
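
The truncation hazard described above can be sketched in userland C. This is only an illustration, assuming a 32-bit unsigned int; uimin_demo and MIN_DEMO are hypothetical stand-ins, not the libkern definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a minimum defined only on unsigned int. */
static inline unsigned int
uimin_demo(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving macro; note that it evaluates each argument twice. */
#define MIN_DEMO(a, b) ((a) < (b) ? (a) : (b))

/* Returns 1 if narrowing to unsigned int changed the result. */
static int
truncation_changes_result(void)
{
	uint64_t big = (UINT64_C(1) << 32) + 5;	/* low 32 bits hold 5 */
	uint64_t small = 10;

	/* The casts spell out the conversion a plain call would do silently. */
	uint64_t narrowed = uimin_demo((unsigned int)big, (unsigned int)small);
	uint64_t full = MIN_DEMO(big, small);

	assert(full == 10);
	return narrowed != full;
}
```

With the function-style minimum, the 64-bit value loses its upper bits before the comparison, so the smaller of the two truncated values wins; the macro compares the full-width values.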


# 1.196 01-Aug-2018 rjs

Add ioctl(2) handler for kernel part of sctp_peeloff().


# 1.195 31-Jul-2018 rjs

Add getsockopt2() syscall.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.194 04-May-2018 christos

branches: 1.194.2;
define MBUFTYPES here.


# 1.193 03-May-2018 christos

Fix COMPAT_NETBSD32 cmsg handling:

1. alignment was wrong for > 1 message
2. macros were doing incorrect pointer comparisons, fortunately ending
the iteration early after the first cmsg instead of crashing.
3. don't output 32 bit ktrace records for cmsg. 32 bit programs running
under emulation on 64 bit systems should produce 64 bit ktrace records
so that the native ktrace can handle the records; remove extra arguments
that are now not needed (the 32 bit msghdr).
4. output the correct type for cmsg trace records.
5. output all the cmsg records in traces instead of just the first one.

Welcome to 8.99.15 because of the argument removal.

XXX: Really all the code should be changed to use the CMSG_{FIRST,NXT}HDR
macros...
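
For reference, iteration with the CMSG_{FIRST,NXT}HDR macros mentioned in the XXX note looks roughly like this in userland. It is a sketch only, not the kernel code; count_cmsgs and demo_cmsg_count are hypothetical helper names.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Count control messages using the standard iteration macros. */
static int
count_cmsgs(struct msghdr *msg)
{
	struct cmsghdr *cmsg;
	int n = 0;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
	    cmsg = CMSG_NXTHDR(msg, cmsg))
		n++;
	return n;
}

/* Build a control buffer holding two int-sized messages and count them. */
static int
demo_cmsg_count(void)
{
	union {
		char buf[256];
		struct cmsghdr align;	/* forces suitable alignment */
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int value = 42, i;

	/* Zero the buffer first; some CMSG_NXTHDR versions read ahead. */
	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_control = u.buf;
	msg.msg_controllen = 2 * CMSG_SPACE(sizeof(int));

	cmsg = CMSG_FIRSTHDR(&msg);
	for (i = 0; i < 2 && cmsg != NULL; i++) {
		cmsg->cmsg_level = SOL_SOCKET;
		cmsg->cmsg_type = SCM_RIGHTS;
		cmsg->cmsg_len = CMSG_LEN(sizeof(int));
		memcpy(CMSG_DATA(cmsg), &value, sizeof(value));
		cmsg = CMSG_NXTHDR(&msg, cmsg);
	}
	return count_cmsgs(&msg);
}
```

The macros handle the per-message alignment, which is the part that is easy to get wrong when walking the buffer by hand.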


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.192 16-Mar-2018 christos

PR/53103: Timo Buhrmester: linux emulation of sendto(2) broken

The sockargs refactoring broke it, because sockargs only works with a user
address. Added an argument to sockargs to indicate where the address is
coming from. Welcome to 8.99.14.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.191 12-Feb-2018 maxv

branches: 1.191.2;
Add a KASSERT; we expect *from to be a single mbuf (not chained).


# 1.190 04-Jan-2018 christos

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).

(Tom Ivar Helbekkmo)


# 1.189 31-Dec-2017 christos

pass valsize for getsockopt like we do for setsockopt


# 1.188 26-Dec-2017 kamil

Refactor pipe1() and correct a bug in sys_pipe2() (SYS_pipe2)

sys_pipe2() returns two integers (values), the 2nd one is a copy of the 2nd
file descriptor that lands in fildes[1]. This is a side effect of reusing
the code for sys_pipe() (SYS_pipe) and not cleaning it up.

The first returned value is (on success) 0.

Introduced a small refactoring in pipe1() so that it no longer operates on
retval[], but on an int[2] array. A caller sets retval[] for pipe() when
desired and needed.

This refactoring touches compat code: netbsd32, linux, linux32.

Before the changes on NetBSD/amd64:

$ ktruss -i ./a.out
[...]
15131 1 a.out pipe2(0x7f7fff2e62b8, 0) = 0, 4
[...]

After the changes:

$ ktruss -i ./a.out
[...]
782 1 a.out pipe2(0x7f7fff97e850, 0) = 0
[...]

There should not be a visible change for current users.

Sponsored by <The NetBSD Foundation>


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.187 20-Jun-2017 christos

Change len type to be unsigned int for consistency with the input type.
Don't check for negative; it does not matter since we clamp anyway. This
broke the compat32 getsockname() where an uninitialized socklen_t ended
up randomly negative, causing it to fail.


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204
# 1.186 03-Feb-2017 christos

branches: 1.186.6;
expose sendmsg_so and recvmsg_so.


# 1.185 02-Feb-2017 christos

expose copyout_sockname_sb


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204
# 1.184 03-Dec-2016 christos

branches: 1.184.2;
Add missing ktrkuser


Revision tags: pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914
# 1.183 13-Sep-2016 martin

Make the ktrace record written by do_sys_sendmsg/do_sys_recvmsg overridable
by the caller. Use this in compat_netbsd32 to log the 32bit version, so
the 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.182 07-Jul-2016 msaitoh

branches: 1.182.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.181 01-Nov-2015 christos

Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.


Revision tags: nick-nhusb-base-20150921
# 1.180 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.179 22-Jul-2015 maxv

Memory leak. Triggerable from an unprivileged user via COMPAT_43.


Revision tags: nick-nhusb-base-20150606
# 1.178 09-May-2015 rtr

change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16


# 1.177 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.176 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.175 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.174 06-Mar-2015 rtr

Return EINVAL if namelen isn't large enough to encompass the expected
members of sockaddr structures. i.e. sa_len and sa_family.

Discussed with and patch by christos@
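
The length check described in this revision can be sketched as follows. This is a minimal userland illustration, assuming the header is everything that precedes sa_data; check_namelen is a hypothetical helper, not the function the kernel uses.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: reject a name length too short to hold the
 * sockaddr header (sa_family, plus sa_len on BSD systems).
 */
static int
check_namelen(socklen_t namelen)
{
	if (namelen < offsetof(struct sockaddr, sa_data))
		return EINVAL;
	return 0;
}
```

Checking against the offset of sa_data rather than a hard-coded constant keeps the test correct on systems with and without the sa_len member.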


Revision tags: nick-nhusb-base
# 1.173 05-Sep-2014 matt

branches: 1.173.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


Revision tags: netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.172 09-Aug-2014 rtr

branches: 1.172.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.171 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.170 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


# 1.169 17-May-2014 rmind

- fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle a few comments.
- Remove M_SOOPTS while here.


# 1.168 17-May-2014 rmind

makesocket: set SS_NBIO slightly earlier.


# 1.167 17-May-2014 rmind

Remove trailing whitespaces, wrap long lines, minor KNF; no functional changes.


Revision tags: yamt-pagecache-base9
# 1.166 07-Apr-2014 seanb

Fix a case where an erroneous EAGAIN was returned out of recvmmsg.
This occurred when some, but not all of the mmsg array members
were filled with data from a non-blocking socket.
PR kern/48725


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 09-Oct-2013 christos

branches: 1.165.2;
delete extra m_len initialization.


# 1.164 09-Oct-2013 christos

PR/48292: Justin Cormack: paccept creates sockets that cannot be made blocking
Reset the socket flags not just the file flags for non-blocking I/O.
XXX: pullup 6


# 1.163 08-Oct-2013 christos

PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.


# 1.162 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.161 03-Jun-2013 christos

branches: 1.161.2;
use the proper name for kdump pretty-printing.


Revision tags: agc-symver-base
# 1.160 14-Feb-2013 christos

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.


# 1.159 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
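
The gap between CMSG_LEN and CMSG_SPACE mentioned above is alignment padding after the payload. A userland sketch of scrubbing it, under the assumption that the buffer has room for CMSG_SPACE(datalen) bytes; scrub_cmsg_padding and padding_is_clean are hypothetical names, not the unp_externalize code.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Zero the alignment padding that follows datalen bytes of payload. */
static void
scrub_cmsg_padding(struct cmsghdr *cmsg, size_t datalen)
{
	size_t used = CMSG_LEN(datalen);	/* header + payload */
	size_t total = CMSG_SPACE(datalen);	/* header + payload + padding */

	memset((unsigned char *)cmsg + used, 0, total - used);
}

/* Returns 1 if every padding byte is zero after scrubbing a dirty buffer. */
static int
padding_is_clean(void)
{
	union {
		unsigned char buf[128];
		struct cmsghdr align;	/* forces suitable alignment */
	} u;
	struct cmsghdr *cmsg = &u.align;
	size_t datalen = sizeof(int);
	size_t i;

	memset(u.buf, 0xff, sizeof(u.buf));	/* simulate stale memory */
	cmsg->cmsg_len = CMSG_LEN(datalen);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	scrub_cmsg_padding(cmsg, datalen);

	for (i = CMSG_LEN(datalen); i < CMSG_SPACE(datalen); i++)
		if (u.buf[i] != 0)
			return 0;
	return 1;
}
```

On platforms where the payload size is already aligned, the padding is empty and the memset is a no-op.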


Revision tags: yamt-pagecache-base8
# 1.158 29-Dec-2012 mlelstv

The sanity check prevented messages that carry only ancillary data.


# 1.157 29-Dec-2012 mlelstv

If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analogous
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.156 17-Jul-2012 njoly

branches: 1.156.2;
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.


# 1.155 22-Jun-2012 christos

Add {send,recv}mmsg from Linux


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.154 25-Jan-2012 christos

branches: 1.154.2;
revert atomics for so_options since it is a short.


# 1.153 25-Jan-2012 christos

need <sys/atomic.h>


# 1.152 25-Jan-2012 christos

Add locking, requested by yamt. Note that locking is not used everywhere
for these.


# 1.151 25-Jan-2012 christos

As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]


# 1.150 21-Dec-2011 christos

simplify expression


# 1.149 20-Dec-2011 christos

- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.148 04-Nov-2011 christos

branches: 1.148.4;
Fix error I introduced in previous commit that caused asymmetric connects
when SOCK_NONBLOCK or SOCK_CLOEXEC was specified. Factor out common code
and simplify error return.


Revision tags: yamt-pagecache-base
# 1.147 21-Sep-2011 christos

branches: 1.147.2;
Put the mbuf type in the ktrace record so that we know how to decode it
in userland.


# 1.146 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.145 15-Jul-2011 christos

fail with EINVAL if flags are not O_CLOEXEC|O_NONBLOCK in pipe2(2) and
dup3(2)


# 1.144 26-Jun-2011 christos

* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
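
As an illustration of the socket type flags listed above, a userland caller might request both behaviours at creation time and then confirm them with fcntl(2). This is a sketch; socketpair_flags_applied is a hypothetical helper name.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 if close-on-exec and non-blocking mode are set on both
 * descriptors, 0 if either is missing, -1 if socketpair(2) fails.
 */
static int
socketpair_flags_applied(void)
{
	int sv[2], ok = 1, i;

	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK,
	    0, sv) == -1)
		return -1;

	for (i = 0; i < 2; i++) {
		int fdflags = fcntl(sv[i], F_GETFD);	/* descriptor flags */
		int flflags = fcntl(sv[i], F_GETFL);	/* file status flags */

		if (fdflags == -1 || flflags == -1 ||
		    !(fdflags & FD_CLOEXEC) || !(flflags & O_NONBLOCK))
			ok = 0;
		close(sv[i]);
	}
	return ok;
}
```

Setting the flags atomically at creation avoids the window between socketpair(2) and a later fcntl(2) in which another thread could fork and exec.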


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.143 24-Apr-2011 rmind

- Replace a few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.


# 1.142 10-Apr-2011 christos

- Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1
# 1.141 23-Apr-2010 rmind

branches: 1.141.2;
Replace M_IOV and some malloc(9)s with kmem(9), and while there:
- Fix invalid free (M_TEMP vs M_IOV) in do_sys_recvmsg(), spotted by jakllsch@.
Also, same fix in osf1_sys_sendmsg_xopen().
- Fix attempt to free non-allocated memory in error path in netbsd32___getfh30().
- Plug a memory leak in compat_43_netbsd32_orecvmsg().


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.140 21-Jan-2010 pgoyette

branches: 1.140.2; 1.140.4;
Remove unnecessary call to kauth_cred_free().

This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@


# 1.139 29-Dec-2009 elad

Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!


# 1.138 20-Dec-2009 dsl

If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567


Revision tags: matt-premerge-20091211
# 1.137 09-Dec-2009 dsl

Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.136 04-Apr-2009 ad

Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)


Revision tags: nick-hppapmap-base2
# 1.135 21-Jan-2009 yamt

branches: 1.135.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.


Revision tags: netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 haad-dm-base mjf-devfs2-base
# 1.134 06-Aug-2008 plunky

branches: 1.134.2; 1.134.4;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.133 24-Jun-2008 ad

branches: 1.133.2;
Nothing uses getsock/getvnode any more.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.132 30-May-2008 rmind

branches: 1.132.2;
do_sys_accept: release the reference to sock in a few error paths.
Should fix PR/38790, report and test-case by Nicolas Joly.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.131 28-Apr-2008 martin

branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.130 24-Apr-2008 ad

branches: 1.130.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


# 1.129 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.128 21-Mar-2008 ad

branches: 1.128.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.127 06-Feb-2008 ad

branches: 1.127.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.


Revision tags: vmlocking2-base3 bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.126 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.125 20-Dec-2007 dsl

Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.


# 1.124 16-Dec-2007 elad

Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.123 24-Nov-2007 dyoung

branches: 1.123.2; 1.123.6;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to getsockmbuf().


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.122 05-Oct-2007 dyoung

branches: 1.122.4;
Use getsombuf().


Revision tags: yamt-x86pmap-base
# 1.121 19-Sep-2007 christos

branches: 1.121.2;
minor nits; no code change.


# 1.120 19-Sep-2007 dyoung

1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.119 06-Sep-2007 rmind

do_sys_sendmsg: Plug a possible leak.
From CID: 4535


# 1.118 01-Sep-2007 dsl

Don't error calls to copy socket addresses to userspace when the application
has provided a non-null buffer pointer and a zero length.


# 1.117 27-Aug-2007 dsl

ktrace socket control structures (ie msghdr, address etc) using ktrkuser().


# 1.116 15-Aug-2007 ad

branches: 1.116.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base nick-csl-alignment-base
# 1.115 15-Jul-2007 dsl

branches: 1.115.2; 1.115.6;
Remove non-user flags (especially MSG_IOVUSRSPACE) from mp->msg_flags
before passing to so_receive.
This may (or may not) have any effect...


Revision tags: mjf-ufs-trans-base
# 1.114 01-Jul-2007 dsl

Check for SOL_SOCKET when checking for SCM_RIGHTS.


# 1.113 24-Jun-2007 dsl

Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socket names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, old LKMs will fail to load.


# 1.112 02-Jun-2007 enami

- Fix obvious typos so that sendto(2) works.
- Wrap lines again.


# 1.111 01-Jun-2007 dsl

Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.


Revision tags: yamt-idlelwp-base8
# 1.110 13-May-2007 dsl

Fallout from caddr_t deletion - remove a load of redundant (void *) casts.


# 1.109 18-Apr-2007 yamt

sys_accept: fix usecount botch and double soclose in rev.1.108.


# 1.108 15-Apr-2007 yamt

sys_accept: don't leak a socket on error.


Revision tags: thorpej-atomic-base
# 1.107 04-Mar-2007 christos

branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge
# 1.106 09-Feb-2007 ad

branches: 1.106.2;
Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.105 01-Nov-2006 yamt

branches: 1.105.2;
remove some __unused from function parameters.


# 1.104 23-Oct-2006 elad

PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic

Patch applied, thanks!


Revision tags: yamt-splraiseipl-base2
# 1.103 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.102 22-Aug-2006 seanb

branches: 1.102.2; 1.102.4;
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.101 23-Jul-2006 ad

branches: 1.101.2;
Use the LWP cached credentials where sane.


# 1.100 26-Jun-2006 mrg

version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.

from pavel@ with a little bit of clean up from myself.

XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.99 16-May-2006 christos

branches: 1.99.4;
Don't set mature an fd that has been ffree'd


Revision tags: elad-kernelauth-base
# 1.98 11-May-2006 christos

Add MSG_NOSIGNAL (from FreeBSD)
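
A minimal userland sketch of what the flag is for: sending on a stream socket whose peer has gone away reports EPIPE through errno rather than raising SIGPIPE. send_to_closed_peer is a hypothetical helper name.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns the errno from sending to a closed peer, 0 if the send
 * unexpectedly succeeded, or -1 if the socket pair could not be created.
 */
static int
send_to_closed_peer(void)
{
	int sv[2], err = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;

	close(sv[1]);	/* the peer goes away */

	/* Without MSG_NOSIGNAL this send would normally raise SIGPIPE. */
	if (send(sv[0], "x", 1, MSG_NOSIGNAL) == -1)
		err = errno;

	close(sv[0]);
	return err;
}
```

The per-call flag is useful in libraries that cannot change the process-wide SIGPIPE disposition.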


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.97 01-Mar-2006 yamt

branches: 1.97.2; 1.97.4; 1.97.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.96 26-Dec-2005 perry

branches: 1.96.2; 1.96.4; 1.96.6;
u_intN_t -> uintN_t


# 1.95 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.94 03-Sep-2005 martin

In adjust_rights() Use CMSG_SPACE() to calculate the number of
filedescriptors passed in this message - the counterpart in
unp_externalize does this as well.

Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.

This fixes the postfix connection cache for 64bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing the fdrelease() of uninitialized junk.
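
The descriptor-count arithmetic discussed in this revision can be illustrated from userland. This is only a sketch under stated assumptions: rights_fd_count() is a hypothetical helper, and it subtracts CMSG_LEN(0) (the header size including alignment padding) instead of reproducing the CMSG_SPACE(sizeof(int)) adjustment that the commit describes for the kernel.

```c
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: number of descriptors carried by an SCM_RIGHTS
 * control message whose cmsg_len field is `cmsg_len'.
 * CMSG_LEN(0) is the size of the header including alignment padding.
 */
static size_t
rights_fd_count(size_t cmsg_len)
{
	if (cmsg_len < CMSG_LEN(0))
		return 0;	/* shorter than a header: no payload */
	return (cmsg_len - CMSG_LEN(0)) / sizeof(int);
}
```

Ignoring the header padding is what makes such a count come out too high on platforms where the header is padded to a wider alignment than sizeof(int).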


# 1.93 03-Sep-2005 martin

minor knf tweak


# 1.92 30-May-2005 martin

branches: 1.92.2;
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.


# 1.91 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.90 26-Feb-2005 perry

branches: 1.90.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.89 30-Nov-2004 christos

branches: 1.89.4; 1.89.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat


# 1.88 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req->r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.87 18-May-2004 ragge

Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.

Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.

As a side effect, the new connect() behaviour conforms to POSIX.
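
Under the POSIX behaviour, a connect() interrupted by a signal keeps going in the background, and the caller can wait for it to finish instead of retrying. The following is a minimal userland sketch of that pattern; wait_for_connect() is a hypothetical helper name, not something taken from this file.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: call after connect() has failed with EINTR.
 * Waits until the socket becomes writable, then reads SO_ERROR to learn
 * the outcome. Returns 0 on success or an errno value on failure.
 */
static int
wait_for_connect(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int soerr = 0;
	socklen_t len = sizeof(soerr);

	while (poll(&pfd, 1, -1) == -1) {
		if (errno != EINTR)
			return errno;	/* poll itself failed */
	}
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) == -1)
		return errno;
	return soerr;	/* 0 when the connection was established */
}
```

A process with frequent timer signals would call this once after the first EINTR rather than calling connect() again in a loop.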


Revision tags: netbsd-2-0-base
# 1.86 29-Nov-2003 matt

branches: 1.86.2;
Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.85 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.84 13-Nov-2003 chs

eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.


# 1.83 04-Sep-2003 matt

Adapt to the new calling conventions of unp_connect2


# 1.82 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.81 29-Jun-2003 fvdl

branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.80 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.79 05-Apr-2003 christos

PR/21030: Naoto Shimazaki: fcntl to accepted socket does not work properly


# 1.78 26-Feb-2003 matt

Remove leftover MBUFTRACE asserts.


# 1.77 26-Feb-2003 drochner

deactivate MBUFTRACE related KASSERT()s in the !MBUFTRACE case


# 1.76 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.75 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.74 26-Nov-2002 christos

si_ -> sel_ to avoid conflicts with siginfo.


# 1.73 25-Nov-2002 itojun

no need for error check after MEXTMALLOC - jdolecek


# 1.72 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge
# 1.71 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.70 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.69 31-May-2002 itojun

support setsockopt() with larger data (up to MCLBYTES).
From: Hitoshi Asaeda <Hitoshi.Asaeda@sophia.inria.fr>


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base ifpoll-base
# 1.68 11-Feb-2002 jdolecek

branches: 1.68.8;
Switch default for pipes to the faster John S. Dyson's implementation.
Old, socketpair-based ones are available with option PIPE_SOCKETPAIR.


Revision tags: thorpej-mips-cache-base
# 1.67 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.66 16-Sep-2001 wiz

branches: 1.66.2;
Spell 'occurred' with two 'r's.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.65 17-Jul-2001 jdolecek

branches: 1.65.2;
Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
has to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(

This code was tested to pass regression tests.


# 1.64 01-Jul-2001 matt

branches: 1.64.2;
Use consistent types for len. Limit sockarg length to reasonable values.


# 1.63 25-Jun-2001 jdolecek

Back off the sendit()/recvit() change, some have problems with it


# 1.62 25-Jun-2001 jdolecek

sys_connect(): fix the call to FILE_UNUSE() so that it's done on return, rather
than immediately after getsock() call


# 1.61 25-Jun-2001 jdolecek

Add 'kernsa' parameter for sendit()/recvit(); if nonzero, msg->msg_name
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.

This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.


# 1.60 16-Jun-2001 jdolecek

Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pageable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.


# 1.59 14-Jun-2001 thorpej

Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
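
The larval/mature idea can be shown with a toy table. This is a sketch only: the toy_* names, the fixed-size array and the integer flag are all assumptions made for illustration and do not reflect the kernel's actual data structures.

```c
#include <stddef.h>

#define TOY_NFILES 4

struct toy_slot {
	void *file;	/* NULL when the slot is free */
	int larval;	/* nonzero until construction is finished */
};

static struct toy_slot toy_table[TOY_NFILES];

/* Reserve the lowest free slot; it stays larval until marked mature. */
static int
toy_fdalloc(void *file)
{
	for (int fd = 0; fd < TOY_NFILES; fd++) {
		if (toy_table[fd].file == NULL) {
			toy_table[fd].file = file;
			toy_table[fd].larval = 1;
			return fd;
		}
	}
	return -1;	/* table full */
}

/* Lookups treat larval slots as if the descriptor did not exist. */
static void *
toy_getfile(int fd)
{
	if (fd < 0 || fd >= TOY_NFILES || toy_table[fd].larval)
		return NULL;
	return toy_table[fd].file;
}

/* Called once the descriptor is fully constructed. */
static void
toy_set_mature(int fd)
{
	toy_table[fd].larval = 0;
}
```

A second allocation made while the first slot is still larval skips that slot, yet a lookup on it fails until toy_set_mature() runs, which is the property that closes the race with close(2).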


# 1.58 06-May-2001 manu

implement the recently introduced EMUL_BSD_ASYNCIO_PIPE emulation package
flag.

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.57 27-Feb-2001 lukem

branches: 1.57.2;
convert to ANSI KNF


# 1.56 10-Dec-2000 fvdl

Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).


# 1.55 24-Nov-2000 jdolecek

define COMPAT_OLDSOCK unconditionally - the code is used virtually for all
emulations besides NetBSD, and this way it's LKM-safe


# 1.54 02-Aug-2000 thorpej

MALLOC()/FREE() are not to be used for variable sized allocations.


# 1.53 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.52 27-May-2000 sommerfeld

branches: 1.52.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.51 30-Mar-2000 augustss

Get rid of register declarations.


# 1.50 23-Mar-2000 thorpej

Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() makes sure
the fd_freefiles hints stay in sync.

From OpenBSD.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.49 05-Nov-1999 mycroft

branches: 1.49.2;
Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.


# 1.48 30-Oct-1999 enami

back out unnecessary stylistic changes in recent changes, to keep coding
style closer to KNF.


# 1.47 27-Oct-1999 jdolecek

minor cleanup of previous - avoid goto and code duplication


# 1.46 27-Oct-1999 darrenr

patch from Greg A. Woods to fix panic problems with code that attempts to
recover from failures to accept a socket successfully. Problem suggested
by this:
> It would appear (from two "panic: closef: count < 0" failures in less
> than 12 hours) that Darren's fix to accept(2) for lost file descriptors
> isn't quite correct. His fix inserts a call to closef() to handle one
> of several possible error conditions. However everywhere else in the
> socket code in the same file where falloc() cleanup is necessary the
> function used is ffree().


Revision tags: chs-ubc2-base
# 1.45 01-Jul-1999 itojun

branches: 1.45.2; 1.45.4; 1.45.6;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.44 01-Jul-1999 darrenr

fix sys_accept() to return EOPNOTSUPP for protocols which don't support
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.


# 1.43 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.42 30-Apr-1999 cgd

add checks for COMPAT_OSF1 in the appropriate places


Revision tags: netbsd-1-4-RELEASE netbsd-1-4-base
# 1.41 10-Feb-1999 kleink

branches: 1.41.2; 1.41.4; 1.41.6;
* Due to addition and use of socklen_t, make the socket option and address
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also eliminates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessitates the addition
of a few casts.


# 1.40 18-Dec-1998 drochner

solve the COMPAT_OLDSOCK/MSG_COMPAT problem differently:
The source files which need MSG_COMPAT define COMPAT_OLDSOCK.


Revision tags: kenh-if-detach-base
# 1.39 26-Nov-1998 mycroft

Revert the functional change in rev 1.38; permit a msg_iovlen of 0.
There are two reasons for this:
* We should be able to pass file descriptors without sending any data.
* We could send zero-length iovecs anyway (but we shouldn't have to do this).
Also, msg_iovlen is already a u_int, so delete a bunch of casts.


Revision tags: chs-ubc-base
# 1.38 04-Aug-1998 kleink

Per XNS Issue 5, calling recvmsg(2) or sendmsg(2) with an msg.msg_iovlen less
than or equal to 0 shall fail with EMSGSIZE; the latter condition was not being
checked for. Also, document the msg.msg_iovlen > {IOV_MAX} case.


# 1.37 04-Aug-1998 kleink

UIO_MAXIOV -> IOV_MAX


# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
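
The one conversion that is easy to get wrong is the argument order: bcopy() and ovbcopy() take the source first, while memcpy() and memmove() take the destination first. A small sketch with hypothetical wrapper names:

```c
#include <string.h>

/* bcopy(src, dst, len) becomes memcpy(dst, src, len): arguments swap. */
static void
copy_block(const void *src, void *dst, size_t len)
{
	memcpy(dst, src, len);
}

/* ovbcopy(buf, buf + by, len) becomes memmove(buf + by, buf, len). */
static void
shift_right(char *buf, size_t len, size_t by)
{
	memmove(buf + by, buf, len);	/* regions may overlap */
}

/* bcmp(a, b, len) becomes memcmp(a, b, len); only zero/nonzero is used. */
static int
blocks_differ(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len) != 0;
}

/* bzero(p, len) becomes memset(p, 0, len). */
static void
clear_block(void *p, size_t len)
{
	memset(p, 0, len);
}
```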


# 1.35 03-Aug-1998 kleink

Fix two off-by-one bugs, both present in each of recvmsg(2) and sendmsg(2):
* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.
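
Both bugs come down to using a strict comparison where an inclusive one is needed. The sketch below shows the corrected boundaries; the SKETCH_* constants and classify_iovlen() are hypothetical stand-ins, with values chosen only for illustration.

```c
#define SKETCH_SMALLIOV	8	/* stand-in for UIO_SMALLIOV */
#define SKETCH_MAXIOV	1024	/* stand-in for UIO_MAXIOV */

enum iov_storage { IOV_ON_STACK, IOV_ON_HEAP, IOV_TOO_MANY };

/* Decide where the iovec array lives for a given msg_iovlen. */
static enum iov_storage
classify_iovlen(unsigned int iovlen)
{
	/* A strict '<' here would malloc needlessly at the boundary. */
	if (iovlen <= SKETCH_SMALLIOV)
		return IOV_ON_STACK;
	/* A strict '<' here would reject the largest legal count. */
	if (iovlen <= SKETCH_MAXIOV)
		return IOV_ON_HEAP;
	return IOV_TOO_MANY;	/* caller would fail with EMSGSIZE */
}
```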


# 1.34 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.33 29-Jul-1998 thorpej

branches: 1.33.2;
When checking for overflow in the residual count, test against SSIZE_MAX.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.

There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).


# 1.32 18-Jul-1998 lukem

use AF_LOCAL instead of AF_UNIX


# 1.31 25-Jun-1998 thorpej

defopt KTRACE


# 1.30 25-Apr-1998 matt

Hook for 0-copy (or other optimized) sends and receives


# 1.29 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.28 06-Feb-1998 thorpej

When copying out multiple control messages, ensure that the next control
message is aligned. From David Borman <dab@bsdi.com>.


# 1.27 07-Jan-1998 thorpej

Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).


# 1.26 07-Jan-1998 thorpej

Fix bug in recvit() that would cause recvmsg() to only receive one
control message, even if there were multiple control messages on
the queue. From Jean-Luc Richier <Jean-Luc.Richier@imag.fr>, in
bug report kern/4700.


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.25 26-Jun-1997 thorpej

branches: 1.25.8;
Use UCHAR_MAX rather than "255" when sanity-checking the length of a
sockaddr in sockargs().


# 1.24 26-Jun-1997 thorpej

In sockargs():
- Add a comment describing my feelings about this interface, in general.
- Remove the COMPAT_OLDSOCK length hack. Instead, if the socket argument
is too long to fit in an mbuf, allocate enough external storage to
hold it.
- If the socket argument is a sockaddr, don't allow the length to be
greater than 255, as that would overflow sa_len.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for his sanity checking.
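
The reason for the 255 limit is that a BSD sockaddr stores its own length in a one-byte sa_len field. A minimal sketch of such a check, written with UCHAR_MAX rather than a literal; check_sockaddr_len() is a hypothetical name and the EINVAL return value is an assumption.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: reject a socket address whose length could not
 * be stored in a one-byte length field.
 */
static int
check_sockaddr_len(size_t buflen)
{
	if (buflen > UCHAR_MAX)
		return EINVAL;
	return 0;
}
```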


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 22-Dec-1996 cgd

* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.


# 1.22 14-Jun-1996 cgd

avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.


# 1.21 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.20 17-May-1996 pk

branches: 1.20.4;
Don't touch retval[] in socketpair(); manual page says this system call
returns 0 on success (PR#2428).


# 1.19 09-Feb-1996 christos

More proto fixes


# 1.18 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.17 10-Oct-1995 mycroft

Add hooks for COMPAT_FREEBSD, from Noriyuki Soda.


# 1.16 07-Oct-1995 mycroft

Prefix names of system call implementation functions with `sys_'.


# 1.15 19-Sep-1995 thorpej

Make system calls conform to a standard prototype and bring those
prototypes into scope.


# 1.14 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.13 24-Jun-1995 christos

Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).


# 1.12 10-May-1995 christos

tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed


# 1.11 05-Mar-1995 fvdl

Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.


# 1.10 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.9 20-Oct-1994 cgd

update for new syscall args description mechanism


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.8 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.7 04-May-1994 mycroft

Add return types where missing. Simplify some of the compat conditionals.
Include compat code if COMPAT_SUNOS with or without COMPAT_43.


# 1.6 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 17-Jul-1993 mycroft

branches: 1.5.4;
Finish moving struct definitions outside of function declarations.


# 1.4 27-Jun-1993 andrew

* ansifications
* Yuval Yarom's socket recv(2) fixes - access rights problems (see also
uipc_socket.c).


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.208 04-Oct-2023 ad

kauth_cred_hold(): return cred verbatim so that donating a reference to
another data structure can be done more elegantly.


# 1.207 09-Sep-2023 ad

Fix a ~16 year old perf regression: when accepting a connection, add a
reference to the caller's credentials rather than copying them.


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.206 01-Jul-2022 riastradh

sendto(2), recvfrom(2): Scrub internal struct msghdr on stack.

Otherwise this is kernel stack disclosure via ktrace.

Reported-by: syzbot+1d40303b310063778194@syzkaller.appspotmail.com


# 1.205 29-Jun-2022 riastradh

recvmmsg(2): More timespec validation.

Reported-by: syzbot+004ed2f264534bd27312@syzkaller.appspotmail.com
Reported-by: syzbot+6f9014c842c4e78df7bc@syzkaller.appspotmail.com


# 1.204 28-Jun-2022 riastradh

recvmmsg(2): Avoid arithmetic overflow in timeout calculations.

XXX This is not right -- it doesn't actually do anything to time
out...

Reported-by: syzbot+784209d76a94fcc6417b@syzkaller.appspotmail.com


# 1.203 27-Jun-2022 riastradh

sendmsg(2): Avoid buffer overrun in ktrace of invalid cmsghdr.

Reported-by: syzbot+efded148140b23425f5c@syzkaller.appspotmail.com


# 1.202 02-Oct-2021 thorpej

...and correct my terrible spelling.


# 1.201 02-Oct-2021 thorpej

- Strengthen the poll(2) fifo_inout test to ensure that once the reader
has read enough that exactly PIPE_BUF space is available that the FIFO
becomes writable again.
- When creating a FIFO, ensure that the receive low water mark is 1
(a FIFO must be readable when at least 1 byte is available); this
was already the case implicitly, but this makes it explicit.
- Similarly, set the send low water mark to PIPE_BUF to ensure that
the pipe is writable when at least PIPE_BUF bytes of space are available
in the send buffer. Without this change, the strengthened test case
above does not pass (the default send low water mark is larger than
PIPE_BUF; see soreserve()).
- Make the same low water mark changes to the PIPE_SOCKETPAIR case.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.200 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


Revision tags: netbsd-9-2-RELEASE netbsd-9-1-RELEASE bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.199 12-Nov-2018 hannken

branches: 1.199.4;
sys_recvmmsg: don't defer an error that already gets returned.


# 1.198 07-Nov-2018 hannken

Don't defer errors from sendmmsg(). This matches the linux manpage.

Defer errors from recvmmsg() through so_rerror and
tests and return a deferred error on entry.

Ok: christos@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.197 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
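
The truncation hazard is easy to reproduce in userland. In this sketch sketch_uimin() and SKETCH_MIN() are stand-ins written for illustration (they are not the libkern definitions), and the expected results assume a 32-bit unsigned int.

```c
#include <stdint.h>

/* Macro form: operands keep their own types, so nothing is truncated. */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Function form: both arguments are converted to unsigned int. */
static unsigned int
sketch_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

static uint64_t
clamp_with_function(uint64_t len, unsigned int limit)
{
	return sketch_uimin(len, limit);	/* len is silently truncated */
}

static uint64_t
clamp_with_macro(uint64_t len, unsigned int limit)
{
	return SKETCH_MIN(len, limit);
}
```

With len = 2^32 + 1 and limit = 16, the function form sees len as 1 and returns 1, while the macro form returns 16. Giving the function a name that spells out its operand type makes that conversion visible at the call site.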


# 1.196 01-Aug-2018 rjs

Add ioctl(2) handler for kernel part of sctp_peeloff().


# 1.195 31-Jul-2018 rjs

Add getsockopt2() syscall.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.194 04-May-2018 christos

branches: 1.194.2;
define MBUFTYPES here.


# 1.193 03-May-2018 christos

Fix COMPAT_NETBSD32 cmsg handling:

1. alignment was wrong for > 1 message
2. macros were doing incorrect pointer comparisons, fortunately ending
the iteration early after the first cmsg instead of crashing.
3. don't output 32 bit ktrace records for cmsg. 32 bit programs running
under emulation on 64 bit systems should produce 64 bit ktrace records
so that the native ktrace can handle the records; remove extra arguments
that are now not needed (the 32 bit msghdr).
4. output the correct type for cmsg trace records.
5. output all the cmsg records in traces instead of just the first one.

Welcome to 8.99.15 because of the argument removal.

XXX: Really all the code should be changed to use the CMSG_{FIRST,NXT}HDR
macros...


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.192 16-Mar-2018 christos

PR/53103: Timo Buhrmester: linux emulation of sendto(2) broken

The sockargs refactoring broke it, because sockargs only works with a user
address. Added an argument to sockargs to indicate where the address is
coming from. Welcome to 8.99.14.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.191 12-Feb-2018 maxv

branches: 1.191.2;
Add a KASSERT; we expect *from to be a single mbuf (not chained).


# 1.190 04-Jan-2018 christos

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).

(Tom Ivar Helbekkmo)


# 1.189 31-Dec-2017 christos

pass valsize for getsockopt like we do for setsockopt


# 1.188 26-Dec-2017 kamil

Refactor pipe1() and correct a bug in sys_pipe2() (SYS_pipe2)

sys_pipe2() returns two integers (values), the 2nd one is a copy of the 2nd
file descriptor that lands in fildes[2]. This is a side effect of reusing
the code for sys_pipe() (SYS_pipe) and not cleaning it up.

The first returned value is (on success) 0.

Introduced a small refactoring in pipe1() that it does not operate over
retval[], but on an array int[2]. A user sets retval[] for pipe() when
desired and needed.

This refactoring touches compat code: netbsd32, linux, linux32.

Before the changes on NetBSD/amd64:

$ ktruss -i ./a.out
[...]
15131 1 a.out pipe2(0x7f7fff2e62b8, 0) = 0, 4
[...]

After the changes:

$ ktruss -i ./a.out
[...]
782 1 a.out pipe2(0x7f7fff97e850, 0) = 0
[...]

There should not be a visible change for current users.

Sponsored by <The NetBSD Foundation>


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.187 20-Jun-2017 christos

Change len type to be unsigned int for consistency with the input type.
Don't check for negative; it does not matter we clamp anyway. This
broke the compat32 getsockname() where an uninitialized socklen_t ended
up randomly negative causing it to fail.


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204
# 1.186 03-Feb-2017 christos

branches: 1.186.6;
expose sendmsg_so and recvmsg_so.


# 1.185 02-Feb-2017 christos

expose copyout_sockname_sb


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204
# 1.184 03-Dec-2016 christos

branches: 1.184.2;
Add missing ktrkuser


Revision tags: pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914
# 1.183 13-Sep-2016 martin

Make the ktrace record written by do_sys_sendmsg/do_sys_recvmsg overridable
by the caller. Use this in compat_netbsd32 to log the 32bit version, so
the 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.182 07-Jul-2016 msaitoh

branches: 1.182.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.181 01-Nov-2015 christos

Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.


Revision tags: nick-nhusb-base-20150921
# 1.180 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.179 22-Jul-2015 maxv

Memory leak. Triggerable from an unprivileged user via COMPAT_43.


Revision tags: nick-nhusb-base-20150606
# 1.178 09-May-2015 rtr

change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16


# 1.177 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.176 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.175 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.174 06-Mar-2015 rtr

Return EINVAL if namelen isn't large enough to encompass the expected
members of sockaddr structures. i.e. sa_len and sa_family.

Discussed with and patch by christos@
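
A minimal sketch of the length check described above, using a
hypothetical BSD-style structure so that it builds on systems whose
struct sockaddr has no sa_len member. The names bsd_sockaddr_model and
check_namelen() are assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of the BSD sockaddr layout: length, family, data. */
struct bsd_sockaddr_model {
	uint8_t	sa_len;
	uint8_t	sa_family;
	char	sa_data[14];
};

/* Reject a namelen that cannot even hold sa_len and sa_family. */
int check_namelen(size_t namelen)
{
	if (namelen < offsetof(struct bsd_sockaddr_model, sa_data))
		return EINVAL;
	return 0;
}
```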


Revision tags: nick-nhusb-base
# 1.173 05-Sep-2014 matt

branches: 1.173.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


Revision tags: netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.172 09-Aug-2014 rtr

branches: 1.172.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.171 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.170 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


# 1.169 17-May-2014 rmind

- fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle few comments.
- Remove M_SOOPTS while here.


# 1.168 17-May-2014 rmind

makesocket: set SS_NBIO slightly earlier.


# 1.167 17-May-2014 rmind

Remove trailing whitespaces, wrap long lines, minor KNF; no functional changes.


Revision tags: yamt-pagecache-base9
# 1.166 07-Apr-2014 seanb

Fix a case where an erroneous EAGAIN was returned out of recvmmsg.
This occurred when some, but not all of the mmsg array members
were filled with data from a non-blocking socket.
PR kern/48725
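
The intended behaviour can be sketched with a toy loop. recvmmsg_model()
is a hypothetical name; the array of per-message results stands in for
the real receive calls, and passing other errors through unchanged is an
assumption of this sketch, not something the commit message states.

```c
#include <errno.h>

/*
 * results[i] is the error (0 on success) that receiving message i would
 * produce. On return, *received holds the number of messages filled.
 */
int recvmmsg_model(const int *results, int vlen, int *received)
{
	int error = 0;
	int i;

	for (i = 0; i < vlen; i++) {
		error = results[i];
		if (error != 0)
			break;
	}
	*received = i;

	/* Some messages arrived: running dry afterwards is not an error. */
	if (i > 0 && error == EAGAIN)
		error = 0;
	return error;
}
```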


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 09-Oct-2013 christos

branches: 1.165.2;
delete extra m_len initialization.


# 1.164 09-Oct-2013 christos

PR/48292: Justin Cormack: paccept creates sockets that cannot be made blocking
Reset the socket flags not just the file flags for non-blocking I/O.
XXX: pullup 6


# 1.163 08-Oct-2013 christos

PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.


# 1.162 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.161 03-Jun-2013 christos

branches: 1.161.2;
use the proper name for kdump pretty-printing.


Revision tags: agc-symver-base
# 1.160 14-Feb-2013 christos

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.
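
A user-space check for the symptom in the PR might look like the sketch
below. It assumes a system that provides SOCK_NONBLOCK and verifies both
the file status flag and the actual I/O behaviour, since the bug was that
the two could disagree. The function name is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 if a SOCK_NONBLOCK socket both reports O_NONBLOCK and
 * really refuses to block, 0 if not, -1 if no socket pair was created.
 */
int sock_nonblock_works(void)
{
	int sv[2];
	char c;

	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, sv) != 0)
		return -1;

	int flag_set = (fcntl(sv[0], F_GETFL) & O_NONBLOCK) != 0;

	/* Nothing was written, so a non-blocking recv must fail at once. */
	ssize_t n = recv(sv[0], &c, 1, 0);
	int would_block = (n == -1 &&
	    (errno == EAGAIN || errno == EWOULDBLOCK));

	close(sv[0]);
	close(sv[1]);
	return flag_set && would_block;
}
```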


# 1.159 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
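
The gap between CMSG_LEN and CMSG_SPACE mentioned above is alignment
padding at the end of a control message. The following is a user-space
sketch of scrubbing it; cmsg_padding() and scrub_cmsg_padding() are
hypothetical helpers, not the code in unp_externalize.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Number of padding bytes that follow a control message payload. */
size_t cmsg_padding(size_t payload)
{
	return CMSG_SPACE(payload) - CMSG_LEN(payload);
}

/*
 * Zero the padding so stale bytes are never copied out with the message.
 * buf must be at least CMSG_SPACE(payload) bytes long.
 */
void scrub_cmsg_padding(unsigned char *buf, size_t payload)
{
	memset(buf + CMSG_LEN(payload), 0, cmsg_padding(payload));
}
```

The amount of padding depends on the platform's alignment rules, so it
may be zero for some payload sizes.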


Revision tags: yamt-pagecache-base8
# 1.158 29-Dec-2012 mlelstv

The sanity check prevented messages that carry only ancillary data.


# 1.157 29-Dec-2012 mlelstv

If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analog
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.156 17-Jul-2012 njoly

branches: 1.156.2;
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.


# 1.155 22-Jun-2012 christos

Add {send,recv}mmsg from Linux


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.154 25-Jan-2012 christos

branches: 1.154.2;
revert atomics for so_options since it is a short.


# 1.153 25-Jan-2012 christos

need <sys/atomic.h>


# 1.152 25-Jan-2012 christos

Add locking, requested by yamt. Note that locking is not used everywhere
for these.


# 1.151 25-Jan-2012 christos

As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]
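
As a rough illustration of the effect, the sketch below writes to a
socket whose peer is gone and expects EPIPE without a SIGPIPE. It uses
SO_NOSIGPIPE where the header defines it and otherwise falls back to the
per-call MSG_NOSIGNAL flag; that fallback is an assumption made here for
portability, and the function name is hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the errno from sending to a closed peer, 0 if the send worked. */
int send_to_closed_peer(void)
{
	int sv[2];
	int flags = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;

#ifdef SO_NOSIGPIPE
	int on = 1;
	(void)setsockopt(sv[0], SOL_SOCKET, SO_NOSIGPIPE, &on, sizeof(on));
#else
	flags = MSG_NOSIGNAL;	/* per-call suppression instead */
#endif

	close(sv[1]);		/* the peer goes away */

	ssize_t n = send(sv[0], "x", 1, flags);
	int err = (n == -1) ? errno : 0;

	close(sv[0]);
	return err;
}
```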


# 1.150 21-Dec-2011 christos

simplify expression


# 1.149 20-Dec-2011 christos

- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SS_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.148 04-Nov-2011 christos

branches: 1.148.4;
Fix error I introduced in previous commit that caused asymmetric connects
when SOCK_NONBLOCK or SOCK_CLOEXEC was specified. Factor out common code
and simplify error return.


Revision tags: yamt-pagecache-base
# 1.147 21-Sep-2011 christos

branches: 1.147.2;
Put the mbuf type in the ktrace record so that we know how to decode it
in userland.


# 1.146 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.145 15-Jul-2011 christos

fail with EINVAL if flags other than O_CLOEXEC|O_NONBLOCK are set in
pipe2(2) and dup3(2)


# 1.144 26-Jun-2011 christos

* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.143 24-Apr-2011 rmind

- Replace few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.


# 1.142 10-Apr-2011 christos

- Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1
# 1.141 23-Apr-2010 rmind

branches: 1.141.2;
Replace M_IOV and some malloc(9)s with kmem(9), and while there:
- Fix invalid free (M_TEMP vs M_IOV) in do_sys_recvmsg(), spotted by jakllsch@.
Also, same fix in osf1_sys_sendmsg_xopen().
- Fix attempt to free non-allocated memory in error path in netbsd32___getfh30().
- Plug a memory leak in compat_43_netbsd32_orecvmsg().


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.140 21-Jan-2010 pgoyette

branches: 1.140.2; 1.140.4;
Remove unnecessary call to kauth_cred_free().

This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@


# 1.139 29-Dec-2009 elad

Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!


# 1.138 20-Dec-2009 dsl

If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567


Revision tags: matt-premerge-20091211
# 1.137 09-Dec-2009 dsl

Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.136 04-Apr-2009 ad

Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)


Revision tags: nick-hppapmap-base2
# 1.135 21-Jan-2009 yamt

branches: 1.135.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.


Revision tags: netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 haad-dm-base mjf-devfs2-base
# 1.134 06-Aug-2008 plunky

branches: 1.134.2; 1.134.4;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.133 24-Jun-2008 ad

branches: 1.133.2;
Nothing uses getsock/getvnode any more.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.132 30-May-2008 rmind

branches: 1.132.2;
do_sys_accept: release the reference to sock in few error paths.
Should fix PR/38790, report and test-case by Nicolas Joly.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.131 28-Apr-2008 martin

branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.130 24-Apr-2008 ad

branches: 1.130.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


# 1.129 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.128 21-Mar-2008 ad

branches: 1.128.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.127 06-Feb-2008 ad

branches: 1.127.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.


Revision tags: vmlocking2-base3 bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.126 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.125 20-Dec-2007 dsl

Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.


# 1.124 16-Dec-2007 elad

Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.123 24-Nov-2007 dyoung

branches: 1.123.2; 1.123.6;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to m_getsombuf().


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.122 05-Oct-2007 dyoung

branches: 1.122.4;
Use m_getsombuf().


Revision tags: yamt-x86pmap-base
# 1.121 19-Sep-2007 christos

branches: 1.121.2;
minor nits; no code change.


# 1.120 19-Sep-2007 dyoung

1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return a pointer to the address part of the "don't care"
sockaddr in the same family as `sa'. Write the length of the
address part at `slenp', if `slenp' is not NULL.
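
To make the "address part" accessors concrete, here is a user-space
model for two families. sockaddr_const_addr_model() is a hypothetical
name; the switch on the family is an assumption of this sketch and says
nothing about how the kernel version is implemented.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Return a pointer to the address part of `sa' and, if `slenp' is not
 * NULL, store the length of that part there. Unknown families yield NULL.
 */
const void *
sockaddr_const_addr_model(const struct sockaddr *sa, socklen_t *slenp)
{
	switch (sa->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *sin =
		    (const struct sockaddr_in *)sa;

		if (slenp != NULL)
			*slenp = sizeof(sin->sin_addr);
		return &sin->sin_addr;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *sin6 =
		    (const struct sockaddr_in6 *)sa;

		if (slenp != NULL)
			*slenp = sizeof(sin6->sin6_addr);
		return &sin6->sin6_addr;
	}
	default:
		return NULL;
	}
}
```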


Revision tags: nick-csl-alignment-base5
# 1.119 06-Sep-2007 rmind

do_sys_sendmsg: Plug a possible leak.
From CID: 4535


# 1.118 01-Sep-2007 dsl

Don't error calls to copy socket addresses to userspace when the application
has provided a non-null buffer pointer and a zero length.
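
The case in question looks like this from user space: a valid buffer
pointer together with a length of zero. The sketch below expects the
call to succeed; the function name is hypothetical.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Returns the getsockname() result for a non-null buffer of length 0. */
int getsockname_zero_len(void)
{
	int sv[2];
	struct sockaddr_storage ss;
	socklen_t len = 0;	/* zero length, but the pointer is valid */

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;

	int rc = getsockname(sv[0], (struct sockaddr *)&ss, &len);

	close(sv[0]);
	close(sv[1]);
	return rc;
}
```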


# 1.117 27-Aug-2007 dsl

ktrace socket control structures (ie msghdr, address etc) using ktrkuser().


# 1.116 15-Aug-2007 ad

branches: 1.116.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base nick-csl-alignment-base
# 1.115 15-Jul-2007 dsl

branches: 1.115.2; 1.115.6;
Remove non-user flags (especially MSG_IOVUSRSPACE) from mp->msg_flags
before passing to so_receive.
This may (or may not) have any effect...


Revision tags: mjf-ufs-trans-base
# 1.114 01-Jul-2007 dsl

Check for SOL_SOCKET when checking for SCM_RIGHTS.


# 1.113 24-Jun-2007 dsl

Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socket names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, old LKMs will fail to load.


# 1.112 02-Jun-2007 enami

- Fix obvious typos so that sendto(2) works.
- Wrap lines again.


# 1.111 01-Jun-2007 dsl

Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.


Revision tags: yamt-idlelwp-base8
# 1.110 13-May-2007 dsl

Fallout from caddr_t deletion - remove a load of redundant (void *) casts.


# 1.109 18-Apr-2007 yamt

sys_accept: fix usecount botch and double soclose in rev.1.108.


# 1.108 15-Apr-2007 yamt

sys_accept: don't leak a socket on error.


Revision tags: thorpej-atomic-base
# 1.107 04-Mar-2007 christos

branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge
# 1.106 09-Feb-2007 ad

branches: 1.106.2;
Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.105 01-Nov-2006 yamt

branches: 1.105.2;
remove some __unused from function parameters.


# 1.104 23-Oct-2006 elad

PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic

Patch applied, thanks!


Revision tags: yamt-splraiseipl-base2
# 1.103 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.102 22-Aug-2006 seanb

branches: 1.102.2; 1.102.4;
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.101 23-Jul-2006 ad

branches: 1.101.2;
Use the LWP cached credentials where sane.


# 1.100 26-Jun-2006 mrg

version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.

from pavel@ with a little bit of clean up from myself.

XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.99 16-May-2006 christos

branches: 1.99.4;
Don't set mature an fd that has been ffree'd


Revision tags: elad-kernelauth-base
# 1.98 11-May-2006 christos

Add MSG_NOSIGNAL (from FreeBSD)


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.97 01-Mar-2006 yamt

branches: 1.97.2; 1.97.4; 1.97.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.96 26-Dec-2005 perry

branches: 1.96.2; 1.96.4; 1.96.6;
u_intN_t -> uintN_t


# 1.95 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.94 03-Sep-2005 martin

In adjust_rights() use CMSG_SPACE() to calculate the number of
file descriptors passed in this message - the counterpart in
unp_externalize does this as well.

Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.

This fixes the postfix connection cache for 64bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing the fdrelease() of uninitialized junk.


# 1.93 03-Sep-2005 martin

minor knf tweak


# 1.92 30-May-2005 martin

branches: 1.92.2;
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.


# 1.91 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.90 26-Feb-2005 perry

branches: 1.90.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.89 30-Nov-2004 christos

branches: 1.89.4; 1.89.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat


# 1.88 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.87 18-May-2004 ragge

Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.

Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.

As a side effect, the new connect() behaviour conforms to POSIX.
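
Under the POSIX behaviour, a connect() interrupted by a signal keeps
going in the background, and the caller can wait for the outcome instead
of starting over. A sketch of that pattern from user space, with
hypothetical function names:

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Wait for a pending connection attempt to finish.
 * Returns 0 on success or an errno value describing the failure.
 */
int finish_interrupted_connect(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int rc;

	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc == -1 && errno == EINTR);

	if (rc == -1)
		return errno;
	if (rc == 0)
		return ETIMEDOUT;

	/* The socket is writable: fetch the result of the attempt. */
	int soerr = 0;
	socklen_t len = sizeof(soerr);

	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) == -1)
		return errno;
	return soerr;
}

/* connect() that tolerates interruption by frequent signals. */
int connect_retry_eintr(int fd, const struct sockaddr *sa, socklen_t salen,
    int timeout_ms)
{
	if (connect(fd, sa, salen) == 0)
		return 0;
	if (errno != EINTR && errno != EINPROGRESS)
		return errno;
	return finish_interrupted_connect(fd, timeout_ms);
}
```

A process with a frequent interval timer would call
connect_retry_eintr() once and let the poll loop absorb the signals.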


Revision tags: netbsd-2-0-base
# 1.86 29-Nov-2003 matt

branches: 1.86.2;
Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.85 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.84 13-Nov-2003 chs

eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.


# 1.83 04-Sep-2003 matt

Adapt to the new calling conventions of unp_connect2


# 1.82 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.81 29-Jun-2003 fvdl

branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.80 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.79 05-Apr-2003 christos

PR/21030: Naoto Shimazaki: fcntl to accepted socket does not work properly


# 1.78 26-Feb-2003 matt

Remove leftover MBUFTRACE asserts.


# 1.77 26-Feb-2003 drochner

deactivate MBUFTRACE related KASSERT()s in the !MBUFTRACE case


# 1.76 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.75 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.74 26-Nov-2002 christos

si_ -> sel_ to avoid conflicts with siginfo.


# 1.73 25-Nov-2002 itojun

no need for error check after MEXTMALLOC - jdolecek


# 1.72 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge
# 1.71 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.70 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.
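
A small sketch of the style this commit moves towards, using the
<sys/queue.h> TAILQ macros. The struct item type and sum_items() are
hypothetical and exist only to show the iteration idiom.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element, for illustration only. */
struct item {
	int value;
	TAILQ_ENTRY(item) link;
};
TAILQ_HEAD(item_head, item);

int
sum_items(struct item_head *head)
{
	struct item *it;
	int sum = 0;

	assert(head != NULL);

	/*
	 * Open-coded form being replaced:
	 *   for (it = head->tqh_first; it != NULL; it = it->link.tqe_next)
	 */
	TAILQ_FOREACH(it, head, link)
		sum += it->value;
	return sum;
}
```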


Revision tags: gehenna-devsw-base
# 1.69 31-May-2002 itojun

support setsockopt() with larger data (up to MCLBYTES).
From: Hitoshi Asaeda <Hitoshi.Asaeda@sophia.inria.fr>


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base ifpoll-base
# 1.68 11-Feb-2002 jdolecek

branches: 1.68.8;
Switch default for pipes to the faster John S. Dyson's implementation.
Old, socketpair-based ones are available with option PIPE_SOCKETPAIR.


Revision tags: thorpej-mips-cache-base
# 1.67 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.66 16-Sep-2001 wiz

branches: 1.66.2;
Spell 'occurred' with two 'r's.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.65 17-Jul-2001 jdolecek

branches: 1.65.2;
Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
has to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(

This code was tested to pass regression tests.


# 1.64 01-Jul-2001 matt

branches: 1.64.2;
Use consistent types for len. Limit sockarg length to reasonable values.


# 1.63 25-Jun-2001 jdolecek

Back off the sendit()/recvit() change, some have problems with it


# 1.62 25-Jun-2001 jdolecek

sys_connect(): fix the call to FILE_UNUSE() so that it's done on return, rather
than immediately after getsock() call


# 1.61 25-Jun-2001 jdolecek

Add 'kernsa' parameter for sendit()/recvit(); if nonzero, msg->msg_name
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.

This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.


# 1.60 16-Jun-2001 jdolecek

Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pagable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.


# 1.59 14-Jun-2001 thorpej

Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
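
The "larval" idea can be modelled in a few lines of ordinary C. This is a
single-threaded toy, not the kernel's descriptor table: struct table,
slot_alloc(), slot_set_mature() and slot_lookup() are hypothetical names,
and all locking is left out.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS      8
#define SLOT_USED   0x1	/* slot is allocated */
#define SLOT_LARVAL 0x2	/* allocated, but not yet fully constructed */

struct slot { int flags; void *obj; };
struct table { struct slot slots[NSLOTS]; };

/* Allocate the lowest free slot; larval slots count as occupied. */
int
slot_alloc(struct table *t)
{
	for (int i = 0; i < NSLOTS; i++) {
		if ((t->slots[i].flags & SLOT_USED) == 0) {
			t->slots[i].flags = SLOT_USED | SLOT_LARVAL;
			t->slots[i].obj = NULL;
			return i;
		}
	}
	return -1;
}

/* Finish construction: attach the object and clear the larval flag. */
void
slot_set_mature(struct table *t, int i, void *obj)
{
	assert(i >= 0 && i < NSLOTS);
	t->slots[i].obj = obj;
	t->slots[i].flags &= ~SLOT_LARVAL;
}

/* Look up a slot; larval slots are treated as non-existent. */
void *
slot_lookup(struct table *t, int i)
{
	if (i < 0 || i >= NSLOTS)
		return NULL;
	if ((t->slots[i].flags & (SLOT_USED | SLOT_LARVAL)) != SLOT_USED)
		return NULL;
	return t->slots[i].obj;
}
```

The point of the two-flag design is that the allocator and the lookup path
disagree on purpose: the allocator sees the slot as taken, while every
other consumer sees nothing until construction has finished.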


# 1.58 06-May-2001 manu

implement the recently introduced EMUL_BSD_ASYNCIO_PIPE emulation package
flag.

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.57 27-Feb-2001 lukem

branches: 1.57.2;
convert to ANSI KNF


# 1.56 10-Dec-2000 fvdl

Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).


# 1.55 24-Nov-2000 jdolecek

define COMPAT_OLDSOCK unconditionally - the code is used virtually for all
emulations besides NetBSD, and this way it's LKM-safe


# 1.54 02-Aug-2000 thorpej

MALLOC()/FREE() are not to be used for variable sized allocations.


# 1.53 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.52 27-May-2000 sommerfeld

branches: 1.52.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.51 30-Mar-2000 augustss

Get rid of register declarations.


# 1.50 23-Mar-2000 thorpej

Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() makes sure
the fd_freefiles hints stay in sync.

From OpenBSD.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.49 05-Nov-1999 mycroft

branches: 1.49.2;
Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.


# 1.48 30-Oct-1999 enami

back out unnecessary stylistic changes in recent changes, to keep coding
style closer to KNF.


# 1.47 27-Oct-1999 jdolecek

minor cleanup of previous - avoid goto and code duplication


# 1.46 27-Oct-1999 darrenr

patch from Greg A. Woods to fix panic problems with code that attempts to
recover from failures to accept a socket successfully. Problem suggested
by this:
> It would appear (from two "panic: closef: count < 0" failures in less
> than 12 hours) that Darren's fix to accept(2) for lost file descriptors
> isn't quite correct. His fix inserts a call to closef() to handle one
> of several possible error conditions. However everywhere else in the
> socket code in the same file where falloc() cleanup is necessary the
> function used is ffree().


Revision tags: chs-ubc2-base
# 1.45 01-Jul-1999 itojun

branches: 1.45.2; 1.45.4; 1.45.6;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.44 01-Jul-1999 darrenr

fix sys_accept() to return EOPNOTSUPP for protocols which don't support
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.


# 1.43 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.42 30-Apr-1999 cgd

add checks for COMPAT_OSF1 in the appropriate places


Revision tags: netbsd-1-4-RELEASE netbsd-1-4-base
# 1.41 10-Feb-1999 kleink

branches: 1.41.2; 1.41.4; 1.41.6;
* Due to addition and use of socklen_t, make the socket option and address
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also eliminates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessitates the addition
of a few casts.


# 1.40 18-Dec-1998 drochner

solve the COMPAT_OLDSOCK/MSG_COMPAT problem differently:
The source files which need MSG_COMPAT define COMPAT_OLDSOCK.


Revision tags: kenh-if-detach-base
# 1.39 26-Nov-1998 mycroft

Revert the functional change in rev 1.38; permit a msg_iovlen of 0.
There are two reasons for this:
* We should be able to pass file descriptors without sending any data.
* We could send zero-length iovecs anyway (but we shouldn't have to do this).
Also, msg_iovlen is already a u_int, so delete a bunch of casts.


Revision tags: chs-ubc-base
# 1.38 04-Aug-1998 kleink

Per XNS Issue 5, calling recvmsg(2) or sendmsg(2) with an msg.msg_iovlen less
than or equal to 0 shall fail with EMSGSIZE; the latter condition was not being
checked for. Also, document the msg.msg_iovlen > {IOV_MAX} case.


# 1.37 04-Aug-1998 kleink

UIO_MAXIOV -> IOV_MAX


# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
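
The mapping above is mechanical except for one trap: bcopy() takes the
source first, while memcpy() and memmove() take the destination first. A
short sketch follows; struct rec and the rec_*() helpers are hypothetical
and only serve to show each conversion next to the call it replaces.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record type, used only for illustration. */
struct rec { char name[8]; int id; };

void
rec_copy(struct rec *dst, const struct rec *src)
{
	assert(dst != NULL && src != NULL);
	/* Was: bcopy(src, dst, sizeof(*dst));   (source first) */
	memcpy(dst, src, sizeof(*dst));		/* destination first */
}

void
rec_clear(struct rec *r)
{
	assert(r != NULL);
	/* Was: bzero(r, sizeof(*r)); */
	memset(r, 0, sizeof(*r));
}

int
rec_equal(const struct rec *a, const struct rec *b)
{
	assert(a != NULL && b != NULL);
	/* Was: bcmp(a, b, sizeof(*a)) == 0 */
	return memcmp(a, b, sizeof(*a)) == 0;
}
```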


# 1.35 03-Aug-1998 kleink

Fix two off-by-one bugs, both present in each of recvmsg(2) and sendmsg(2):
* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.
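
Both bugs come down to using ">=" where ">" was meant. The sketch below
shows the corrected comparisons in isolation. It is not the kernel code:
iov_storage() is a hypothetical helper, and SKETCH_SMALLIOV and
SKETCH_MAXIOV are stand-ins for UIO_SMALLIOV and UIO_MAXIOV with
illustrative values.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/uio.h>

#define SKETCH_SMALLIOV 8	/* stand-in for UIO_SMALLIOV */
#define SKETCH_MAXIOV   1024	/* stand-in for UIO_MAXIOV */

/*
 * Choose storage for iovcnt iovecs: the caller's on-stack array when it
 * is large enough, heap memory otherwise.  Sets *needs_free when the
 * caller must free(*iovp).  Returns 0, EMSGSIZE or ENOMEM.
 */
int
iov_storage(unsigned int iovcnt, struct iovec *stack_iov,
    struct iovec **iovp, int *needs_free)
{
	assert(stack_iov != NULL && iovp != NULL && needs_free != NULL);
	*needs_free = 0;

	/* '>' rather than '>=': exactly SKETCH_MAXIOV entries is legal. */
	if (iovcnt > SKETCH_MAXIOV)
		return EMSGSIZE;

	/* '>' rather than '>=': exactly SKETCH_SMALLIOV fits on the stack. */
	if (iovcnt > SKETCH_SMALLIOV) {
		*iovp = malloc(iovcnt * sizeof(**iovp));
		if (*iovp == NULL)
			return ENOMEM;
		*needs_free = 1;
	} else {
		*iovp = stack_iov;
	}
	return 0;
}
```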


# 1.34 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.33 29-Jul-1998 thorpej

branches: 1.33.2;
When checking for overflow in the residual count, test against SSIZE_MAX.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.

There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).


# 1.32 18-Jul-1998 lukem

use AF_LOCAL instead of AF_UNIX


# 1.31 25-Jun-1998 thorpej

defopt KTRACE


# 1.30 25-Apr-1998 matt

Hook for 0-copy (or other optimized) sends and receives


# 1.29 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.28 06-Feb-1998 thorpej

When copying out multiple control messages, ensure that the next control
message is aligned. From David Borman <dab@bsdi.com>.


# 1.27 07-Jan-1998 thorpej

Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).


# 1.26 07-Jan-1998 thorpej

Fix bug in recvit() that would cause recvmsg() to only receive one
control message, even if there were multiple control messages on
the queue. From Jean-Luc Richier <Jean-Luc.Richier@imag.fr>, in
bug report kern/4700.


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.25 26-Jun-1997 thorpej

branches: 1.25.8;
Use UCHAR_MAX rather than "255" when sanity-checking the length of a
sockaddr in sockargs().


# 1.24 26-Jun-1997 thorpej

In sockargs():
- Add a comment describing my feelings about this interface, in general.
- Remove the COMPAT_OLDSOCK length hack. Instead, if the socket argument
is too long to fit in an mbuf, allocate enough external storage to
hold it.
- If the socket argument is a sockaddr, don't allow the length to be
greater than 255, as that would overflow sa_len.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for his sanity checking.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 22-Dec-1996 cgd

* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.


# 1.22 14-Jun-1996 cgd

avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.


# 1.21 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.20 17-May-1996 pk

branches: 1.20.4;
Don't touch retval[] in socketpair(); manual page says this system call
returns 0 on success (PR#2428).


# 1.19 09-Feb-1996 christos

More proto fixes


# 1.18 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.17 10-Oct-1995 mycroft

Add hooks for COMPAT_FREEBSD, from Noriyuki Soda.


# 1.16 07-Oct-1995 mycroft

Prefix names of system call implementation functions with `sys_'.


# 1.15 19-Sep-1995 thorpej

Make system calls conform to a standard prototype and bring those
prototypes into scope.


# 1.14 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.13 24-Jun-1995 christos

Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).


# 1.12 10-May-1995 christos

tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed


# 1.11 05-Mar-1995 fvdl

Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.


# 1.10 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.9 20-Oct-1994 cgd

update for new syscall args description mechanism


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.8 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.7 04-May-1994 mycroft

Add return types where missing. Simplify some of the compat conditionals.
Include compat code if COMPAT_SUNOS with or without COMPAT_43.


# 1.6 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 17-Jul-1993 mycroft

branches: 1.5.4;
Finish moving struct definitions outside of function declarations.


# 1.4 27-Jun-1993 andrew

* ansifications
* Yuval Yarom's socket recv(2) fixes - access rights problems (see also
uipc_socket.c).


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.207 09-Sep-2023 ad

Fix a ~16 year old perf regression: when accepting a connection, add a
reference to the caller's credentials rather than copying them.


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.206 01-Jul-2022 riastradh

sendto(2), recvfrom(2): Scrub internal struct msghdr on stack.

Otherwise this is kernel stack disclosure via ktrace.

Reported-by: syzbot+1d40303b310063778194@syzkaller.appspotmail.com


# 1.205 29-Jun-2022 riastradh

recvmmsg(2): More timespec validation.

Reported-by: syzbot+004ed2f264534bd27312@syzkaller.appspotmail.com
Reported-by: syzbot+6f9014c842c4e78df7bc@syzkaller.appspotmail.com


# 1.204 28-Jun-2022 riastradh

recvmmsg(2): Avoid arithmetic overflow in timeout calculations.

XXX This is not right -- it doesn't actually do anything to time
out...

Reported-by: syzbot+784209d76a94fcc6417b@syzkaller.appspotmail.com


# 1.203 27-Jun-2022 riastradh

sendmsg(2): Avoid buffer overrun in ktrace of invalid cmsghdr.

Reported-by: syzbot+efded148140b23425f5c@syzkaller.appspotmail.com


# 1.202 02-Oct-2021 thorpej

...and correct my terrible spelling.


# 1.201 02-Oct-2021 thorpej

- Strengthen the poll(2) fifo_inout test to ensure that once the reader
has read enough that exactly PIPE_BUF space is available that the FIFO
becomes writable again.
- When creating a FIFO, ensure that the receive low water mark is 1
(a FIFO must be readable when at least 1 byte is available); this
was already the case implicitly, but this makes it explicit.
- Similarly, set the send low water mark to PIPE_BUF to ensure that
the pipe is writable when at least PIPE_BUF bytes of space are available
in the send buffer. Without this change, the strengthened test case
above does not pass (the default send low water mark is larger than
PIPE_BUF; see soreserve()).
- Make the same low water mark changes to the PIPE_SOCKETPAIR case.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.200 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


Revision tags: netbsd-9-2-RELEASE netbsd-9-1-RELEASE bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.199 12-Nov-2018 hannken

branches: 1.199.4;
sys_recvmmsg: don't defer an error that already gets returned.


# 1.198 07-Nov-2018 hannken

Don't defer errors from sendmmsg(). This matches the linux manpage.

Defer errors from recvmmsg() through so_rerror and
tests and return a deferred error on entry.

Ok: christos@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.197 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
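
The truncation hazard described above can be shown in a few lines. This is
a sketch assuming a platform where unsigned int is 32 bits. The uimin()
here is a stand-in written from the description above, not a copy of the
libkern source, and u64min(), clamp_narrow() and clamp_wide() are
hypothetical names.

```c
#include <stdint.h>

/* Stand-in for the helper described above: defined on unsigned int. */
static inline unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Hypothetical 64-bit variant, for comparison only. */
static inline uint64_t
u64min(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

uint64_t
clamp_narrow(uint64_t len, uint64_t limit)
{
	/* Both arguments are silently converted to unsigned int here. */
	return uimin(len, limit);
}

uint64_t
clamp_wide(uint64_t len, uint64_t limit)
{
	return u64min(len, limit);
}
```

With len = 0x100000005 and limit = 100, clamp_wide() yields 100, whereas
clamp_narrow() first truncates len to 5 and so yields 5. The explicit
"ui" prefix makes that conversion visible at the call site.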


# 1.196 01-Aug-2018 rjs

Add ioctl(2) handler for kernel part of sctp_peeloff().


# 1.195 31-Jul-2018 rjs

Add getsockopt2() syscall.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.194 04-May-2018 christos

branches: 1.194.2;
define MBUFTYPES here.


# 1.193 03-May-2018 christos

Fix COMPAT_NETBSD32 cmsg handling:

1. alignment was wrong for > 1 message
2. macros were doing incorrect pointer comparisons, fortunately ending
the iteration early after the first cmsg instead of crashing.
3. don't output 32 bit ktrace records for cmsg. 32 bit programs running
under emulation on 64 bit systems should produce 64 bit ktrace records
so that the native ktrace can handle the records; remove extra arguments
that are now not needed (the 32 bit msghdr).
4. output the correct type for cmsg trace records.
5. output all the cmsg records in traces instead of just the first one.

Welcome to 8.99.15 because of the argument removal.

XXX: Really all the code should be changed to use the CMSG_{FIRST,NXT}HDR
macros...
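
For reference, iteration with the standard macros mentioned in the XXX note
looks like the sketch below. count_cmsgs() is a hypothetical helper; only
CMSG_FIRSTHDR() and CMSG_NXTHDR() come from <sys/socket.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Count the control messages attached to a msghdr. */
int
count_cmsgs(struct msghdr *msg)
{
	struct cmsghdr *cmsg;
	int n = 0;

	assert(msg != NULL);

	/* The macros handle alignment and the end-of-buffer check. */
	for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
	    cmsg = CMSG_NXTHDR(msg, cmsg))
		n++;
	return n;
}
```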


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.192 16-Mar-2018 christos

PR/53103: Timo Buhrmester: linux emulation of sendto(2) broken

The sockargs refactoring broke it, because sockargs only works with a user
address. Added an argument to sockargs to indicate where the address is
coming from. Welcome to 8.99.14.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.191 12-Feb-2018 maxv

branches: 1.191.2;
Add a KASSERT; we expect *from to be a single mbuf (not chained).


# 1.190 04-Jan-2018 christos

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).

(Tom Ivar Helbekkmo)


# 1.189 31-Dec-2017 christos

pass valsize for getsockopt like we do for setsockopt


# 1.188 26-Dec-2017 kamil

Refactor pipe1() and correct a bug in sys_pipe2() (SYS_pipe2)

sys_pipe2() returns two integers (values), the 2nd one is a copy of the 2nd
file descriptor that lands in fildes[2]. This is a side effect of reusing
the code for sys_pipe() (SYS_pipe) and not cleaning it up.

The first returned value is (on success) 0.

Introduced a small refactoring in pipe1() so that it does not operate on
retval[], but on an int[2] array. The caller sets retval[] for pipe() when
desired and needed.

This refactoring touches compat code: netbsd32, linux, linux32.

Before the changes on NetBSD/amd64:

$ ktruss -i ./a.out
[...]
15131 1 a.out pipe2(0x7f7fff2e62b8, 0) = 0, 4
[...]

After the changes:

$ ktruss -i ./a.out
[...]
782 1 a.out pipe2(0x7f7fff97e850, 0) = 0
[...]

There should not be a visible change for current users.

Sponsored by <The NetBSD Foundation>
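
From userland, the contract that the ktruss output above reflects is that
pipe2() returns a single status value and delivers both descriptors through
the array argument. A minimal sketch follows; make_cloexec_pipe() and
is_cloexec() are hypothetical helpers, and the _GNU_SOURCE define is an
assumption needed to expose pipe2() on glibc (it should be harmless on
systems that declare pipe2() by default).

```c
#define _GNU_SOURCE		/* assumption: exposes pipe2() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Create a close-on-exec pipe.  Returns 0 on success or -1 on error;
 * the descriptors are returned only through fds[].
 */
int
make_cloexec_pipe(int fds[2])
{
	assert(fds != NULL);
	return pipe2(fds, O_CLOEXEC);
}

/* Report whether FD_CLOEXEC is set on a descriptor. */
int
is_cloexec(int fd)
{
	int fl = fcntl(fd, F_GETFD);

	return fl != -1 && (fl & FD_CLOEXEC) != 0;
}
```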


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.187 20-Jun-2017 christos

Change len type to be unsigned int for consistency with the input type.
Don't check for negative; it does not matter, since we clamp anyway. This
broke the compat32 getsockname() where an uninitialized socklen_t ended
up randomly negative causing it to fail.


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204
# 1.186 03-Feb-2017 christos

branches: 1.186.6;
expose sendmsg_so and recvmsg_so.


# 1.185 02-Feb-2017 christos

expose copyout_sockname_sb


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204
# 1.184 03-Dec-2016 christos

branches: 1.184.2;
Add missing ktrkuser


Revision tags: pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914
# 1.183 13-Sep-2016 martin

Make the ktrace record written by do_sys_sendmsg/do_sys_recvmsg overridable
by the caller. Use this in compat_netbsd32 to log the 32bit version, so
the 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.182 07-Jul-2016 msaitoh

branches: 1.182.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.181 01-Nov-2015 christos

Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.


Revision tags: nick-nhusb-base-20150921
# 1.180 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.179 22-Jul-2015 maxv

Memory leak. Triggerable from an unprivileged user via COMPAT_43.


Revision tags: nick-nhusb-base-20150606
# 1.178 09-May-2015 rtr

change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16


# 1.177 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.176 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.175 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.174 06-Mar-2015 rtr

Return EINVAL if namelen isn't large enough to encompass the expected
members of sockaddr structures. i.e. sa_len and sa_family.

Discussed with and patch by christos@
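
A sketch of the kind of length check this describes. check_namelen() is a
hypothetical name, and the exact comparison the kernel uses is an
assumption here; the idea is only that the length must cover the fixed
header fields that precede sa_data.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Reject a user-supplied address length that is too short to contain
 * the fixed sockaddr header (everything before sa_data).
 */
int
check_namelen(socklen_t namelen)
{
	if (namelen < offsetof(struct sockaddr, sa_data))
		return EINVAL;
	return 0;
}
```

Using offsetof() rather than a literal keeps the check independent of
whether the platform's struct sockaddr has an sa_len member.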


Revision tags: nick-nhusb-base
# 1.173 05-Sep-2014 matt

branches: 1.173.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


Revision tags: netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.172 09-Aug-2014 rtr

branches: 1.172.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.171 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.170 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


# 1.169 17-May-2014 rmind

- fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle few comments.
- Remove M_SOOPTS while here.


# 1.168 17-May-2014 rmind

makesocket: set SS_NBIO slightly earlier.


# 1.167 17-May-2014 rmind

Remove trailing whitespaces, wrap long lines, minor KNF; no functional changes.


Revision tags: yamt-pagecache-base9
# 1.166 07-Apr-2014 seanb

Fix a case where an erroneous EAGAIN was returned out of recvmmsg.
This occurred when some, but not all of the mmsg array members
were filled with data from a non-blocking socket.
PR kern/48725
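
One way to express the intended behaviour is sketched below. It is a simulation, not the kernel code: demo_recvmmsg(), the demo_recv_fn hook and demo_queue_recv() are hypothetical names, and the rule "a partial result counts as success" is an assumption drawn from the description above.

```c
#include <errno.h>

/* Hypothetical per-message receive hook: returns 0 or an errno value. */
typedef int (*demo_recv_fn)(unsigned int idx, void *arg);

/*
 * Try to fill up to vlen messages. *done receives the number filled.
 * An error is reported only when no message at all was received.
 */
static int
demo_recvmmsg(unsigned int vlen, demo_recv_fn recv_one, void *arg,
    unsigned int *done)
{
	unsigned int i;
	int error = 0;

	for (i = 0; i < vlen; i++) {
		error = recv_one(i, arg);
		if (error != 0)
			break;
	}
	*done = i;
	if (i > 0)
		error = 0;	/* partial fill: do not leak EAGAIN */
	return error;
}

/* Simulated non-blocking socket holding *arg queued datagrams. */
static int
demo_queue_recv(unsigned int idx, void *arg)
{
	unsigned int avail = *(unsigned int *)arg;

	return idx < avail ? 0 : EAGAIN;
}
```

With two datagrams queued and room for four, the sketch returns 0 with a count of 2; with none queued it returns EAGAIN.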


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 09-Oct-2013 christos

branches: 1.165.2;
delete extra m_len initialization.


# 1.164 09-Oct-2013 christos

PR/48292: Justin Cormack: paccept creates sockets that cannot be made blocking
Reset the socket flags not just the file flags for non-blocking I/O.
XXX: pullup 6


# 1.163 08-Oct-2013 christos

PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.


# 1.162 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.161 03-Jun-2013 christos

branches: 1.161.2;
use the proper name for kdump pretty-printing.


Revision tags: agc-symver-base
# 1.160 14-Feb-2013 christos

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.
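
A small userland check for the symptom might look like the sketch below. The function name is made up for illustration, and socketpair(2) on AF_UNIX is used only to keep it self-contained. On a kernel with the bug described above, the recv() call would block instead of failing.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 if a socket created with SOCK_NONBLOCK is non-blocking both
 * at the file level (O_NONBLOCK) and at the socket level (an empty read
 * fails at once), 0 if not, -1 if the sockets could not be created.
 */
static int
demo_sock_nonblock_works(void)
{
	int sv[2], fl, ok;
	ssize_t n;
	char c;

	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, sv) == -1)
		return -1;

	/* File-level view: the flag must be visible through fcntl(2). */
	fl = fcntl(sv[0], F_GETFL);
	ok = fl != -1 && (fl & O_NONBLOCK) != 0;

	/* Socket-level view: nothing is queued, so recv must not sleep. */
	if (ok) {
		n = recv(sv[0], &c, 1, 0);
		ok = n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK);
	}

	close(sv[0]);
	close(sv[1]);
	return ok;
}
```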


# 1.159 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.
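
The gap between CMSG_LEN and CMSG_SPACE mentioned above is alignment padding. A minimal userland sketch of scrubbing it follows; demo_scrub_cmsg_padding() is a hypothetical helper, not the code in unp_externalize.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * buf holds one control message with `payload' bytes of data.
 * Zero the alignment padding that follows the data and return the
 * number of padding bytes cleared.
 */
static size_t
demo_scrub_cmsg_padding(unsigned char *buf, size_t payload)
{
	size_t used = CMSG_LEN(payload);	/* header + data */
	size_t total = CMSG_SPACE(payload);	/* header + data + padding */

	memset(buf + used, 0, total - used);
	return total - used;
}
```

Without the memset, whatever happened to be in the buffer past the data would be copied out to the receiver along with the message.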

ok christos


Revision tags: yamt-pagecache-base8
# 1.158 29-Dec-2012 mlelstv

The sanity check prevented messages that carry only ancillary data.


# 1.157 29-Dec-2012 mlelstv

If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analog
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).
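
The sanity check as described in this entry amounts to something like the sketch below (the function name is hypothetical). Note that revision 1.158 above reports that the check got in the way of messages carrying only ancillary data, so treat this as the behaviour of this revision only.

```c
#include <errno.h>

/*
 * Reject a message header whose iovec count is zero or negative,
 * before any allocation sized by that count is attempted.
 */
static int
demo_check_iovlen(int msg_iovlen)
{
	if (msg_iovlen <= 0)
		return EMSGSIZE;
	return 0;
}
```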


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.156 17-Jul-2012 njoly

branches: 1.156.2;
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.


# 1.155 22-Jun-2012 christos

Add {send,recv}mmsg from Linux


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.154 25-Jan-2012 christos

branches: 1.154.2;
revert atomics for so_options since it is a short.


# 1.153 25-Jan-2012 christos

need <sys/atomic.h>


# 1.152 25-Jan-2012 christos

Add locking, requested by yamt. Note that locking is not used everywhere
for these.


# 1.151 25-Jan-2012 christos

As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]


# 1.150 21-Dec-2011 christos

simplify expression


# 1.149 20-Dec-2011 christos

- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.148 04-Nov-2011 christos

branches: 1.148.4;
Fix error I introduced in previous commit that caused asymmetric connects
when SOCK_NONBLOCK or SOCK_CLOEXEC was specified. Factor out common code
and simplify error return.


Revision tags: yamt-pagecache-base
# 1.147 21-Sep-2011 christos

branches: 1.147.2;
Put the mbuf type in the ktrace record so that we know how to decode it
in userland.


# 1.146 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.145 15-Jul-2011 christos

fail with EINVAL if flags other than O_CLOEXEC|O_NONBLOCK are passed to
pipe2(2) and dup3(2)


# 1.144 26-Jun-2011 christos

* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
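
To illustrate why setting close-on-exec at creation time matters, the userland sketch below creates a socket pair with SOCK_CLOEXEC and compares it with a plain dup(2) copy, which does not inherit the flag. The function name is made up for the example; only the flags named in the list above are used.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns a bitmask, or -1 on setup failure:
 *   bit 0: the descriptor created with SOCK_CLOEXEC has FD_CLOEXEC set
 *   bit 1: a plain dup(2) of it does not have FD_CLOEXEC set
 */
static int
demo_cloexec_at_creation(void)
{
	int sv[2], copy, fl, result = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, sv) == -1)
		return -1;

	fl = fcntl(sv[0], F_GETFD);
	if (fl != -1 && (fl & FD_CLOEXEC) != 0)
		result |= 1;

	/* dup(2) clears close-on-exec on the new descriptor. */
	copy = dup(sv[0]);
	if (copy != -1) {
		fl = fcntl(copy, F_GETFD);
		if (fl != -1 && (fl & FD_CLOEXEC) == 0)
			result |= 2;
		close(copy);
	}

	close(sv[0]);
	close(sv[1]);
	return result;
}
```

Setting the flag in the creating call leaves no window in which another thread can fork and exec with the descriptor still inheritable, which a separate fcntl(F_SETFD) call cannot guarantee.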


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.143 24-Apr-2011 rmind

- Replace few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.


# 1.142 10-Apr-2011 christos

- Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1
# 1.141 23-Apr-2010 rmind

branches: 1.141.2;
Replace M_IOV and some malloc(9)s with kmem(9), and while there:
- Fix invalid free (M_TEMP vs M_IOV) in do_sys_recvmsg(), spotted by jakllsch@.
Also, same fix in osf1_sys_sendmsg_xopen().
- Fix attempt to free non-allocated memory in error path in netbsd32___getfh30().
- Plug a memory leak in compat_43_netbsd32_orecvmsg().


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.140 21-Jan-2010 pgoyette

branches: 1.140.2; 1.140.4;
Remove unnecessary call to kauth_cred_free().

This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@


# 1.139 29-Dec-2009 elad

Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!


# 1.138 20-Dec-2009 dsl

If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567


Revision tags: matt-premerge-20091211
# 1.137 09-Dec-2009 dsl

Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.136 04-Apr-2009 ad

Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)


Revision tags: nick-hppapmap-base2
# 1.135 21-Jan-2009 yamt

branches: 1.135.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.


Revision tags: netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 haad-dm-base mjf-devfs2-base
# 1.134 06-Aug-2008 plunky

branches: 1.134.2; 1.134.4;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.133 24-Jun-2008 ad

branches: 1.133.2;
Nothing uses getsock/getvnode any more.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.132 30-May-2008 rmind

branches: 1.132.2;
do_sys_accept: release the reference to sock in few error paths.
Should fix PR/38790, report and test-case by Nicolas Joly.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.131 28-Apr-2008 martin

branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.130 24-Apr-2008 ad

branches: 1.130.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


# 1.129 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.128 21-Mar-2008 ad

branches: 1.128.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.127 06-Feb-2008 ad

branches: 1.127.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.


Revision tags: vmlocking2-base3 bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.126 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.125 20-Dec-2007 dsl

Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.


# 1.124 16-Dec-2007 elad

Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.123 24-Nov-2007 dyoung

branches: 1.123.2; 1.123.6;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to getsockmbuf().


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.122 05-Oct-2007 dyoung

branches: 1.122.4;
Use getsombuf().


Revision tags: yamt-x86pmap-base
# 1.121 19-Sep-2007 christos

branches: 1.121.2;
minor nits; no code change.


# 1.120 19-Sep-2007 dyoung

1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.119 06-Sep-2007 rmind

do_sys_sendmsg: Plug a possible leak.
From CID: 4535


# 1.118 01-Sep-2007 dsl

Don't error calls to copy socket addresses to userspace when the application
has provided a non-null buffer pointer and a zero length.
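
A sketch of the tolerant behaviour, under the assumption that the copy is simply truncated to the caller's length: demo_copyout_name() is a hypothetical userland stand-in, with memcpy() playing the role of copyout().

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy at most *ulen bytes of the socket name to the caller's buffer
 * and report how many bytes were copied. A zero *ulen copies nothing
 * and is not treated as an error.
 */
static int
demo_copyout_name(const void *name, size_t namelen, void *ubuf,
    size_t *ulen)
{
	size_t len;

	if (ubuf == NULL || ulen == NULL)
		return 0;	/* caller did not ask for the name */

	len = (*ulen < namelen) ? *ulen : namelen;
	if (len > 0)
		memcpy(ubuf, name, len);	/* stands in for copyout() */
	*ulen = len;
	return 0;
}
```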


# 1.117 27-Aug-2007 dsl

ktrace socket control structures (ie msghdr, address etc) using ktrkuser().


# 1.116 15-Aug-2007 ad

branches: 1.116.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base nick-csl-alignment-base
# 1.115 15-Jul-2007 dsl

branches: 1.115.2; 1.115.6;
Remove non-user flags (especially MSG_IOVUSRSPACE) from mp->msg_flags
before passing to so_receive.
This may (or may not) have any effect...


Revision tags: mjf-ufs-trans-base
# 1.114 01-Jul-2007 dsl

Check for SOL_SOCKET when checking for SCM_RIGHTS.


# 1.113 24-Jun-2007 dsl

Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socket names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, old LKMs will fail to load.


# 1.112 02-Jun-2007 enami

- Fix obvious typos so that sendto(2) works.
- Wrap lines again.


# 1.111 01-Jun-2007 dsl

Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.


Revision tags: yamt-idlelwp-base8
# 1.110 13-May-2007 dsl

Fallout from caddr_t deletion - remove a load of redundant (void *) casts.


# 1.109 18-Apr-2007 yamt

sys_accept: fix usecount botch and double soclose in rev.1.108.


# 1.108 15-Apr-2007 yamt

sys_accept: don't leak a socket on error.


Revision tags: thorpej-atomic-base
# 1.107 04-Mar-2007 christos

branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge
# 1.106 09-Feb-2007 ad

branches: 1.106.2;
Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.105 01-Nov-2006 yamt

branches: 1.105.2;
remove some __unused from function parameters.


# 1.104 23-Oct-2006 elad

PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic

Patch applied, thanks!


Revision tags: yamt-splraiseipl-base2
# 1.103 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.102 22-Aug-2006 seanb

branches: 1.102.2; 1.102.4;
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.101 23-Jul-2006 ad

branches: 1.101.2;
Use the LWP cached credentials where sane.


# 1.100 26-Jun-2006 mrg

version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.

from pavel@ with a little bit of clean up from myself.

XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.99 16-May-2006 christos

branches: 1.99.4;
Don't set mature an fd that has been ffree'd


Revision tags: elad-kernelauth-base
# 1.98 11-May-2006 christos

Add MSG_NOSIGNAL (from FreeBSD)
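
A userland sketch of what the flag is for: writing to a socket whose peer has gone away should fail with EPIPE rather than deliver SIGPIPE. The function name is made up for the example.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send one byte on a stream socket whose peer end is already closed,
 * passing MSG_NOSIGNAL so that no SIGPIPE is raised.
 * Returns the errno from send(), 0 if send() unexpectedly succeeded,
 * or -1 if the socket pair could not be created.
 */
static int
demo_send_nosignal(void)
{
	int sv[2], err = 0;
	char c = 'x';

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;
	close(sv[1]);		/* peer goes away */

	if (send(sv[0], &c, 1, MSG_NOSIGNAL) == -1)
		err = errno;

	close(sv[0]);
	return err;
}
```

Without the flag, and with SIGPIPE at its default disposition, the same send() would terminate the process.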


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.97 01-Mar-2006 yamt

branches: 1.97.2; 1.97.4; 1.97.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.96 26-Dec-2005 perry

branches: 1.96.2; 1.96.4; 1.96.6;
u_intN_t -> uintN_t


# 1.95 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.94 03-Sep-2005 martin

In adjust_rights() Use CMSG_SPACE() to calculate the number of
filedescriptors passed in this message - the counterpart in
unp_externalize does this as well.

Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.

This fixes the postfix connection cache for 64bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing the fdrelease() of uninitialized junk.
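
The overcount can be reproduced in userland arithmetic. The sketch below is not the adjust_rights() code; the two helper names are hypothetical. It only shows that dividing a padded length by sizeof(int) can yield more descriptors than were actually sent.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Exact count: cmsg_len covers header and data but no trailing padding. */
static size_t
demo_nfds_from_len(size_t cmsg_len)
{
	if (cmsg_len < CMSG_LEN(0))
		return 0;
	return (cmsg_len - CMSG_LEN(0)) / sizeof(int);
}

/* Naive count from the padded size: padding is mistaken for descriptors. */
static size_t
demo_nfds_from_space(size_t space)
{
	if (space < CMSG_LEN(0))
		return 0;
	return (space - CMSG_LEN(0)) / sizeof(int);
}
```

Where control messages are aligned to 8 bytes and int is 4 bytes, the padded size of a message carrying one descriptor counts as two in the naive version, and the second "descriptor" is whatever bytes sit in the padding.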


# 1.93 03-Sep-2005 martin

minor knf tweak


# 1.92 30-May-2005 martin

branches: 1.92.2;
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.


# 1.91 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.90 26-Feb-2005 perry

branches: 1.90.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.89 30-Nov-2004 christos

branches: 1.89.4; 1.89.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat


# 1.88 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc * )0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.87 18-May-2004 ragge

Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.

Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.

As a side effect, the new connect() behaviour conforms to POSIX.


Revision tags: netbsd-2-0-base
# 1.86 29-Nov-2003 matt

branches: 1.86.2;
Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.85 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.84 13-Nov-2003 chs

eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.


# 1.83 04-Sep-2003 matt

Adapt to the new calling conventions of unp_connect2


# 1.82 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.81 29-Jun-2003 fvdl

branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.80 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.79 05-Apr-2003 christos

PR/21030: Naoto Shimazaki: fcntl to accepted socket does not work properly


# 1.78 26-Feb-2003 matt

Remove leftover MBUFTRACE asserts.


# 1.77 26-Feb-2003 drochner

deactivate MBUFTRACE related KASSERT()s in the !MBUFTRACE case


# 1.76 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.75 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.74 26-Nov-2002 christos

si_ -> sel_ to avoid conflicts with siginfo.


# 1.73 25-Nov-2002 itojun

no need for error check after MEXTMALLOC - jdolecek


# 1.72 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge
# 1.71 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.70 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.69 31-May-2002 itojun

support setsockopt() with larger data (up to MCLBYTES).
From: Hitoshi Asaeda <Hitoshi.Asaeda@sophia.inria.fr>


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base ifpoll-base
# 1.68 11-Feb-2002 jdolecek

branches: 1.68.8;
Switch default for pipes to the faster John S. Dyson's implementation.
Old, socketpair-based ones are available with option PIPE_SOCKETPAIR.


Revision tags: thorpej-mips-cache-base
# 1.67 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.66 16-Sep-2001 wiz

branches: 1.66.2;
Spell 'occurred' with two 'r's.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.65 17-Jul-2001 jdolecek

branches: 1.65.2;
Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
has to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(

This code was tested to pass regression tests.


# 1.64 01-Jul-2001 matt

branches: 1.64.2;
Use consistent types for len. Limit sockarg length to reasonable values.


# 1.63 25-Jun-2001 jdolecek

Back off the sendit()/recvit() change, some have problems with it


# 1.62 25-Jun-2001 jdolecek

sys_connect(): fix the call to FILE_UNUSE() so that it's done on return, rather
than immediately after getsock() call


# 1.61 25-Jun-2001 jdolecek

Add 'kernsa' parameter for sendit()/recvit(); if nonzero, msg->msg_name
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.

This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.


# 1.60 16-Jun-2001 jdolecek

Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pageable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.


# 1.59 14-Jun-2001 thorpej

Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
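
The "larval" idea can be sketched with a toy descriptor table. This is a minimal illustration under assumed names (`sketch_table`, `sketch_fdalloc`, `sketch_getfile`, `sketch_set_mature` are all hypothetical), not the kernel's actual data structures or locking.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NFILES	8
#define SKETCH_LARVAL	0x01	/* slot is taken but not fully constructed */

/* Hypothetical table entry; the real kernel structures differ. */
struct sketch_file {
	int in_use;
	int flags;
};

struct sketch_file sketch_table[SKETCH_NFILES];

/* Reserve the lowest free slot and mark it larval; returns -1 if full. */
int
sketch_fdalloc(void)
{
	int fd;

	for (fd = 0; fd < SKETCH_NFILES; fd++) {
		if (!sketch_table[fd].in_use) {
			sketch_table[fd].in_use = 1;
			sketch_table[fd].flags = SKETCH_LARVAL;
			return fd;
		}
	}
	return -1;
}

/* Common lookup: a larval slot is treated as a non-existent descriptor. */
struct sketch_file *
sketch_getfile(int fd)
{
	if (fd < 0 || fd >= SKETCH_NFILES)
		return NULL;
	if (!sketch_table[fd].in_use ||
	    (sketch_table[fd].flags & SKETCH_LARVAL) != 0)
		return NULL;
	return &sketch_table[fd];
}

/* Called once construction is complete: clears the larval flag. */
void
sketch_set_mature(int fd)
{
	assert(fd >= 0 && fd < SKETCH_NFILES && sketch_table[fd].in_use);
	sketch_table[fd].flags &= ~SKETCH_LARVAL;
}
```

In this sketch the allocator skips a larval slot because it is marked in use, while every lookup goes through one function that hides it until it is marked mature.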


# 1.58 06-May-2001 manu

implement the recently introduced EMUL_BSD_ASYNCIO_PIPE emulation package
flag.

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.57 27-Feb-2001 lukem

branches: 1.57.2;
convert to ANSI KNF


# 1.56 10-Dec-2000 fvdl

Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).


# 1.55 24-Nov-2000 jdolecek

define COMPAT_OLDSOCK unconditionally - the code is used virtually for all
emulations besides NetBSD, and this way it's LKM-safe


# 1.54 02-Aug-2000 thorpej

MALLOC()/FREE() are not to be used for variable sized allocations.


# 1.53 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.52 27-May-2000 sommerfeld

branches: 1.52.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.51 30-Mar-2000 augustss

Get rid of register declarations.


# 1.50 23-Mar-2000 thorpej

Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() makes sure
the fd_freefiles hints stay in sync.

From OpenBSD.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.49 05-Nov-1999 mycroft

branches: 1.49.2;
Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.


# 1.48 30-Oct-1999 enami

back out unnecessary stylistic changes in recent changes, to keep coding
style closer to KNF.


# 1.47 27-Oct-1999 jdolecek

minor cleanup of previous - avoid goto and code duplication


# 1.46 27-Oct-1999 darrenr

patch from Greg A. Woods to fix panic problems with code that attempts to
recover from failures to accept a socket successfully. Problem suggested
by this:
> It would appear (from two "panic: closef: count < 0" failures in less
> than 12 hours) that Darren's fix to accept(2) for lost file descriptors
> isn't quite correct. His fix inserts a call to closef() to handle one
> of several possible error conditions. However everywhere else in the
> socket code in the same file where falloc() cleanup is necessary the
> function used is ffree().


Revision tags: chs-ubc2-base
# 1.45 01-Jul-1999 itojun

branches: 1.45.2; 1.45.4; 1.45.6;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.44 01-Jul-1999 darrenr

fix sys_accept() to return EOPNOTSUPP for protocols which don't support
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.


# 1.43 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.42 30-Apr-1999 cgd

add checks for COMPAT_OSF1 in the appropriate places


Revision tags: netbsd-1-4-RELEASE netbsd-1-4-base
# 1.41 10-Feb-1999 kleink

branches: 1.41.2; 1.41.4; 1.41.6;
* Due to addition and use of socklen_t, make the socket option and address
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also eliminates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessitates the addition
of a few casts.


# 1.40 18-Dec-1998 drochner

solve the COMPAT_OLDSOCK/MSG_COMPAT problem differently:
The source files which need MSG_COMPAT define COMPAT_OLDSOCK.


Revision tags: kenh-if-detach-base
# 1.39 26-Nov-1998 mycroft

Revert the functional change in rev 1.38; permit a msg_iovlen of 0.
There are two reasons for this:
* We should be able to pass file descriptors without sending any data.
* We could send zero-length iovecs anyway (but we shouldn't have to do this).
Also, msg_iovlen is already a u_int, so delete a bunch of casts.


Revision tags: chs-ubc-base
# 1.38 04-Aug-1998 kleink

Per XNS Issue 5, calling recvmsg(2) or sendmsg(2) with an msg.msg_iovlen less
than or equal to 0 shall fail with EMSGSIZE; the latter condition was not being
checked for. Also, document the msg.msg_iovlen > {IOV_MAX} case.


# 1.37 04-Aug-1998 kleink

UIO_MAXIOV -> IOV_MAX


# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
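
The point to watch in this conversion is that the source and destination arguments swap places. A minimal sketch, with hypothetical helper names (`copy_header`, `shift_left`), showing the new calls next to the old ones in comments:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `len` bytes from `src` to `dst`. */
void
copy_header(void *dst, const void *src, size_t len)
{
	assert(dst != NULL && src != NULL);
	/* Was: bcopy(src, dst, len); the first two arguments are swapped. */
	memcpy(dst, src, len);
}

/* Hypothetical helper: drop the first `shift` bytes of `buf`, zero the tail. */
void
shift_left(char *buf, size_t len, size_t shift)
{
	assert(buf != NULL && shift <= len);
	/* Was: ovbcopy(buf + shift, buf, len - shift); regions may overlap. */
	memmove(buf, buf + shift, len - shift);
	/* Was: bzero(buf + len - shift, shift); */
	memset(buf + len - shift, 0, shift);
}
```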


# 1.35 03-Aug-1998 kleink

Fix two off-by-one bugs, both present in each recvmsg(2) and sendmsg(2):
* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.


# 1.34 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.33 29-Jul-1998 thorpej

branches: 1.33.2;
When checking for overflow in the residual count, test against SSIZE_MAX.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.

There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).


# 1.32 18-Jul-1998 lukem

use AF_LOCAL instead of AF_UNIX


# 1.31 25-Jun-1998 thorpej

defopt KTRACE


# 1.30 25-Apr-1998 matt

Hook for 0-copy (or other optimized) sends and receives


# 1.29 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.28 06-Feb-1998 thorpej

When copying out multiple control messages, ensure that the next control
message is aligned. From David Borman <dab@bsdi.com>.
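
The alignment rule can be sketched in userland with the standard CMSG_LEN and CMSG_SPACE macros: each message records its unpadded length, but the next message starts at the padded offset. The helper name `put_cmsg` is hypothetical and this is not the kernel's copy-out code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: write one control message carrying `len` bytes of
 * `data` at offset `off` in `buf`; returns the offset of the next message.
 */
size_t
put_cmsg(unsigned char *buf, size_t bufsize, size_t off, int level, int type,
    const void *data, size_t len)
{
	struct cmsghdr cm;

	assert(buf != NULL && data != NULL);
	assert(off + CMSG_SPACE(len) <= bufsize);

	memset(&cm, 0, sizeof(cm));
	cm.cmsg_len = CMSG_LEN(len);	/* header plus data, without padding */
	cm.cmsg_level = level;
	cm.cmsg_type = type;
	memcpy(buf + off, &cm, sizeof(cm));
	/* CMSG_LEN(0) is the offset of the data within a control message. */
	memcpy(buf + off + CMSG_LEN(0), data, len);

	/* Advance by the padded size so the next header is aligned. */
	return off + CMSG_SPACE(len);
}
```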


# 1.27 07-Jan-1998 thorpej

Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).


# 1.26 07-Jan-1998 thorpej

Fix bug in recvit() that would cause recvmsg() to only receive one
control message, even if there were multiple control messages on
the queue. From Jean-Luc Richier <Jean-Luc.Richier@imag.fr>, in
bug report kern/4700.


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.25 26-Jun-1997 thorpej

branches: 1.25.8;
Use UCHAR_MAX rather than "255" when sanity-checking the length of a
sockaddr in sockargs().


# 1.24 26-Jun-1997 thorpej

In sockargs():
- Add a comment describing my feelings about this interface, in general.
- Remove the COMPAT_OLDSOCK length hack. Instead, if the socket argument
is too long to fit in an mbuf, allocate enough external storage to
hold it.
- If the socket argument is a sockaddr, don't allow the length to be
greater than 255, as that would overflow sa_len.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for his sanity checking.
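
A small sketch of why the 255 limit matters, assuming sa_len is a one-byte field as on BSD systems. The function names are hypothetical, and UCHAR_MAX stands in for the literal 255.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: reject sockaddr lengths that cannot fit in sa_len. */
int
check_sockaddr_len(size_t buflen)
{
	if (buflen > UCHAR_MAX)
		return EINVAL;
	return 0;
}

/* What would happen without the check: the length is silently truncated. */
uint8_t
store_sa_len(size_t buflen)
{
	return (uint8_t)buflen;
}
```

With the check omitted, a 300-byte argument would be recorded as a 44-byte sockaddr.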


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 22-Dec-1996 cgd

* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.


# 1.22 14-Jun-1996 cgd

avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.


# 1.21 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.20 17-May-1996 pk

branches: 1.20.4;
Don't touch retval[] in socketpair(); manual page says this system call
returns 0 on success (PR#2428).


# 1.19 09-Feb-1996 christos

More proto fixes


# 1.18 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.17 10-Oct-1995 mycroft

Add hooks for COMPAT_FREEBSD, from Noriyuki Soda.


# 1.16 07-Oct-1995 mycroft

Prefix names of system call implementation functions with `sys_'.


# 1.15 19-Sep-1995 thorpej

Make system calls conform to a standard prototype and bring those
prototypes into scope.


# 1.14 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.13 24-Jun-1995 christos

Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).


# 1.12 10-May-1995 christos

tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed


# 1.11 05-Mar-1995 fvdl

Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.


# 1.10 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.9 20-Oct-1994 cgd

update for new syscall args description mechanism


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.8 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.7 04-May-1994 mycroft

Add return types where missing. Simplify some of the compat conditionals.
Include compat code if COMPAT_SUNOS with or without COMPAT_43.


# 1.6 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 17-Jul-1993 mycroft

branches: 1.5.4;
Finish moving struct definitions outside of function declarations.


# 1.4 27-Jun-1993 andrew

* ansifications
* Yuval Yarom's socket recv(2) fixes - access rights problems (see also
uipc_socket.c).


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.206 01-Jul-2022 riastradh

sendto(2), recvfrom(2): Scrub internal struct msghdr on stack.

Otherwise this is kernel stack disclosure via ktrace.

Reported-by: syzbot+1d40303b310063778194@syzkaller.appspotmail.com


# 1.205 29-Jun-2022 riastradh

recvmmsg(2): More timespec validation.

Reported-by: syzbot+004ed2f264534bd27312@syzkaller.appspotmail.com
Reported-by: syzbot+6f9014c842c4e78df7bc@syzkaller.appspotmail.com


# 1.204 28-Jun-2022 riastradh

recvmmsg(2): Avoid arithmetic overflow in timeout calculations.

XXX This is not right -- it doesn't actually do anything to time
out...

Reported-by: syzbot+784209d76a94fcc6417b@syzkaller.appspotmail.com


# 1.203 27-Jun-2022 riastradh

sendmsg(2): Avoid buffer overrun in ktrace of invalid cmsghdr.

Reported-by: syzbot+efded148140b23425f5c@syzkaller.appspotmail.com


# 1.202 02-Oct-2021 thorpej

...and correct my terrible spelling.


# 1.201 02-Oct-2021 thorpej

- Strengthen the poll(2) fifo_inout test to ensure that once the reader
has read enough that exactly PIPE_BUF space is available that the FIFO
becomes writable again.
- When creating a FIFO, ensure that the receive low water mark is 1
(a FIFO must be readable when at least 1 byte is available); this
was already the case implicitly, but this makes it explicit.
- Similarly, set the send low water mark to PIPE_BUF to ensure that
the pipe is writable when at least PIPE_BUF bytes of space are available
in the send buffer. Without this change, the strengthened test case
above does not pass (the default send low water mark is larger than
PIPE_BUF; see soreserve()).
- Make the same low water mark changes to the PIPE_SOCKETPAIR case.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.200 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


Revision tags: netbsd-9-2-RELEASE netbsd-9-1-RELEASE bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.199 12-Nov-2018 hannken

branches: 1.199.4;
sys_recvmmsg: don't defer an error that already gets returned.


# 1.198 07-Nov-2018 hannken

Don't defer errors from sendmmsg(). This matches the linux manpage.

Defer errors from recvmmsg() through so_rerror and
tests and return a deferred error on entry.

Ok: christos@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.197 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
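
The truncation hazard can be reproduced in a few lines. This is a sketch assuming a platform where unsigned int is narrower than 64 bits; `sketch_uimin`, `SKETCH_MIN` and the two caller functions are hypothetical stand-ins, not the libkern definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a libkern-style helper: defined on unsigned int only. */
unsigned int
sketch_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving macro in the style of MIN(); evaluates arguments twice. */
#define SKETCH_MIN(a, b)	((a) < (b) ? (a) : (b))

/* Hypothetical caller: 64-bit values are narrowed before the comparison. */
uint64_t
clamp_with_uimin(uint64_t len, uint64_t limit)
{
	assert(sizeof(unsigned int) < sizeof(uint64_t));
	return sketch_uimin((unsigned int)len, (unsigned int)limit);
}

/* Hypothetical caller: the macro compares the full 64-bit values. */
uint64_t
clamp_with_macro(uint64_t len, uint64_t limit)
{
	return SKETCH_MIN(len, limit);
}
```

The explicit casts in `clamp_with_uimin` only make visible the conversion that a call through the old min() name would perform silently.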


# 1.196 01-Aug-2018 rjs

Add ioctl(2) handler for kernel part of sctp_peeloff().


# 1.195 31-Jul-2018 rjs

Add getsockopt2() syscall.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.194 04-May-2018 christos

branches: 1.194.2;
define MBUFTYPES here.


# 1.193 03-May-2018 christos

Fix COMPAT_NETBSD32 cmsg handling:

1. alignment was wrong for > 1 message
2. macros were doing incorrect pointer comparisons, fortunately ending
the iteration early after the first cmsg instead of crashing.
3. don't output 32 bit ktrace records for cmsg. 32 bit programs running
under emulation on 64 bit systems should produce 64 bit ktrace records
so that the native ktrace can handle the records; remove extra arguments
that are now not needed (the 32 bit msghdr).
4. output the correct type for cmsg trace records.
5. output all the cmsg records in traces instead of just the first one.

Welcome to 8.99.15 because of the argument removal.

XXX: Really all the code should be changed to use the CMSG_{FIRST,NXT}HDR
macros...


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.192 16-Mar-2018 christos

PR/53103: Timo Buhrmester: linux emulation of sendto(2) broken

The sockargs refactoring broke it, because sockargs only works with a user
address. Added an argument to sockargs to indicate where the address is
coming from. Welcome to 8.99.14.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.191 12-Feb-2018 maxv

branches: 1.191.2;
Add a KASSERT; we expect *from to be a single mbuf (not chained).


# 1.190 04-Jan-2018 christos

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).

(Tom Ivar Helbekkmo)


# 1.189 31-Dec-2017 christos

pass valsize for getsockopt like we do for setsockopt


# 1.188 26-Dec-2017 kamil

Refactor pipe1() and correct a bug in sys_pipe2() (SYS_pipe2)

sys_pipe2() returns two integers (values), the 2nd one is a copy of the 2nd
file descriptor that lands in fildes[2]. This is a side effect of reusing
the code for sys_pipe() (SYS_pipe) and not cleaning it up.

The first returned value is (on success) 0.

Introduced a small refactoring in pipe1() that it does not operate over
retval[], but on an array int[2]. A user sets retval[] for pipe() when
desired and needed.

This refactoring touches compat code: netbsd32, linux, linux32.

Before the changes on NetBSD/amd64:

$ ktruss -i ./a.out
[...]
15131 1 a.out pipe2(0x7f7fff2e62b8, 0) = 0, 4
[...]

After the changes:

$ ktruss -i ./a.out
[...]
782 1 a.out pipe2(0x7f7fff97e850, 0) = 0
[...]

There should not be a visible change for current users.

Sponsored by <The NetBSD Foundation>


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.187 20-Jun-2017 christos

Change len type to be unsigned int for consistency with the input type.
Don't check for negative; it does not matter, as we clamp anyway. This
broke the compat32 getsockname() where an uninitialized socklen_t ended
up randomly negative causing it to fail.


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204
# 1.186 03-Feb-2017 christos

branches: 1.186.6;
expose sendmsg_so and recvmsg_so.


# 1.185 02-Feb-2017 christos

expose copyout_sockname_sb


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204
# 1.184 03-Dec-2016 christos

branches: 1.184.2;
Add missing ktrkuser


Revision tags: pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914
# 1.183 13-Sep-2016 martin

Make the ktrace record written by do_sys_sendmsg/do_sys_recvmsg overridable
by the caller. Use this in compat_netbsd32 to log the 32bit version, so
the 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.182 07-Jul-2016 msaitoh

branches: 1.182.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.181 01-Nov-2015 christos

Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.


Revision tags: nick-nhusb-base-20150921
# 1.180 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.179 22-Jul-2015 maxv

Memory leak. Triggerable from an unprivileged user via COMPAT_43.


Revision tags: nick-nhusb-base-20150606
# 1.178 09-May-2015 rtr

change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16


# 1.177 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.176 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.175 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.174 06-Mar-2015 rtr

Return EINVAL if namelen isn't large enough to encompass the expected
members of sockaddr structures. i.e. sa_len and sa_family.

Discussed with and patch by christos@
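
The lower bound can be expressed with offsetof. The sketch below uses a hypothetical `struct sketch_sockaddr` that mirrors the leading members of a BSD-style sockaddr; the function name `check_namelen` is likewise an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout mirroring the leading members of a BSD sockaddr. */
struct sketch_sockaddr {
	uint8_t	sa_len;
	uint8_t	sa_family;
	char	sa_data[14];
};

/* Reject a namelen too short to contain both sa_len and sa_family. */
int
check_namelen(size_t namelen)
{
	if (namelen < offsetof(struct sketch_sockaddr, sa_data))
		return EINVAL;
	return 0;
}
```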


Revision tags: nick-nhusb-base
# 1.173 05-Sep-2014 matt

branches: 1.173.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


Revision tags: netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.172 09-Aug-2014 rtr

branches: 1.172.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.171 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.170 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


# 1.169 17-May-2014 rmind

- fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle few comments.
- Remove M_SOOPTS while here.


# 1.168 17-May-2014 rmind

makesocket: set SS_NBIO slightly earlier.


# 1.167 17-May-2014 rmind

Remove trailing whitespaces, wrap long lines, minor KNF; no functional changes.


Revision tags: yamt-pagecache-base9
# 1.166 07-Apr-2014 seanb

Fix a case where an erroneous EAGAIN was returned out of recvmmsg.
This occurred when some, but not all of the mmsg array members
were filled with data from a non-blocking socket.
PR kern/48725


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 09-Oct-2013 christos

branches: 1.165.2;
delete extra m_len initialization.


# 1.164 09-Oct-2013 christos

PR/48292: Justin Cormack: paccept creates sockets that cannot be made blocking
Reset the socket flags not just the file flags for non-blocking I/O.
XXX: pullup 6


# 1.163 08-Oct-2013 christos

PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.


# 1.162 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.161 03-Jun-2013 christos

branches: 1.161.2;
use the proper name for kdump pretty-printing.


Revision tags: agc-symver-base
# 1.160 14-Feb-2013 christos

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.


# 1.159 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
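
The gap between CMSG_LEN and CMSG_SPACE mentioned above is alignment padding after the payload. A minimal userland sketch of scrubbing it follows; `cmsg_pad_bytes` and `cmsg_scrub_padding` are hypothetical names, not the functions used in unp_externalize, and the caller is assumed to own a buffer of at least CMSG_SPACE(payload_len) bytes.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Padding bytes between the end of the payload and the end of the aligned slot. */
static size_t
cmsg_pad_bytes(size_t payload_len)
{
	return CMSG_SPACE(payload_len) - CMSG_LEN(payload_len);
}

/*
 * Zero the padding so that stale bytes are not copied out along with
 * the control message. The header and payload are left untouched.
 */
static void
cmsg_scrub_padding(struct cmsghdr *cm, size_t payload_len)
{
	unsigned char *base = (unsigned char *)cm;

	memset(base + CMSG_LEN(payload_len), 0, cmsg_pad_bytes(payload_len));
}
```

The amount of padding depends on the platform's alignment rules, so for some payload sizes it is zero and the memset does nothing.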


Revision tags: yamt-pagecache-base8
# 1.158 29-Dec-2012 mlelstv

The sanity check prevented messages that carry only ancillary data.


# 1.157 29-Dec-2012 mlelstv

If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analog
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.156 17-Jul-2012 njoly

branches: 1.156.2;
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.


# 1.155 22-Jun-2012 christos

Add {send,recv}mmsg from Linux


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.154 25-Jan-2012 christos

branches: 1.154.2;
revert atomics for so_options since it is a short.


# 1.153 25-Jan-2012 christos

need <sys/atomic.h>


# 1.152 25-Jan-2012 christos

Add locking, requested by yamt. Note that locking is not used everywhere
for these.


# 1.151 25-Jan-2012 christos

As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]


# 1.150 21-Dec-2011 christos

simplify expression


# 1.149 20-Dec-2011 christos

- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.148 04-Nov-2011 christos

branches: 1.148.4;
Fix error I introduced in previous commit that caused asymmetric connects
when SOCK_NONBLOCK or SOCK_CLOEXEC was specified. Factor out common code
and simplify error return.


Revision tags: yamt-pagecache-base
# 1.147 21-Sep-2011 christos

branches: 1.147.2;
Put the mbuf type in the ktrace record so that we know how to decode it
in userland.


# 1.146 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.145 15-Jul-2011 christos

fail with EINVAL if flags are not O_CLOEXEC|O_NONBLOCK in pipe2(2) and
dup3(2)


# 1.144 26-Jun-2011 christos

* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
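
As an illustration of the socket(2) type flags listed above, the following userland sketch creates a socket with SOCK_CLOEXEC and SOCK_NONBLOCK and reads the resulting state back with fcntl(2). The function name `socket_flags_applied` is made up for this example; it only checks what the caller can observe and says nothing about how the kernel sets the flags.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 if both flags are visible on the new descriptor,
 * 0 if either is missing, -1 if a system call failed.
 */
static int
socket_flags_applied(void)
{
	int fd, fdflags, flflags;

	fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, 0);
	if (fd == -1)
		return -1;

	fdflags = fcntl(fd, F_GETFD);	/* descriptor flags: FD_CLOEXEC */
	flflags = fcntl(fd, F_GETFL);	/* file status flags: O_NONBLOCK */
	close(fd);

	if (fdflags == -1 || flflags == -1)
		return -1;
	return (fdflags & FD_CLOEXEC) != 0 && (flflags & O_NONBLOCK) != 0;
}
```

Setting both properties at creation time avoids the window in which another thread could fork and exec between socket(2) and a later fcntl(2) call.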


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.143 24-Apr-2011 rmind

- Replace few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.


# 1.142 10-Apr-2011 christos

- Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1
# 1.141 23-Apr-2010 rmind

branches: 1.141.2;
Replace M_IOV and some malloc(9)s with kmem(9), and while there:
- Fix invalid free (M_TEMP vs M_IOV) in do_sys_recvmsg(), spotted by jakllsch@.
Also, same fix in osf1_sys_sendmsg_xopen().
- Fix attempt to free non-allocated memory in error path in netbsd32___getfh30().
- Plug a memory leak in compat_43_netbsd32_orecvmsg().


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.140 21-Jan-2010 pgoyette

branches: 1.140.2; 1.140.4;
Remove unnecessary call to kauth_cred_free().

This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@


# 1.139 29-Dec-2009 elad

Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!


# 1.138 20-Dec-2009 dsl

If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567


Revision tags: matt-premerge-20091211
# 1.137 09-Dec-2009 dsl

Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.136 04-Apr-2009 ad

Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)


Revision tags: nick-hppapmap-base2
# 1.135 21-Jan-2009 yamt

branches: 1.135.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.


Revision tags: netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 haad-dm-base mjf-devfs2-base
# 1.134 06-Aug-2008 plunky

branches: 1.134.2; 1.134.4;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.133 24-Jun-2008 ad

branches: 1.133.2;
Nothing uses getsock/getvnode any more.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.132 30-May-2008 rmind

branches: 1.132.2;
do_sys_accept: release the reference to sock in few error paths.
Should fix PR/38790, report and test-case by Nicolas Joly.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.131 28-Apr-2008 martin

branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.130 24-Apr-2008 ad

branches: 1.130.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


# 1.129 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.128 21-Mar-2008 ad

branches: 1.128.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.127 06-Feb-2008 ad

branches: 1.127.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.


Revision tags: vmlocking2-base3 bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.126 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.125 20-Dec-2007 dsl

Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.


# 1.124 16-Dec-2007 elad

Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.123 24-Nov-2007 dyoung

branches: 1.123.2; 1.123.6;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to getsockmbuf().


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.122 05-Oct-2007 dyoung

branches: 1.122.4;
Use getsombuf().


Revision tags: yamt-x86pmap-base
# 1.121 19-Sep-2007 christos

branches: 1.121.2;
minor nits; no code change.


# 1.120 19-Sep-2007 dyoung

1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.119 06-Sep-2007 rmind

do_sys_sendmsg: Plug a possible leak.
From CID: 4535


# 1.118 01-Sep-2007 dsl

Don't error calls to copy socket addresses to userspace when the application
has provided a non-null buffer pointer and a zero length.
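
The case described above can be reproduced from userland roughly as follows. This is a sketch under the assumption that the kernel accepts a zero-length buffer as the commit describes; `getsockname_zero_len` is a hypothetical name for the example.

```c
#include <sys/socket.h>

/*
 * Ask for the local address with a valid buffer pointer but a length
 * of zero. Returns whatever getsockname(2) returns: 0 is expected on
 * a kernel that does not treat this combination as an error.
 */
static int
getsockname_zero_len(int fd)
{
	struct sockaddr_storage ss;
	socklen_t len = 0;

	return getsockname(fd, (struct sockaddr *)&ss, &len);
}
```

No address bytes can be stored in a zero-length buffer, so a successful call here mainly tells the application that the descriptor is a valid socket.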


# 1.117 27-Aug-2007 dsl

ktrace socket control structures (ie msghdr, address etc) using ktrkuser().


# 1.116 15-Aug-2007 ad

branches: 1.116.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base nick-csl-alignment-base
# 1.115 15-Jul-2007 dsl

branches: 1.115.2; 1.115.6;
Remove non-user flags (especially MSG_IOVUSRSPACE) from mp->msg_flags
before passing to so_receive.
This may (or may not) have any effect...


Revision tags: mjf-ufs-trans-base
# 1.114 01-Jul-2007 dsl

Check for SOL_SOCKET when checking for SCM_RIGHTS.


# 1.113 24-Jun-2007 dsl

Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socket names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, old LKMs will fail to load.


# 1.112 02-Jun-2007 enami

- Fix obvious typos so that sendto(2) works.
- Wrap lines again.


# 1.111 01-Jun-2007 dsl

Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.


Revision tags: yamt-idlelwp-base8
# 1.110 13-May-2007 dsl

Fallout from caddr_t deletion - remove a load of redundant (void *) casts.


# 1.109 18-Apr-2007 yamt

sys_accept: fix usecount botch and double soclose in rev.1.108.


# 1.108 15-Apr-2007 yamt

sys_accept: don't leak a socket on error.


Revision tags: thorpej-atomic-base
# 1.107 04-Mar-2007 christos

branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge
# 1.106 09-Feb-2007 ad

branches: 1.106.2;
Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.105 01-Nov-2006 yamt

branches: 1.105.2;
remove some __unused from function parameters.


# 1.104 23-Oct-2006 elad

PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic

Patch applied, thanks!


Revision tags: yamt-splraiseipl-base2
# 1.103 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.102 22-Aug-2006 seanb

branches: 1.102.2; 1.102.4;
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.101 23-Jul-2006 ad

branches: 1.101.2;
Use the LWP cached credentials where sane.


# 1.100 26-Jun-2006 mrg

version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.

from pavel@ with a little bit of clean up from myself.

XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.99 16-May-2006 christos

branches: 1.99.4;
Don't set mature an fd that has been ffree'd


Revision tags: elad-kernelauth-base
# 1.98 11-May-2006 christos

Add MSG_NOSIGNAL (from FreeBSD)
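
A small userland sketch of what the flag is for: sending on a stream socket whose peer has gone away normally raises SIGPIPE, while MSG_NOSIGNAL asks for the error to be reported through errno only. The function name `send_to_closed_peer` is made up for this example, and the EPIPE result is the behaviour expected on systems that implement the flag.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a connected pair, close one end, then send on the other with
 * MSG_NOSIGNAL. Returns the errno from send(2), 0 if the send
 * unexpectedly succeeded, or -1 if the pair could not be created.
 */
static int
send_to_closed_peer(void)
{
	int sv[2], err;
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;
	close(sv[1]);			/* the peer goes away */

	n = send(sv[0], "x", 1, MSG_NOSIGNAL);
	err = (n == -1) ? errno : 0;	/* save errno before close() */
	close(sv[0]);
	return err;
}
```

Without the flag, the same send would terminate a process that has not blocked or handled SIGPIPE.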


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.97 01-Mar-2006 yamt

branches: 1.97.2; 1.97.4; 1.97.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.96 26-Dec-2005 perry

branches: 1.96.2; 1.96.4; 1.96.6;
u_intN_t -> uintN_t


# 1.95 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.94 03-Sep-2005 martin

In adjust_rights() Use CMSG_SPACE() to calculate the number of
file descriptors passed in this message - the counterpart in
unp_externalize does this as well.

Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.

This fixes the postfix connection cache for 64bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing the fdrelease() of uninitialized junk.


# 1.93 03-Sep-2005 martin

minor knf tweak


# 1.92 30-May-2005 martin

branches: 1.92.2;
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.


# 1.91 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.90 26-Feb-2005 perry

branches: 1.90.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.89 30-Nov-2004 christos

branches: 1.89.4; 1.89.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat


# 1.88 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
((pr->pru_usrreq)() calls for consistency, instead of (struct proc * )0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.87 18-May-2004 ragge

Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.

Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.

As a side effect, the new connect() behaviour conforms to POSIX.


Revision tags: netbsd-2-0-base
# 1.86 29-Nov-2003 matt

branches: 1.86.2;
Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.85 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.84 13-Nov-2003 chs

eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.


# 1.83 04-Sep-2003 matt

Adapt to the new calling conventions of unp_connect2


# 1.82 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.81 29-Jun-2003 fvdl

branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.80 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.79 05-Apr-2003 christos

PR/21030: Naoto Shimazaki: fcntl to accepted socket does not work properly


# 1.78 26-Feb-2003 matt

Remove leftover MBUFTRACE asserts.


# 1.77 26-Feb-2003 drochner

deactivate MBUFTRACE related KASSERT()s in the !MBUFTRACE case


# 1.76 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.75 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.74 26-Nov-2002 christos

si_ -> sel_ to avoid conflicts with siginfo.


# 1.73 25-Nov-2002 itojun

no need for error check after MEXTMALLOC - jdolecek


# 1.72 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge
# 1.71 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.70 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.69 31-May-2002 itojun

support setsockopt() with larger data (up to MCLBYTES).
From: Hitoshi Asaeda <Hitoshi.Asaeda@sophia.inria.fr>


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base ifpoll-base
# 1.68 11-Feb-2002 jdolecek

branches: 1.68.8;
Switch default for pipes to the faster John S. Dyson's implementation.
Old, socketpair-based ones are available with option PIPE_SOCKETPAIR.


Revision tags: thorpej-mips-cache-base
# 1.67 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.66 16-Sep-2001 wiz

branches: 1.66.2;
Spell 'occurred' with two 'r's.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.65 17-Jul-2001 jdolecek

branches: 1.65.2;
Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
has to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(

This code was tested to pass regression tests.


# 1.64 01-Jul-2001 matt

branches: 1.64.2;
Use consistent types for len. Limit sockarg length to reasonable values.


# 1.63 25-Jun-2001 jdolecek

Back off the sendit()/recvit() change, some have problems with it


# 1.62 25-Jun-2001 jdolecek

sys_connect(): fix the call to FILE_UNUSE() so that it's done on return, rather
than immediately after getsock() call


# 1.61 25-Jun-2001 jdolecek

Add 'kernsa' parameter for sendit()/recvit(); if nonzero, msg->msg_name
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.

This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.


# 1.60 16-Jun-2001 jdolecek

Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pageable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.


# 1.59 14-Jun-2001 thorpej

Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.


# 1.58 06-May-2001 manu

implement the recently introduced EMUL_BSD_ASYNCIO_PIPE emulation package
flag.

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.57 27-Feb-2001 lukem

branches: 1.57.2;
convert to ANSI KNF


# 1.56 10-Dec-2000 fvdl

Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).


# 1.55 24-Nov-2000 jdolecek

define COMPAT_OLDSOCK unconditionally - the code is used virtually for all
emulations besides NetBSD, and this way it's LKM-safe


# 1.54 02-Aug-2000 thorpej

MALLOC()/FREE() are not to be used for variable sized allocations.


# 1.53 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.52 27-May-2000 sommerfeld

branches: 1.52.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.51 30-Mar-2000 augustss

Get rid of register declarations.


# 1.50 23-Mar-2000 thorpej

Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() makes sure
the fd_freefiles hints stay in sync.

From OpenBSD.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.49 05-Nov-1999 mycroft

branches: 1.49.2;
Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.


# 1.48 30-Oct-1999 enami

back out unnecessary stylistic changes in recent changes, to keep coding
style closer to KNF.


# 1.47 27-Oct-1999 jdolecek

minor cleanup of previous - avoid goto and code duplication


# 1.46 27-Oct-1999 darrenr

patch from Greg A. Woods to fix panic problems with code that attempts to
recover from failures to accept a socket successfully. Problem suggested
by this:
> It would appear (from two "panic: closef: count < 0" failures in less
> than 12 hours) that Darren's fix to accept(2) for lost file descriptors
> isn't quite correct. His fix inserts a call to closef() to handle one
> of several possible error conditions. However everywhere else in the
> socket code in the same file where falloc() cleanup is necessary the
> function used is ffree().


Revision tags: chs-ubc2-base
# 1.45 01-Jul-1999 itojun

branches: 1.45.2; 1.45.4; 1.45.6;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.44 01-Jul-1999 darrenr

fix sys_accept() to return EOPNOTSUPP for protocols which don't support
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.


# 1.43 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.42 30-Apr-1999 cgd

add checks for COMPAT_OSF1 in the appropriate places


Revision tags: netbsd-1-4-RELEASE netbsd-1-4-base
# 1.41 10-Feb-1999 kleink

branches: 1.41.2; 1.41.4; 1.41.6;
* Due to addition and use of socklen_t, make the socket option and address
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also eliminates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessitates the addition
of a few casts.


# 1.40 18-Dec-1998 drochner

solve the COMPAT_OLDSOCK/MSG_COMPAT problem differently:
The source files which need MSG_COMPAT define COMPAT_OLDSOCK.


Revision tags: kenh-if-detach-base
# 1.39 26-Nov-1998 mycroft

Revert the functional change in rev 1.38; permit a msg_iovlen of 0.
There are two reasons for this:
* We should be able to pass file descriptors without sending any data.
* We could send zero-length iovecs anyway (but we shouldn't have to do this).
Also, msg_iovlen is already a u_int, so delete a bunch of casts.


Revision tags: chs-ubc-base
# 1.38 04-Aug-1998 kleink

Per XNS Issue 5, calling recvmsg(2) or sendmsg(2) with an msg.msg_iovlen less
than or equal to 0 shall fail with EMSGSIZE; the latter condition was not being
checked for. Also, document the msg.msg_iovlen > {IOV_MAX} case.


# 1.37 04-Aug-1998 kleink

UIO_MAXIOV -> IOV_MAX


# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
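
The point that is easy to miss in the mapping above is that bcopy() and ovbcopy() take the source first, while memcpy() and memmove() take the destination first. A small sketch, using a made-up struct toy_hdr and hypothetical helper names, with the old call kept in a comment next to each replacement:

```c
#include <string.h>

struct toy_hdr {
	char name[8];
	int len;
};

static void
toy_hdr_copy(struct toy_hdr *dst, const struct toy_hdr *src)
{
	/* was: bcopy(src, dst, sizeof(*dst)); operands are swapped */
	memcpy(dst, src, sizeof(*dst));
}

static void
toy_shift_left(char *buf, size_t len, size_t n)
{
	/* was: ovbcopy(buf + n, buf, len - n); the regions overlap */
	memmove(buf, buf + n, len - n);
}

static void
toy_hdr_clear(struct toy_hdr *h)
{
	/* was: bzero(h, sizeof(*h)); */
	memset(h, 0, sizeof(*h));
}

static int
toy_hdr_equal(const struct toy_hdr *a, const struct toy_hdr *b)
{
	/* was: bcmp(a, b, sizeof(*a)) == 0 */
	return memcmp(a, b, sizeof(*a)) == 0;
}
```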


# 1.35 03-Aug-1998 kleink

Fix two off-by-one bugs, both present in each recvmsg(2) and sendmsg(2):
* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.
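
The two boundary conditions can be sketched as below. The log does not show the original comparisons, so this only illustrates the corrected behaviour it describes; the TOY_* constants stand in for UIO_SMALLIOV and UIO_MAXIOV with made-up values, and toy_iov_plan() is a hypothetical helper.

```c
#include <errno.h>

#define TOY_SMALLIOV	8	/* entries that fit in the on-stack array */
#define TOY_MAXIOV	1024	/* largest msg_iovlen that must be accepted */

enum toy_iov_storage { TOY_IOV_STACK, TOY_IOV_HEAP };

static int
toy_iov_plan(unsigned int iovlen, enum toy_iov_storage *where)
{
	/* Exactly TOY_MAXIOV entries is legal; only more is an error. */
	if (iovlen > TOY_MAXIOV)
		return EMSGSIZE;
	/* Exactly TOY_SMALLIOV entries still fits on the stack. */
	*where = (iovlen <= TOY_SMALLIOV) ? TOY_IOV_STACK : TOY_IOV_HEAP;
	return 0;
}
```

An off-by-one in either comparison (for example ">=" in the first test, or "<" in the second) would reproduce the two symptoms listed above.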


# 1.34 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.33 29-Jul-1998 thorpej

branches: 1.33.2;
When checking for overflow in the residual count, test against SSIZE_MAX.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.

There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).


# 1.32 18-Jul-1998 lukem

use AF_LOCAL instead of AF_UNIX


# 1.31 25-Jun-1998 thorpej

defopt KTRACE


# 1.30 25-Apr-1998 matt

Hook for 0-copy (or other optimized) sends and receives


# 1.29 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.28 06-Feb-1998 thorpej

When copying out multiple control messages, ensure that the next control
message is aligned. From David Borman <dab@bsdi.com>.


# 1.27 07-Jan-1998 thorpej

Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).


# 1.26 07-Jan-1998 thorpej

Fix bug in recvit() that would cause recvmsg() to only receive one
control message, even if there were multiple control messages on
the queue. From Jean-Luc Richier <Jean-Luc.Richier@imag.fr>, in
bug report kern/4700.


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.25 26-Jun-1997 thorpej

branches: 1.25.8;
Use UCHAR_MAX rather than "255" when sanity-checking the length of a
sockaddr in sockargs().


# 1.24 26-Jun-1997 thorpej

In sockargs():
- Add a comment describing my feelings about this interface, in general.
- Remove the COMPAT_OLDSOCK length hack. Instead, if the socket argument
is too long to fit in an mbuf, allocate enough external storage to
hold it.
- If the socket argument is a sockaddr, don't allow the length to be
greater than 255, as that would overflow sa_len.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for his sanity checking.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 22-Dec-1996 cgd

* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.


# 1.22 14-Jun-1996 cgd

avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.


# 1.21 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.20 17-May-1996 pk

branches: 1.20.4;
Don't touch retval[] in socketpair(); manual page says this system call
returns 0 on success (PR#2428).


# 1.19 09-Feb-1996 christos

More proto fixes


# 1.18 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.17 10-Oct-1995 mycroft

Add hooks for COMPAT_FREEBSD, from Noriyuki Soda.


# 1.16 07-Oct-1995 mycroft

Prefix names of system call implementation functions with `sys_'.


# 1.15 19-Sep-1995 thorpej

Make system calls conform to a standard prototype and bring those
prototypes into scope.


# 1.14 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.13 24-Jun-1995 christos

Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).


# 1.12 10-May-1995 christos

tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed


# 1.11 05-Mar-1995 fvdl

Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.


# 1.10 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.9 20-Oct-1994 cgd

update for new syscall args description mechanism


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.8 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.7 04-May-1994 mycroft

Add return types where missing. Simplify some of the compat conditionals.
Include compat code if COMPAT_SUNOS with or without COMPAT_43.


# 1.6 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 17-Jul-1993 mycroft

branches: 1.5.4;
Finish moving struct definitions outside of function declarations.


# 1.4 27-Jun-1993 andrew

* ansifications
* Yuval Yarom's socket recv(2) fixes - access rights problems (see also
uipc_socket.c).


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.205 29-Jun-2022 riastradh

recvmmsg(2): More timespec validation.

Reported-by: syzbot+004ed2f264534bd27312@syzkaller.appspotmail.com
Reported-by: syzbot+6f9014c842c4e78df7bc@syzkaller.appspotmail.com


# 1.204 28-Jun-2022 riastradh

recvmmsg(2): Avoid arithmetic overflow in timeout calculations.

XXX This is not right -- it doesn't actually do anything to time
out...

Reported-by: syzbot+784209d76a94fcc6417b@syzkaller.appspotmail.com


# 1.203 27-Jun-2022 riastradh

sendmsg(2): Avoid buffer overrun in ktrace of invalid cmsghdr.

Reported-by: syzbot+efded148140b23425f5c@syzkaller.appspotmail.com


# 1.202 02-Oct-2021 thorpej

...and correct my terrible spelling.


# 1.201 02-Oct-2021 thorpej

- Strengthen the poll(2) fifo_inout test to ensure that once the reader
has read enough that exactly PIPE_BUF space is available that the FIFO
becomes writable again.
- When creating a FIFO, ensure that the receive low water mark is 1
(a FIFO must be readable when at least 1 byte is available); this
was already the case implicitly, but this makes it explicit.
- Similarly, set the send low water mark to PIPE_BUF to ensure that
the pipe is writable when at least PIPE_BUF bytes of space are available
in the send buffer. Without this change, the strengthened test case
above does not pass (the default send low water mark is larger than
PIPE_BUF; see soreserve()).
- Make the same low water mark changes to the PIPE_SOCKETPAIR case.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.200 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


Revision tags: netbsd-9-2-RELEASE netbsd-9-1-RELEASE bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.199 12-Nov-2018 hannken

branches: 1.199.4;
sys_recvmmsg: don't defer an error that already gets returned.


# 1.198 07-Nov-2018 hannken

Don't defer errors from sendmmsg(). This matches the linux manpage.

Defer errors from recvmmsg() through so_rerror, and
test for and return a deferred error on entry.

Ok: christos@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.197 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
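
The truncation hazard described above can be reproduced in a few lines of user-space C. The names toy_uimin, TOY_MIN, and the two wrappers are made up for this sketch, and it assumes a platform where unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Same shape as the helper discussed above: operands are unsigned int. */
static unsigned int
toy_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving macro: no truncation, but may evaluate operands twice. */
#define TOY_MIN(a, b)	((a) < (b) ? (a) : (b))

/* The 64-bit arguments are silently converted to unsigned int here. */
static uint64_t
toy_clamp_uimin(uint64_t len, uint64_t limit)
{
	return toy_uimin(len, limit);
}

/* The comparison is done in 64 bits, so the result is what one expects. */
static uint64_t
toy_clamp_macro(uint64_t len, uint64_t limit)
{
	return TOY_MIN(len, limit);
}
```

With len = 2^32 + 5 and limit = 100, toy_clamp_uimin() sees len as 5 after conversion and returns 5, while toy_clamp_macro() returns 100.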


# 1.196 01-Aug-2018 rjs

Add ioctl(2) handler for kernel part of sctp_peeloff().


# 1.195 31-Jul-2018 rjs

Add getsockopt2() syscall.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.194 04-May-2018 christos

branches: 1.194.2;
define MBUFTYPES here.


# 1.193 03-May-2018 christos

Fix COMPAT_NETBSD32 cmsg handling:

1. alignment was wrong for > 1 message
2. macros were doing incorrect pointer comparisons, fortunately ending
the iteration early after the first cmsg instead of crashing.
3. don't output 32 bit ktrace records for cmsg. 32 bit programs running
under emulation on 64 bit systems should produce 64 bit ktrace records
so that the native ktrace can handle the records; remove extra arguments
that are now not needed (the 32 bit msghdr).
4. output the correct type for cmsg trace records.
5. output all the cmsg records in traces instead of just the first one.

Welcome to 8.99.15 because of the argument removal.

XXX: Really all the code should be changed to use the CMSG_{FIRST,NXT}HDR
macros...


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.192 16-Mar-2018 christos

PR/53103: Timo Buhrmester: linux emulation of sendto(2) broken

The sockargs refactoring broke it, because sockargs only works with a user
address. Added an argument to sockargs to indicate where the address is
coming from. Welcome to 8.99.14.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.191 12-Feb-2018 maxv

branches: 1.191.2;
Add a KASSERT; we expect *from to be a single mbuf (not chained).


# 1.190 04-Jan-2018 christos

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).

(Tom Ivar Helbekkmo)


# 1.189 31-Dec-2017 christos

pass valsize for getsockopt like we do for setsockopt


# 1.188 26-Dec-2017 kamil

Refactor pipe1() and correct a bug in sys_pipe2() (SYS_pipe2)

sys_pipe2() returns two integers (values), the 2nd one is a copy of the 2nd
file descriptor that lands in fildes[2]. This is a side effect of reusing
the code for sys_pipe() (SYS_pipe) and not cleaning it up.

The first returned value is (on success) 0.

Introduced a small refactoring in pipe1() that it does not operate over
retval[], but on an array int[2]. A user sets retval[] for pipe() when
desired and needed.

This refactoring touches compat code: netbsd32, linux, linux32.

Before the changes on NetBSD/amd64:

$ ktruss -i ./a.out
[...]
15131 1 a.out pipe2(0x7f7fff2e62b8, 0) = 0, 4
[...]

After the changes:

$ ktruss -i ./a.out
[...]
782 1 a.out pipe2(0x7f7fff97e850, 0) = 0
[...]

There should not be a visible change for current users.

Sponsored by <The NetBSD Foundation>


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.187 20-Jun-2017 christos

Change len type to be unsigned int for consistency with the input type.
Don't check for negative; it does not matter, since we clamp anyway. This
broke the compat32 getsockname() where an uninitialized socklen_t ended
up randomly negative causing it to fail.


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204
# 1.186 03-Feb-2017 christos

branches: 1.186.6;
expose sendmsg_so and recvmsg_so.


# 1.185 02-Feb-2017 christos

expose copyout_sockname_sb


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204
# 1.184 03-Dec-2016 christos

branches: 1.184.2;
Add missing ktrkuser


Revision tags: pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914
# 1.183 13-Sep-2016 martin

Make the ktrace record written by do_sys_sendmsg/do_sys_recvmsg overridable
by the caller. Use this in compat_netbsd32 to log the 32bit version, so
the 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.182 07-Jul-2016 msaitoh

branches: 1.182.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.181 01-Nov-2015 christos

Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.


Revision tags: nick-nhusb-base-20150921
# 1.180 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.179 22-Jul-2015 maxv

Memory leak. Triggerable from an unprivileged user via COMPAT_43.


Revision tags: nick-nhusb-base-20150606
# 1.178 09-May-2015 rtr

change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16


# 1.177 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.176 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.175 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.174 06-Mar-2015 rtr

Return EINVAL if namelen isn't large enough to encompass the expected
members of sockaddr structures. i.e. sa_len and sa_family.
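
One way such a check could look is sketched below. It assumes that "the expected members" means everything preceding sa_data in struct sockaddr; toy_check_namelen() is a hypothetical helper, not the function the kernel uses.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Reject a name length too short to hold the fixed header fields of
 * a sockaddr, i.e. the bytes that come before sa_data.
 */
static int
toy_check_namelen(socklen_t namelen)
{
	if (namelen < offsetof(struct sockaddr, sa_data))
		return EINVAL;
	return 0;
}
```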

Discussed with and patch by christos@


Revision tags: nick-nhusb-base
# 1.173 05-Sep-2014 matt

branches: 1.173.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


Revision tags: netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.172 09-Aug-2014 rtr

branches: 1.172.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.171 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.170 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


# 1.169 17-May-2014 rmind

- fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle few comments.
- Remove M_SOOPTS while here.


# 1.168 17-May-2014 rmind

makesocket: set SS_NBIO slightly earlier.


# 1.167 17-May-2014 rmind

Remove trailing whitespaces, wrap long lines, minor KNF; no functional changes.


Revision tags: yamt-pagecache-base9
# 1.166 07-Apr-2014 seanb

Fix a case where an erroneous EAGAIN was returned out of recvmmsg.
This occurred when some, but not all of the mmsg array members
were filled with data from a non-blocking socket.
PR kern/48725


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 09-Oct-2013 christos

branches: 1.165.2;
delete extra m_len initialization.


# 1.164 09-Oct-2013 christos

PR/48292: Justin Cormack: paccept creates sockets that cannot be made blocking
Reset the socket flags not just the file flags for non-blocking I/O.
XXX: pullup 6


# 1.163 08-Oct-2013 christos

PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.


# 1.162 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.161 03-Jun-2013 christos

branches: 1.161.2;
use the proper name for kdump pretty-printing.


Revision tags: agc-symver-base
# 1.160 14-Feb-2013 christos

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.


# 1.159 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.
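
The gap between CMSG_LEN and CMSG_SPACE mentioned in the second item is alignment padding after the payload of a control message. A user-space illustration of scrubbing it, using only the standard CMSG macros; toy_cmsg_scrub_pad() is a made-up helper and is not the code in unp_externalize:

```c
#include <string.h>
#include <sys/socket.h>

/*
 * buf holds one control message whose payload is datalen bytes long.
 * Zero the alignment padding that follows the payload so stale bytes
 * are never copied out. Returns the number of bytes cleared.
 */
static size_t
toy_cmsg_scrub_pad(unsigned char *buf, size_t datalen)
{
	size_t used = CMSG_LEN(datalen);	/* header + payload */
	size_t total = CMSG_SPACE(datalen);	/* header + payload + padding */

	memset(buf + used, 0, total - used);
	return total - used;
}
```

The caller must supply a buffer of at least CMSG_SPACE(datalen) bytes; the padding size itself depends on the platform's alignment rules.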

ok christos


Revision tags: yamt-pagecache-base8
# 1.158 29-Dec-2012 mlelstv

The sanity check prevented messages that carry only ancillary data.


# 1.157 29-Dec-2012 mlelstv

If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analogous
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.156 17-Jul-2012 njoly

branches: 1.156.2;
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.


# 1.155 22-Jun-2012 christos

Add {send,recv}mmsg from Linux


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.154 25-Jan-2012 christos

branches: 1.154.2;
revert atomics for so_options since it is a short.


# 1.153 25-Jan-2012 christos

need <sys/atomic.h>


# 1.152 25-Jan-2012 christos

Add locking, requested by yamt. Note that locking is not used everywhere
for these.


# 1.151 25-Jan-2012 christos

As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]


# 1.150 21-Dec-2011 christos

simplify expression


# 1.149 20-Dec-2011 christos

- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.148 04-Nov-2011 christos

branches: 1.148.4;
Fix error I introduced in previous commit that caused asymmetric connects
when SOCK_NONBLOCK or SOCK_CLOEXEC was specified. Factor out common code
and simplify error return.


Revision tags: yamt-pagecache-base
# 1.147 21-Sep-2011 christos

branches: 1.147.2;
Put the mbuf type in the ktrace record so that we know how to decode it
in userland.


# 1.146 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.145 15-Jul-2011 christos

fail with EINVAL if flags are not O_CLOEXEC|O_NONBLOCK in pipe2(2) and
dup3(2)


# 1.144 26-Jun-2011 christos

* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
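
A short sketch of the socketpair(2) variant described above, assuming
a system whose headers define SOCK_CLOEXEC and SOCK_NONBLOCK. The
helper names are hypothetical. Passing the flags in the type argument
sets them at creation time, so there is no window between
socketpair() and a later fcntl() in which another thread could
fork and exec.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fd has both close-on-exec and non-blocking set, else 0. */
int
fd_has_cloexec_nonblock(int fd)
{
	int fdflags = fcntl(fd, F_GETFD);
	int flflags = fcntl(fd, F_GETFL);

	if (fdflags == -1 || flflags == -1)
		return 0;
	return (fdflags & FD_CLOEXEC) != 0 && (flflags & O_NONBLOCK) != 0;
}

/*
 * Create a local stream socket pair with both flags applied at
 * creation.  Returns 0 on success, -1 with errno set on error.
 */
int
socketpair_cloexec_nonblock(int sv[2])
{
	assert(sv != NULL);
	return socketpair(AF_UNIX,
	    SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, 0, sv);
}
```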


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.143 24-Apr-2011 rmind

- Replace a few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.


# 1.142 10-Apr-2011 christos

- Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1
# 1.141 23-Apr-2010 rmind

branches: 1.141.2;
Replace M_IOV and some malloc(9)s with kmem(9), and while there:
- Fix invalid free (M_TEMP vs M_IOV) in do_sys_recvmsg(), spotted by jakllsch@.
Also, same fix in osf1_sys_sendmsg_xopen().
- Fix attempt to free non-allocated memory in error path in netbsd32___getfh30().
- Plug a memory leak in compat_43_netbsd32_orecvmsg().


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.140 21-Jan-2010 pgoyette

branches: 1.140.2; 1.140.4;
Remove unnecessary call to kauth_cred_free().

This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@


# 1.139 29-Dec-2009 elad

Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!


# 1.138 20-Dec-2009 dsl

If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567


Revision tags: matt-premerge-20091211
# 1.137 09-Dec-2009 dsl

Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.136 04-Apr-2009 ad

Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)


Revision tags: nick-hppapmap-base2
# 1.135 21-Jan-2009 yamt

branches: 1.135.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.


Revision tags: netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 haad-dm-base mjf-devfs2-base
# 1.134 06-Aug-2008 plunky

branches: 1.134.2; 1.134.4;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.133 24-Jun-2008 ad

branches: 1.133.2;
Nothing uses getsock/getvnode any more.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.132 30-May-2008 rmind

branches: 1.132.2;
do_sys_accept: release the reference to sock in a few error paths.
Should fix PR/38790, report and test-case by Nicolas Joly.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.131 28-Apr-2008 martin

branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.130 24-Apr-2008 ad

branches: 1.130.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


# 1.129 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.128 21-Mar-2008 ad

branches: 1.128.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.127 06-Feb-2008 ad

branches: 1.127.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.


Revision tags: vmlocking2-base3 bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.126 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.125 20-Dec-2007 dsl

Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.


# 1.124 16-Dec-2007 elad

Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.123 24-Nov-2007 dyoung

branches: 1.123.2; 1.123.6;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to getsockmbuf().


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.122 05-Oct-2007 dyoung

branches: 1.122.4;
Use getsombuf().


Revision tags: yamt-x86pmap-base
# 1.121 19-Sep-2007 christos

branches: 1.121.2;
minor nits; no code change.


# 1.120 19-Sep-2007 dyoung

1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.119 06-Sep-2007 rmind

do_sys_sendmsg: Plug a possible leak.
From CID: 4535


# 1.118 01-Sep-2007 dsl

Don't error calls to copy socket addresses to userspace when the application
has provided a non-null buffer pointer and a zero length.


# 1.117 27-Aug-2007 dsl

ktrace socket control structures (ie msghdr, address etc) using ktrkuser().


# 1.116 15-Aug-2007 ad

branches: 1.116.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base nick-csl-alignment-base
# 1.115 15-Jul-2007 dsl

branches: 1.115.2; 1.115.6;
Remove non-user flags (especially MSG_IOVUSRSPACE) from mp->msg_flags
before passing to so_receive.
This may (or may not) have any effect...


Revision tags: mjf-ufs-trans-base
# 1.114 01-Jul-2007 dsl

Check for SOL_SOCKET when checking for SCM_RIGHTS.


# 1.113 24-Jun-2007 dsl

Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socket names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, old LKMs will fail to load.


# 1.112 02-Jun-2007 enami

- Fix obvious typos so that sendto(2) works.
- Wrap lines again.


# 1.111 01-Jun-2007 dsl

Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.


Revision tags: yamt-idlelwp-base8
# 1.110 13-May-2007 dsl

Fallout from caddr_t deletion - remove a load of redundant (void *) casts.


# 1.109 18-Apr-2007 yamt

sys_accept: fix usecount botch and double soclose in rev.1.108.


# 1.108 15-Apr-2007 yamt

sys_accept: don't leak a socket on error.


Revision tags: thorpej-atomic-base
# 1.107 04-Mar-2007 christos

branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge
# 1.106 09-Feb-2007 ad

branches: 1.106.2;
Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.105 01-Nov-2006 yamt

branches: 1.105.2;
remove some __unused from function parameters.


# 1.104 23-Oct-2006 elad

PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic

Patch applied, thanks!


Revision tags: yamt-splraiseipl-base2
# 1.103 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.102 22-Aug-2006 seanb

branches: 1.102.2; 1.102.4;
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.101 23-Jul-2006 ad

branches: 1.101.2;
Use the LWP cached credentials where sane.


# 1.100 26-Jun-2006 mrg

version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.

from pavel@ with a little bit of clean up from myself.

XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.99 16-May-2006 christos

branches: 1.99.4;
Don't set mature an fd that has been ffree'd


Revision tags: elad-kernelauth-base
# 1.98 11-May-2006 christos

Add MSG_NOSIGNAL (from FreeBSD)


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.97 01-Mar-2006 yamt

branches: 1.97.2; 1.97.4; 1.97.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.96 26-Dec-2005 perry

branches: 1.96.2; 1.96.4; 1.96.6;
u_intN_t -> uintN_t


# 1.95 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.94 03-Sep-2005 martin

In adjust_rights() Use CMSG_SPACE() to calculate the number of
filedescriptors passed in this message - the counterpart in
unp_externalize does this as well.

Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.

This fixes the postfix connection cache for 64bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing the fdrelease() of uninitialized junk.
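
The alignment issue can be illustrated from userland. This is a
sketch only, not the code from this revision: it subtracts the
aligned header size CMSG_LEN(0) from cmsg_len instead of using
CMSG_SPACE(sizeof(int)) with a later adjustment, and the function
name rights_nfds() is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <assert.h>
#include <stddef.h>

/*
 * Number of descriptors carried by an SCM_RIGHTS control message.
 * CMSG_LEN(0) is the header size including alignment padding, so
 * the remainder of cmsg_len is exactly the payload of ints.
 */
size_t
rights_nfds(const struct cmsghdr *cm)
{
	size_t hdr = CMSG_LEN(0);
	size_t len;

	assert(cm != NULL);
	len = (size_t)cm->cmsg_len;
	if (len < hdr)
		return 0;	/* too short to carry any payload */
	return (len - hdr) / sizeof(int);
}
```

Using sizeof(struct cmsghdr) as the header size would count
alignment padding as payload on platforms where the header is padded.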


# 1.93 03-Sep-2005 martin

minor knf tweak


# 1.92 30-May-2005 martin

branches: 1.92.2;
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.


# 1.91 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.90 26-Feb-2005 perry

branches: 1.90.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.89 30-Nov-2004 christos

branches: 1.89.4; 1.89.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat


# 1.88 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.87 18-May-2004 ragge

Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.

Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.

As a side effect, the new connect() behaviour conforms to POSIX.


Revision tags: netbsd-2-0-base
# 1.86 29-Nov-2003 matt

branches: 1.86.2;
Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.85 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.84 13-Nov-2003 chs

eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.


# 1.83 04-Sep-2003 matt

Adapt to the new calling conventions of unp_connect2


# 1.82 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.81 29-Jun-2003 fvdl

branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.80 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.79 05-Apr-2003 christos

PR/21030: Naoto Shimazaki: fcntl to accepted socket does not work properly


# 1.78 26-Feb-2003 matt

Remove leftover MBUFTRACE asserts.


# 1.77 26-Feb-2003 drochner

deactivate MBUFTRACE related KASSERT()s in the !MBUFTRACE case


# 1.76 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.75 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.74 26-Nov-2002 christos

si_ -> sel_ to avoid conflicts with siginfo.


# 1.73 25-Nov-2002 itojun

no need for error check after MEXTMALLOC - jdolecek


# 1.72 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge
# 1.71 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.70 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.69 31-May-2002 itojun

support setsockopt() with larger data (up to MCLBYTES).
From: Hitoshi Asaeda <Hitoshi.Asaeda@sophia.inria.fr>


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base ifpoll-base
# 1.68 11-Feb-2002 jdolecek

branches: 1.68.8;
Switch default for pipes to the faster John S. Dyson's implementation.
Old, socketpair-based ones are available with option PIPE_SOCKETPAIR.


Revision tags: thorpej-mips-cache-base
# 1.67 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.66 16-Sep-2001 wiz

branches: 1.66.2;
Spell 'occurred' with two 'r's.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.65 17-Jul-2001 jdolecek

branches: 1.65.2;
Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
has to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(

This code was tested to pass regression tests.


# 1.64 01-Jul-2001 matt

branches: 1.64.2;
Use consistent types for len. Limit sockarg length to reasonable values.


# 1.63 25-Jun-2001 jdolecek

Back off the sendit()/recvit() change, some have problems with it


# 1.62 25-Jun-2001 jdolecek

sys_connect(): fix the call to FILE_UNUSE() so that it's done on return, rather
than immediately after getsock() call


# 1.61 25-Jun-2001 jdolecek

Add 'kernsa' parameter for sendit()/recvit(); if nonzero, msg->msg_name
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.

This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.


# 1.60 16-Jun-2001 jdolecek

Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pageable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.


# 1.59 14-Jun-2001 thorpej

Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.


# 1.58 06-May-2001 manu

implement the recently introduced EMUL_BSD_ASYNCIO_PIPE emulation package
flag.

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.57 27-Feb-2001 lukem

branches: 1.57.2;
convert to ANSI KNF


# 1.56 10-Dec-2000 fvdl

Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).


# 1.55 24-Nov-2000 jdolecek

define COMPAT_OLDSOCK unconditionally - the code is used virtually for all
emulations besides NetBSD, and this way it's LKM-safe


# 1.54 02-Aug-2000 thorpej

MALLOC()/FREE() are not to be used for variable sized allocations.


# 1.53 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.52 27-May-2000 sommerfeld

branches: 1.52.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.51 30-Mar-2000 augustss

Get rid of register declarations.


# 1.50 23-Mar-2000 thorpej

Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() make sure
the fd_freefiles hints stay in sync.

From OpenBSD.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.49 05-Nov-1999 mycroft

branches: 1.49.2;
Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.


# 1.48 30-Oct-1999 enami

back out unnecessary stylistic changes in recent changes, to keep coding
style closer to KNF.


# 1.47 27-Oct-1999 jdolecek

minor cleanup of previous - avoid goto and code duplication


# 1.46 27-Oct-1999 darrenr

patch from Greg A. Woods to fix panic problems with code that attempts to
recover from failures to accept a socket successfully. Problem suggested
by this:
> It would appear (from two "panic: closef: count < 0" failures in less
> than 12 hours) that Darren's fix to accept(2) for lost file descriptors
> isn't quite correct. His fix inserts a call to closef() to handle one
> of several possible error conditions. However everywhere else in the
> socket code in the same file where falloc() cleanup is necessary the
> function used is ffree().


Revision tags: chs-ubc2-base
# 1.45 01-Jul-1999 itojun

branches: 1.45.2; 1.45.4; 1.45.6;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.44 01-Jul-1999 darrenr

fix sys_accept() to return EOPNOTSUPP for protocols which don't support
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.


# 1.43 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.42 30-Apr-1999 cgd

add checks for COMPAT_OSF1 in the appropriate places


Revision tags: netbsd-1-4-RELEASE netbsd-1-4-base
# 1.41 10-Feb-1999 kleink

branches: 1.41.2; 1.41.4; 1.41.6;
* Due to addition and use of socklen_t, make the socket option and address
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also eliminates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessitates the addition
of a few casts.


# 1.40 18-Dec-1998 drochner

solve the COMPAT_OLDSOCK/MSG_COMPAT problem differently:
The source files which need MSG_COMPAT define COMPAT_OLDSOCK.


Revision tags: kenh-if-detach-base
# 1.39 26-Nov-1998 mycroft

Revert the functional change in rev 1.38; permit a msg_iovlen of 0.
There are two reasons for this:
* We should be able to pass file descriptors without sending any data.
* We could send zero-length iovecs anyway (but we shouldn't have to do this).
Also, msg_iovlen is already a u_int, so delete a bunch of casts.


Revision tags: chs-ubc-base
# 1.38 04-Aug-1998 kleink

Per XNS Issue 5, calling recvmsg(2) or sendmsg(2) with an msg.msg_iovlen less
than or equal to 0 shall fail with EMSGSIZE; the latter condition was not being
checked for. Also, document the msg.msg_iovlen > {IOV_MAX} case.


# 1.37 04-Aug-1998 kleink

UIO_MAXIOV -> IOV_MAX


# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
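
The mapping above swaps the first two arguments for the copy routines, which
is easy to get wrong by hand. A minimal sketch of the equivalence follows; the
wrapper names are hypothetical and exist only for illustration, they are not
part of the kernel.

```c
#include <string.h>

/* Hypothetical wrappers illustrating the argument reordering. */

/* Old: bcopy(src, dst, len)   New: memcpy(dst, src, len) */
static void
copy_nonoverlapping(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/* Old: ovbcopy(src, dst, len)   New: memmove(dst, src, len) */
static void
copy_overlapping(void *dst, const void *src, size_t len)
{
	/* memmove() is the variant that tolerates overlapping regions. */
	memmove(dst, src, len);
}

/* Old: bzero(buf, len)   New: memset(buf, 0, len) */
static void
clear_buffer(void *buf, size_t len)
{
	memset(buf, 0, len);
}
```

Note that bcmp() and memcmp() take their arguments in the same order, so only
the copy and zero routines need reordering.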


# 1.35 03-Aug-1998 kleink

Fix two off-by-one bugs, both present in each of recvmsg(2) and sendmsg(2):
* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.
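
The two boundary conditions described above can be sketched as follows. This
is an illustration only: the constant values and the classify_iovlen() helper
are assumptions made for the example, not the kernel's actual code.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative values; the kernel's real constants may differ. */
#define SKETCH_SMALLIOV	8	/* entries in the on-stack iovec array */
#define SKETCH_MAXIOV	1024	/* largest msg_iovlen accepted */

/*
 * Hypothetical classifier: returns EMSGSIZE if the count is too large,
 * otherwise 0, and sets *need_heap when the stack array is too small.
 */
static int
classify_iovlen(unsigned int iovlen, int *need_heap)
{
	/* A count equal to the maximum is still valid: '>' and not '>='. */
	if (iovlen > SKETCH_MAXIOV)
		return EMSGSIZE;

	/* A count equal to the stack array size still fits on the stack. */
	*need_heap = (iovlen > SKETCH_SMALLIOV);
	return 0;
}
```

Writing either comparison with '>=' reproduces the corresponding bug: an
unneeded allocation in the first case, a spurious EMSGSIZE in the second.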


# 1.34 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.33 29-Jul-1998 thorpej

branches: 1.33.2;
When checking for overflow in the residual count, test against SSIZE_MAX.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.

There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).


# 1.32 18-Jul-1998 lukem

use AF_LOCAL instead of AF_UNIX


# 1.31 25-Jun-1998 thorpej

defopt KTRACE


# 1.30 25-Apr-1998 matt

Hook for 0-copy (or other optimized) sends and receives


# 1.29 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.28 06-Feb-1998 thorpej

When copying out multiple control messages, ensure that the next control
message is aligned. From David Borman <dab@bsdi.com>.


# 1.27 07-Jan-1998 thorpej

Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).


# 1.26 07-Jan-1998 thorpej

Fix bug in recvit() that would cause recvmsg() to only receive one
control message, even if there were multiple control messages on
the queue. From Jean-Luc Richier <Jean-Luc.Richier@imag.fr>, in
bug report kern/4700.


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.25 26-Jun-1997 thorpej

branches: 1.25.8;
Use UCHAR_MAX rather than "255" when sanity-checking the length of a
sockaddr in sockargs().


# 1.24 26-Jun-1997 thorpej

In sockargs():
- Add a comment describing my feelings about this interface, in general.
- Remove the COMPAT_OLDSOCK length hack. Instead, if the socket argument
is too long to fit in an mbuf, allocate enough external storage to
hold it.
- If the socket argument is a sockaddr, don't allow the length to be
greater than 255, as that would overflow sa_len.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for his sanity checking.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 22-Dec-1996 cgd

* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.


# 1.22 14-Jun-1996 cgd

avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.


# 1.21 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.20 17-May-1996 pk

branches: 1.20.4;
Don't touch retval[] in socketpair(); manual page says this system call
returns 0 on success (PR#2428).


# 1.19 09-Feb-1996 christos

More proto fixes


# 1.18 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.17 10-Oct-1995 mycroft

Add hooks for COMPAT_FREEBSD, from Noriyuki Soda.


# 1.16 07-Oct-1995 mycroft

Prefix names of system call implementation functions with `sys_'.


# 1.15 19-Sep-1995 thorpej

Make system calls conform to a standard prototype and bring those
prototypes into scope.


# 1.14 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.13 24-Jun-1995 christos

Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).


# 1.12 10-May-1995 christos

tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed


# 1.11 05-Mar-1995 fvdl

Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.


# 1.10 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.9 20-Oct-1994 cgd

update for new syscall args description mechanism


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.8 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.7 04-May-1994 mycroft

Add return types where missing. Simplify some of the compat conditionals.
Include compat code if COMPAT_SUNOS with or without COMPAT_43.


# 1.6 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 17-Jul-1993 mycroft

branches: 1.5.4;
Finish moving struct definitions outside of function declarations.


# 1.4 27-Jun-1993 andrew

* ansifications
* Yuval Yarom's socket recv(2) fixes - access rights problems (see also
uipc_socket.c).


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.204 28-Jun-2022 riastradh

recvmmsg(2): Avoid arithmetic overflow in timeout calculations.

XXX This is not right -- it doesn't actually do anything to time
out...

Reported-by: syzbot+784209d76a94fcc6417b@syzkaller.appspotmail.com


# 1.203 27-Jun-2022 riastradh

sendmsg(2): Avoid buffer overrun in ktrace of invalid cmsghdr.

Reported-by: syzbot+efded148140b23425f5c@syzkaller.appspotmail.com


# 1.202 02-Oct-2021 thorpej

...and correct my terrible spelling.


# 1.201 02-Oct-2021 thorpej

- Strengthen the poll(2) fifo_inout test to ensure that once the reader
has read enough that exactly PIPE_BUF space is available that the FIFO
becomes writable again.
- When creating a FIFO, ensure that the receive low water mark is 1
(a FIFO must be readable when at least 1 byte is available); this
was already the case implicitly, but this makes it explicit.
- Similarly, set the send low water mark to PIPE_BUF to ensure that
the pipe is writable when at least PIPE_BUF bytes of space are available
in the send buffer. Without this change, the strengthened test case
above does not pass (the default send low water mark is larger than
PIPE_BUF; see soreserve()).
- Make the same low water mark changes to the PIPE_SOCKETPAIR case.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.200 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


Revision tags: netbsd-9-2-RELEASE netbsd-9-1-RELEASE bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.199 12-Nov-2018 hannken

branches: 1.199.4;
sys_recvmmsg: don't defer an error that already gets returned.


# 1.198 07-Nov-2018 hannken

Don't defer errors from sendmmsg(). This matches the linux manpage.

Defer errors from recvmmsg() through so_rerror and
tests and return a deferred error on entry.

Ok: christos@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.197 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
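
The silent truncation described above can be reproduced in a few lines. This
is a sketch assuming a platform where unsigned int is 32 bits wide; the
sketch_* names are hypothetical stand-ins, not the libkern functions.

```c
#include <stdint.h>

/* Stand-in for a minimum defined on unsigned int, like the renamed uimin(). */
static unsigned int
sketch_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* A 64-bit minimum for comparison (hypothetical helper). */
static uint64_t
sketch_min64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Pass 64-bit values through the unsigned int version.  The casts make
 * explicit the narrowing that would otherwise happen implicitly at the call.
 */
static uint64_t
truncated_min(uint64_t a, uint64_t b)
{
	return sketch_uimin((unsigned int)a, (unsigned int)b);
}
```

With a = 0x100000001 and b = 5, the 64-bit version yields 5, while the
unsigned int version first narrows a to 1 and so yields 1.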


# 1.196 01-Aug-2018 rjs

Add ioctl(2) handler for kernel part of sctp_peeloff().


# 1.195 31-Jul-2018 rjs

Add getsockopt2() syscall.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.194 04-May-2018 christos

branches: 1.194.2;
define MBUFTYPES here.


# 1.193 03-May-2018 christos

Fix COMPAT_NETBSD32 cmsg handling:

1. alignment was wrong for > 1 message
2. macros were doing incorrect pointer comparisons, fortunately ending
the iteration early after the first cmsg instead of crashing.
3. don't output 32 bit ktrace records for cmsg. 32 bit programs running
under emulation on 64 bit systems should produce 64 bit ktrace records
so that the native ktrace can handle the records; remove extra arguments
that are now not needed (the 32 bit msghdr).
4. output the correct type for cmsg trace records.
5. output all the cmsg records in traces instead of just the first one.

Welcome to 8.99.15 because of the argument removal.

XXX: Really all the code should be changed to use the CMSG_{FIRST,NXT}HDR
macros...


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.192 16-Mar-2018 christos

PR/53103: Timo Buhrmester: linux emulation of sendto(2) broken

The sockargs refactoring broke it, because sockargs only works with a user
address. Added an argument to sockargs to indicate where the address is
coming from. Welcome to 8.99.14.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.191 12-Feb-2018 maxv

branches: 1.191.2;
Add a KASSERT; we expect *from to be a single mbuf (not chained).


# 1.190 04-Jan-2018 christos

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).

(Tom Ivar Helbekkmo)


# 1.189 31-Dec-2017 christos

pass valsize for getsockopt like we do for setsockopt


# 1.188 26-Dec-2017 kamil

Refactor pipe1() and correct a bug in sys_pipe2() (SYS_pipe2)

sys_pipe2() returns two integers (values), the 2nd one is a copy of the 2nd
file descriptor that lands in fildes[2]. This is a side effect of reusing
the code for sys_pipe() (SYS_pipe) and not cleaning it up.

The first returned value is (on success) 0.

Introduced a small refactoring in pipe1() so that it does not operate over
retval[], but on an array int[2]. A user sets retval[] for pipe() when
desired and needed.

This refactoring touches compat code: netbsd32, linux, linux32.

Before the changes on NetBSD/amd64:

$ ktruss -i ./a.out
[...]
15131 1 a.out pipe2(0x7f7fff2e62b8, 0) = 0, 4
[...]

After the changes:

$ ktruss -i ./a.out
[...]
782 1 a.out pipe2(0x7f7fff97e850, 0) = 0
[...]

There should not be a visible change for current users.

Sponsored by <The NetBSD Foundation>


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.187 20-Jun-2017 christos

Change len type to be unsigned int for consistency with the input type.
Don't check for negative; it does not matter, we clamp anyway. This
broke the compat32 getsockname() where an uninitialized socklen_t ended
up randomly negative causing it to fail.


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204
# 1.186 03-Feb-2017 christos

branches: 1.186.6;
expose sendmsg_so and recvmsg_so.


# 1.185 02-Feb-2017 christos

expose copyout_sockname_sb


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204
# 1.184 03-Dec-2016 christos

branches: 1.184.2;
Add missing ktrkuser


Revision tags: pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914
# 1.183 13-Sep-2016 martin

Make the ktrace record written by do_sys_sendmsg/do_sys_recvmsg overridable
by the caller. Use this in compat_netbsd32 to log the 32bit version, so
the 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.182 07-Jul-2016 msaitoh

branches: 1.182.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.181 01-Nov-2015 christos

Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.


Revision tags: nick-nhusb-base-20150921
# 1.180 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.179 22-Jul-2015 maxv

Memory leak. Triggerable from an unprivileged user via COMPAT_43.


Revision tags: nick-nhusb-base-20150606
# 1.178 09-May-2015 rtr

change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16


# 1.177 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.176 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.175 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.174 06-Mar-2015 rtr

Return EINVAL if namelen isn't large enough to encompass the expected
members of sockaddr structures. i.e. sa_len and sa_family.

Discussed with and patch by christos@
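
A minimal sketch of such a length check, assuming a BSD-style layout in which
sa_len and sa_family are the first two members. The struct and the
check_namelen() helper are stand-ins invented for the example; the kernel's
actual check may be written differently.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the BSD sockaddr layout; field names follow the commit text. */
struct sketch_sockaddr {
	unsigned char	sa_len;
	unsigned char	sa_family;
	char		sa_data[14];
};

/* Hypothetical check: reject lengths that cannot hold sa_len and sa_family. */
static int
check_namelen(size_t namelen)
{
	/* The length must cover every byte up to the end of sa_family. */
	size_t min = offsetof(struct sketch_sockaddr, sa_data);

	return namelen < min ? EINVAL : 0;
}
```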


Revision tags: nick-nhusb-base
# 1.173 05-Sep-2014 matt

branches: 1.173.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


Revision tags: netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.172 09-Aug-2014 rtr

branches: 1.172.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.171 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.170 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


# 1.169 17-May-2014 rmind

- fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle few comments.
- Remove M_SOOPTS while here.


# 1.168 17-May-2014 rmind

makesocket: set SS_NBIO slightly earlier.


# 1.167 17-May-2014 rmind

Remove trailing whitespaces, wrap long lines, minor KNF; no functional changes.


Revision tags: yamt-pagecache-base9
# 1.166 07-Apr-2014 seanb

Fix a case where an erroneous EAGAIN was returned out of recvmmsg.
This occurred when some, but not all of the mmsg array members
were filled with data from a non-blocking socket.
PR kern/48725


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 09-Oct-2013 christos

branches: 1.165.2;
delete extra m_len initialization.


# 1.164 09-Oct-2013 christos

PR/48292: Justin Cormack: paccept creates sockets that cannot be made blocking
Reset the socket flags not just the file flags for non-blocking I/O.
XXX: pullup 6


# 1.163 08-Oct-2013 christos

PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.


# 1.162 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.161 03-Jun-2013 christos

branches: 1.161.2;
use the proper name for kdump pretty-printing.


Revision tags: agc-symver-base
# 1.160 14-Feb-2013 christos

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.
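
From userland, the expected behaviour is that a socket created with
SOCK_NONBLOCK fails immediately instead of sleeping when no data is
available. A small sketch that observes this, using a local socket pair;
the function name is hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a non-blocking local socket pair and try to read from the
 * empty side.  Returns the errno from recv(), or 0 if recv() did not fail.
 */
static int
recv_errno_on_empty_nonblocking(void)
{
	int sv[2];
	char byte;

	if (socketpair(AF_LOCAL, SOCK_STREAM | SOCK_NONBLOCK, 0, sv) == -1)
		return errno;

	ssize_t n = recv(sv[0], &byte, sizeof(byte), 0);
	int err = (n == -1) ? errno : 0;	/* save before close() */

	close(sv[0]);
	close(sv[1]);
	return err;
}
```

On a system where the flag takes effect, the result should be EAGAIN (or
EWOULDBLOCK where that is a distinct value); a blocked call would indicate
the kind of bug this revision addresses.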


# 1.159 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
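
The gap between CMSG_LEN and CMSG_SPACE mentioned above is alignment padding
that follows a control message's payload. A sketch of how that padding can be
measured and cleared, using only the standard CMSG macros; the two helper
functions are hypothetical and are not the code in unp_externalize.

```c
#include <string.h>
#include <sys/socket.h>

/* Padding bytes that follow a control message carrying 'datalen' bytes. */
static size_t
cmsg_padding(size_t datalen)
{
	return CMSG_SPACE(datalen) - CMSG_LEN(datalen);
}

/*
 * Zero the padding after the payload so stale memory is not copied out.
 * 'buf' must be at least CMSG_SPACE(datalen) bytes long.
 */
static void
cmsg_scrub_padding(unsigned char *buf, size_t datalen)
{
	memset(buf + CMSG_LEN(datalen), 0, cmsg_padding(datalen));
}
```

The amount of padding depends on the platform's alignment rules, so the
sketch computes it from the macros rather than assuming a fixed size.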


Revision tags: yamt-pagecache-base8
# 1.158 29-Dec-2012 mlelstv

The sanity check prevented messages that carry only ancillary data.


# 1.157 29-Dec-2012 mlelstv

If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analogous
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.156 17-Jul-2012 njoly

branches: 1.156.2;
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.


# 1.155 22-Jun-2012 christos

Add {send,recv}mmsg from Linux


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.154 25-Jan-2012 christos

branches: 1.154.2;
revert atomics for so_options since it is a short.


# 1.153 25-Jan-2012 christos

need <sys/atomic.h>


# 1.152 25-Jan-2012 christos

Add locking, requested by yamt. Note that locking is not used everywhere
for these.


# 1.151 25-Jan-2012 christos

As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]


# 1.150 21-Dec-2011 christos

simplify expression


# 1.149 20-Dec-2011 christos

- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.148 04-Nov-2011 christos

branches: 1.148.4;
Fix error I introduced in previous commit that caused asymmetric connects
when SOCK_NONBLOCK or SOCK_CLOEXEC was specified. Factor out common code
and simplify error return.


Revision tags: yamt-pagecache-base
# 1.147 21-Sep-2011 christos

branches: 1.147.2;
Put the mbuf type in the ktrace record so that we know how to decode it
in userland.


# 1.146 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.145 15-Jul-2011 christos

fail with EINVAL if flags are not O_CLOEXEC|O_NONBLOCK in pipe2(2) and
dup3(2)


# 1.144 26-Jun-2011 christos

* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.143 24-Apr-2011 rmind

- Replace few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.


# 1.142 10-Apr-2011 christos

- Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1
# 1.141 23-Apr-2010 rmind

branches: 1.141.2;
Replace M_IOV and some malloc(9)s with kmem(9), and while there:
- Fix invalid free (M_TEMP vs M_IOV) in do_sys_recvmsg(), spotted by jakllsch@.
Also, same fix in osf1_sys_sendmsg_xopen().
- Fix attempt to free non-allocated memory in error path in netbsd32___getfh30().
- Plug a memory leak in compat_43_netbsd32_orecvmsg().


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.140 21-Jan-2010 pgoyette

branches: 1.140.2; 1.140.4;
Remove unnecessary call to kauth_cred_free().

This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@


# 1.139 29-Dec-2009 elad

Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!


# 1.138 20-Dec-2009 dsl

If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567


Revision tags: matt-premerge-20091211
# 1.137 09-Dec-2009 dsl

Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.136 04-Apr-2009 ad

Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)


Revision tags: nick-hppapmap-base2
# 1.135 21-Jan-2009 yamt

branches: 1.135.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.


Revision tags: netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 haad-dm-base mjf-devfs2-base
# 1.134 06-Aug-2008 plunky

branches: 1.134.2; 1.134.4;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.133 24-Jun-2008 ad

branches: 1.133.2;
Nothing uses getsock/getvnode any more.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.132 30-May-2008 rmind

branches: 1.132.2;
do_sys_accept: release the reference to sock in a few error paths.
Should fix PR/38790, report and test-case by Nicolas Joly.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.131 28-Apr-2008 martin

branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.130 24-Apr-2008 ad

branches: 1.130.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


# 1.129 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.128 21-Mar-2008 ad

branches: 1.128.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.127 06-Feb-2008 ad

branches: 1.127.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.


Revision tags: vmlocking2-base3 bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.126 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.125 20-Dec-2007 dsl

Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
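
The prototype change above can be illustrated with a minimal userland sketch. Everything here is hypothetical: `struct sketch_lwp`, `sketch_register_t`, `struct sketch_foo_args` and `sketch_sys_foo()` are stand-ins invented for illustration, not the kernel's definitions. The point is only that a typed, const-qualified `uap` removes the cast from `void *` and lets the compiler reject writes into the argument structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct sketch_lwp { int l_dummy; };
typedef long sketch_register_t;

/* Hypothetical argument structure for an imaginary foo(2) system call. */
struct sketch_foo_args {
	int a;
	int b;
};

/*
 * New-style entry point: the arguments arrive as a typed, const-qualified
 * structure, so no cast from 'void *' is needed and the compiler rejects
 * any attempt to write through 'uap'.
 */
int
sketch_sys_foo(struct sketch_lwp *l, const struct sketch_foo_args *uap,
    sketch_register_t *retval)
{
	(void)l;		/* unused in this sketch */
	assert(uap != NULL && retval != NULL);
	*retval = uap->a + uap->b;
	return 0;		/* 0 means success in this sketch */
}
```

A compat wrapper written against this style would build its own correctly typed argument structure and pass a pointer to it, rather than modifying the caller's.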


# 1.124 16-Dec-2007 elad

Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.123 24-Nov-2007 dyoung

branches: 1.123.2; 1.123.6;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to getsockmbuf().


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.122 05-Oct-2007 dyoung

branches: 1.122.4;
Use getsombuf().


Revision tags: yamt-x86pmap-base
# 1.121 19-Sep-2007 christos

branches: 1.121.2;
minor nits; no code change.


# 1.120 19-Sep-2007 dyoung

1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.
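
As a rough illustration of what a sockaddr_const_addr()-style accessor does, here is a userland sketch. It is an assumption-laden stand-in, not the kernel routine: `sketch_sockaddr_const_addr()` is a hypothetical name, it handles only AF_INET and AF_INET6, and it switches on the family directly, which may not be how the kernel dispatches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Return a pointer to the address part of 'sa' and, if 'slenp' is not NULL,
 * store the length of that part there. Only AF_INET and AF_INET6 are
 * handled in this sketch; other families yield NULL.
 */
const void *
sketch_sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)
{
	assert(sa != NULL);
	switch (sa->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *sin =
		    (const struct sockaddr_in *)sa;
		if (slenp != NULL)
			*slenp = sizeof(sin->sin_addr);
		return &sin->sin_addr;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *sin6 =
		    (const struct sockaddr_in6 *)sa;
		if (slenp != NULL)
			*slenp = sizeof(sin6->sin6_addr);
		return &sin6->sin6_addr;
	}
	default:
		return NULL;
	}
}
```

Passing NULL for `slenp` is allowed, matching the description above, so callers that only need the pointer do not have to supply a dummy length variable.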


Revision tags: nick-csl-alignment-base5
# 1.119 06-Sep-2007 rmind

do_sys_sendmsg: Plug a possible leak.
From CID: 4535


# 1.118 01-Sep-2007 dsl

Don't error calls to copy socket addresses to userspace when the application
has provided a non-null buffer pointer and a zero length.
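
The behaviour described above can be sketched in userland as follows. `sketch_copy_name()` is a hypothetical helper that uses memcpy() in place of copyout(); it only shows that a zero-length buffer results in "nothing copied, length 0" rather than an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most *lenp bytes of a socket name into the caller's buffer and
 * report how many bytes were stored. A zero-length buffer is not an error:
 * nothing is copied and *lenp stays 0. The caller must supply a valid
 * 'buf' whenever *lenp is non-zero.
 */
int
sketch_copy_name(void *buf, size_t *lenp, const void *name, size_t namelen)
{
	size_t n;

	assert(lenp != NULL);
	n = *lenp < namelen ? *lenp : namelen;
	if (n > 0)
		memcpy(buf, name, n);
	*lenp = n;
	return 0;
}
```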


# 1.117 27-Aug-2007 dsl

ktrace socket control structures (ie msghdr, address etc) using ktrkuser().


# 1.116 15-Aug-2007 ad

branches: 1.116.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base nick-csl-alignment-base
# 1.115 15-Jul-2007 dsl

branches: 1.115.2; 1.115.6;
Remove non-user flags (especially MSG_IOVUSRSPACE) from mp->msg_flags
before passing to so_receive.
This may (or may not) have any effect...


Revision tags: mjf-ufs-trans-base
# 1.114 01-Jul-2007 dsl

Check for SOL_SOCKET when checking for SCM_RIGHTS.


# 1.113 24-Jun-2007 dsl

Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socket names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, old LKMs will fail to load.


# 1.112 02-Jun-2007 enami

- Fix obvious typos so that sendto(2) works.
- Wrap lines again.


# 1.111 01-Jun-2007 dsl

Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.


Revision tags: yamt-idlelwp-base8
# 1.110 13-May-2007 dsl

Fallout from caddr_t deletion - remove a load of redundant (void *) casts.


# 1.109 18-Apr-2007 yamt

sys_accept: fix usecount botch and double soclose in rev.1.108.


# 1.108 15-Apr-2007 yamt

sys_accept: don't leak a socket on error.


Revision tags: thorpej-atomic-base
# 1.107 04-Mar-2007 christos

branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge
# 1.106 09-Feb-2007 ad

branches: 1.106.2;
Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.105 01-Nov-2006 yamt

branches: 1.105.2;
remove some __unused from function parameters.


# 1.104 23-Oct-2006 elad

PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic

Patch applied, thanks!


Revision tags: yamt-splraiseipl-base2
# 1.103 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.102 22-Aug-2006 seanb

branches: 1.102.2; 1.102.4;
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.101 23-Jul-2006 ad

branches: 1.101.2;
Use the LWP cached credentials where sane.


# 1.100 26-Jun-2006 mrg

version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.

from pavel@ with a little bit of clean up from myself.

XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.99 16-May-2006 christos

branches: 1.99.4;
Don't set mature an fd that has been ffree'd


Revision tags: elad-kernelauth-base
# 1.98 11-May-2006 christos

Add MSG_NOSIGNAL (from FreeBSD)


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.97 01-Mar-2006 yamt

branches: 1.97.2; 1.97.4; 1.97.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.96 26-Dec-2005 perry

branches: 1.96.2; 1.96.4; 1.96.6;
u_intN_t -> uintN_t


# 1.95 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.94 03-Sep-2005 martin

In adjust_rights() use CMSG_SPACE() to calculate the number of
file descriptors passed in this message - the counterpart in
unp_externalize does this as well.

Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.

This fixes the postfix connection cache for 64bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing the fdrelease() of uninitialized junk.
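
For readers unfamiliar with the alignment issue, here is a userland sketch of counting the descriptors in an SCM_RIGHTS message. It deliberately uses a different formulation from the commit: instead of CMSG_SPACE(sizeof(int)) with a later adjustment, it subtracts CMSG_LEN(0), the header size including any alignment padding, from `cmsg_len`. `sketch_count_rights()` is a hypothetical name, and this is not the kernel's adjust_rights() code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Return the number of descriptors carried by an SCM_RIGHTS control
 * message, or 0 if the message is of another kind or too short.
 * CMSG_LEN(0) is the header size including any alignment padding, so the
 * remainder of cmsg_len is exactly the descriptor array.
 */
size_t
sketch_count_rights(const struct cmsghdr *cm)
{
	assert(cm != NULL);
	if (cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS)
		return 0;
	if (cm->cmsg_len < CMSG_LEN(0))
		return 0;
	return (cm->cmsg_len - CMSG_LEN(0)) / sizeof(int);
}
```

Overestimating this count is what leads to releasing descriptors that were never passed, which is the failure the commit describes.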


# 1.93 03-Sep-2005 martin

minor knf tweak


# 1.92 30-May-2005 martin

branches: 1.92.2;
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.


# 1.91 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.90 26-Feb-2005 perry

branches: 1.90.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.89 30-Nov-2004 christos

branches: 1.89.4; 1.89.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat


# 1.88 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.87 18-May-2004 ragge

Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.

Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.

As a side effect, the new connect() behaviour conforms to POSIX.


Revision tags: netbsd-2-0-base
# 1.86 29-Nov-2003 matt

branches: 1.86.2;
Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.85 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.84 13-Nov-2003 chs

eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.


# 1.83 04-Sep-2003 matt

Adapt to the new calling conventions of unp_connect2


# 1.82 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.81 29-Jun-2003 fvdl

branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.80 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.79 05-Apr-2003 christos

PR/21030: Naoto Shimazaki: fcntl to accepted socket does not work properly


# 1.78 26-Feb-2003 matt

Remove leftover MBUFTRACE asserts.


# 1.77 26-Feb-2003 drochner

deactivate MBUFTRACE related KASSERT()s in the !MBUFTRACE case


# 1.76 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.75 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.74 26-Nov-2002 christos

si_ -> sel_ to avoid conflicts with siginfo.


# 1.73 25-Nov-2002 itojun

no need for error check after MEXTMALLOC - jdolecek


# 1.72 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge
# 1.71 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.70 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.69 31-May-2002 itojun

support setsockopt() with larger data (up to MCLBYTES).
From: Hitoshi Asaeda <Hitoshi.Asaeda@sophia.inria.fr>


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base ifpoll-base
# 1.68 11-Feb-2002 jdolecek

branches: 1.68.8;
Switch default for pipes to the faster John S. Dyson's implementation.
Old, socketpair-based ones are available with option PIPE_SOCKETPAIR.


Revision tags: thorpej-mips-cache-base
# 1.67 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.66 16-Sep-2001 wiz

branches: 1.66.2;
Spell 'occurred' with two 'r's.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.65 17-Jul-2001 jdolecek

branches: 1.65.2;
Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
has to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(

This code was tested to pass regression tests.


# 1.64 01-Jul-2001 matt

branches: 1.64.2;
Use consistent types for len. Limit sockarg length to reasonable values.


# 1.63 25-Jun-2001 jdolecek

Back off the sendit()/recvit() change, some have problems with it


# 1.62 25-Jun-2001 jdolecek

sys_connect(): fix the call to FILE_UNUSE() so that it's done on return, rather
than immediately after getsock() call


# 1.61 25-Jun-2001 jdolecek

Add 'kernsa' parameter for sendit()/recvit(); if nonzero, msg->msg_name
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.

This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.


# 1.60 16-Jun-2001 jdolecek

Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pageable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.


# 1.59 14-Jun-2001 thorpej

Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
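
The larval/mature scheme described above can be sketched with a toy descriptor table. All names here (`sketch_fdtab`, `sketch_fdalloc()`, `sketch_getfile()`, `sketch_set_mature()`) are hypothetical and there is no locking; the sketch only shows the visibility rule, namely that the allocator sees a larval slot as occupied while lookups treat it as absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NFILES 8		/* table size chosen for the sketch */

struct sketch_file {
	bool in_use;	/* slot is allocated */
	bool larval;	/* allocated but not yet fully constructed */
};

struct sketch_fdtab {
	struct sketch_file files[SKETCH_NFILES];
};

/* Allocate the lowest free slot and mark it larval; -1 if the table is full. */
int
sketch_fdalloc(struct sketch_fdtab *t)
{
	assert(t != NULL);
	for (int fd = 0; fd < SKETCH_NFILES; fd++) {
		if (!t->files[fd].in_use) {
			t->files[fd].in_use = true;
			t->files[fd].larval = true;
			return fd;
		}
	}
	return -1;
}

/* Look up a descriptor; larval slots are treated as non-existent. */
struct sketch_file *
sketch_getfile(struct sketch_fdtab *t, int fd)
{
	assert(t != NULL);
	if (fd < 0 || fd >= SKETCH_NFILES)
		return NULL;
	if (!t->files[fd].in_use || t->files[fd].larval)
		return NULL;
	return &t->files[fd];
}

/* Construction is complete: clear the larval flag. */
void
sketch_set_mature(struct sketch_fdtab *t, int fd)
{
	assert(t != NULL && fd >= 0 && fd < SKETCH_NFILES);
	t->files[fd].larval = false;
}
```

In this model a concurrent close on a half-built descriptor simply fails the lookup, which is the race the commit message is concerned with.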


# 1.58 06-May-2001 manu

implement the recently introduced EMUL_BSD_ASYNCIO_PIPE emulation package
flag.

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.57 27-Feb-2001 lukem

branches: 1.57.2;
convert to ANSI KNF


# 1.56 10-Dec-2000 fvdl

Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).


# 1.55 24-Nov-2000 jdolecek

define COMPAT_OLDSOCK unconditionally - the code is used virtually for all
emulations besides NetBSD, and this way it's LKM-safe


# 1.54 02-Aug-2000 thorpej

MALLOC()/FREE() are not to be used for variable sized allocations.


# 1.53 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.52 27-May-2000 sommerfeld

branches: 1.52.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* function need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.51 30-Mar-2000 augustss

Get rid of register declarations.


# 1.50 23-Mar-2000 thorpej

Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() make sure
the fd_freefiles hints stay in sync.

From OpenBSD.
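
A toy version of the idea, assuming a table with a single "lowest possibly free slot" hint: the structure, the field names and `sketch_fdremove()` are hypothetical and much simpler than the kernel's descriptor table.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NFILES 8		/* table size chosen for the sketch */

struct sketch_hinttab {
	void *ofiles[SKETCH_NFILES];	/* NULL means the slot is free */
	int freefile;			/* hint: no free slot below this index */
};

/*
 * Clear a slot and lower the hint if the freed slot is below it. Doing
 * both in one helper is what keeps the hint in sync; a bare
 * "ofiles[fd] = NULL" at the call site would leave the hint stale.
 */
void
sketch_fdremove(struct sketch_hinttab *t, int fd)
{
	assert(t != NULL && fd >= 0 && fd < SKETCH_NFILES);
	t->ofiles[fd] = NULL;
	if (fd < t->freefile)
		t->freefile = fd;
}
```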


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.49 05-Nov-1999 mycroft

branches: 1.49.2;
Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.


# 1.48 30-Oct-1999 enami

back out unnecessary stylistic changes in recent changes, to keep coding
style closer to KNF.


# 1.47 27-Oct-1999 jdolecek

minor cleanup of previous - avoid goto and code duplication


# 1.46 27-Oct-1999 darrenr

patch from Greg A. Woods to fix panic problems with code that attempts to
recover from failures to accept a socket successfully. Problem suggested
by this:
> It would appear (from two "panic: closef: count < 0" failures in less
> than 12 hours) that Darren's fix to accept(2) for lost file descriptors
> isn't quite correct. His fix inserts a call to closef() to handle one
> of several possible error conditions. However everywhere else in the
> socket code in the same file where falloc() cleanup is necessary the
> function used is ffree().


Revision tags: chs-ubc2-base
# 1.45 01-Jul-1999 itojun

branches: 1.45.2; 1.45.4; 1.45.6;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.44 01-Jul-1999 darrenr

fix sys_accept() to return EOPNOTSUPP for protocols which don't support
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.


# 1.43 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.42 30-Apr-1999 cgd

add checks for COMPAT_OSF1 in the appropriate places


Revision tags: netbsd-1-4-RELEASE netbsd-1-4-base
# 1.41 10-Feb-1999 kleink

branches: 1.41.2; 1.41.4; 1.41.6;
* Due to addition and use of socklen_t, make the socket option and address
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also eliminates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessitates the addition
of a few casts.


# 1.40 18-Dec-1998 drochner

solve the COMPAT_OLDSOCK/MSG_COMPAT problem differently:
The source files which need MSG_COMPAT define COMPAT_OLDSOCK.


Revision tags: kenh-if-detach-base
# 1.39 26-Nov-1998 mycroft

Revert the functional change in rev 1.38; permit a msg_iovlen of 0.
There are two reasons for this:
* We should be able to pass file descriptors without sending any data.
* We could send zero-length iovecs anyway (but we shouldn't have to do this).
Also, msg_iovlen is already a u_int, so delete a bunch of casts.


Revision tags: chs-ubc-base
# 1.38 04-Aug-1998 kleink

Per XNS Issue 5, calling recvmsg(2) or sendmsg(2) with an msg.msg_iovlen less
than or equal to 0 shall fail with EMSGSIZE; the latter condition was not being
checked for. Also, document the msg.msg_iovlen > {IOV_MAX} case.


# 1.37 04-Aug-1998 kleink

UIO_MAXIOV -> IOV_MAX


# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
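
The mapping above swaps argument order, which is easy to get wrong by hand. A small sketch with hypothetical wrapper names (`sketch_copy()`, `sketch_shift_right()`, `sketch_clear()`) that shows the ISO C calls with the old spelling in the comments:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Old: bcopy(src, dst, len);   source first.
 * New: memcpy(dst, src, len);  destination first, regions must not overlap.
 */
void
sketch_copy(char *dst, const char *src, size_t len)
{
	memcpy(dst, src, len);
}

/*
 * Old: ovbcopy(src, dst, len);
 * New: memmove(dst, src, len); safe when the regions overlap.
 * Moves buf[0..len-1] to buf[1..len]; the caller provides len + 1 bytes.
 */
void
sketch_shift_right(char *buf, size_t len)
{
	memmove(buf + 1, buf, len);
}

/*
 * Old: bzero(p, len);
 * New: memset(p, 0, len);      the fill value is the second argument.
 */
void
sketch_clear(void *p, size_t len)
{
	memset(p, 0, len);
}
```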


# 1.35 03-Aug-1998 kleink

Fix two off-by-one bugs, both present in each of recvmsg(2) and sendmsg(2):
* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.
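
Both boundary conditions come down to using a strict comparison. The sketch below uses made-up constants (`SKETCH_SMALLIOV`, `SKETCH_MAXIOV`) and hypothetical helper names rather than the kernel's UIO_SMALLIOV and UIO_MAXIOV, and only illustrates where the boundaries sit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_SMALLIOV	8	/* illustrative: entries that fit on the stack */
#define SKETCH_MAXIOV	1024	/* illustrative: largest accepted msg_iovlen */

/*
 * Validate an iovec count. Exactly SKETCH_MAXIOV entries is legal, so the
 * comparison must be '>' and not '>='.
 */
int
sketch_check_iovlen(unsigned int iovlen)
{
	return iovlen > SKETCH_MAXIOV ? EMSGSIZE : 0;
}

/*
 * Decide whether heap storage is needed. Exactly SKETCH_SMALLIOV entries
 * still fits in the on-stack array, so again the comparison is '>'.
 */
bool
sketch_needs_heap(unsigned int iovlen)
{
	return iovlen > SKETCH_SMALLIOV;
}
```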


# 1.34 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.33 29-Jul-1998 thorpej

branches: 1.33.2;
When checking for overflow in the residual count, test against SSIZE_MAX.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.

There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).
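
A userland sketch of the residual-count check, assuming the total is accumulated over an iovec array. `sketch_iov_total()` is a hypothetical helper, the EINVAL return is this sketch's choice, and the SSIZE_MAX fallback definition is only there for environments whose <limits.h> does not expose it.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef SSIZE_MAX
#define SSIZE_MAX ((ssize_t)(SIZE_MAX / 2))	/* fallback for the sketch */
#endif

/*
 * Add up the iovec lengths into *totalp. Returns EINVAL if the total would
 * exceed SSIZE_MAX, because the transfer size is later reported as an
 * ssize_t and -1 is reserved for errors.
 */
int
sketch_iov_total(const struct iovec *iov, int iovcnt, size_t *totalp)
{
	size_t total = 0;

	assert(totalp != NULL);
	for (int i = 0; i < iovcnt; i++) {
		/* Check before adding so 'total' never exceeds SSIZE_MAX. */
		if (iov[i].iov_len > (size_t)SSIZE_MAX - total)
			return EINVAL;
		total += iov[i].iov_len;
	}
	*totalp = total;
	return 0;
}
```

Comparing against the remaining headroom instead of adding first avoids relying on wraparound of the running total.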


# 1.32 18-Jul-1998 lukem

use AF_LOCAL instead of AF_UNIX


# 1.31 25-Jun-1998 thorpej

defopt KTRACE


# 1.30 25-Apr-1998 matt

Hook for 0-copy (or other optimized) sends and receives


# 1.29 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.28 06-Feb-1998 thorpej

When copying out multiple control messages, ensure that the next control
message is aligned. From David Borman <dab@bsdi.com>.
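
From the receiving side, the alignment requirement is what lets the standard CMSG_FIRSTHDR()/CMSG_NXTHDR() macros walk a buffer holding several control messages. A userland sketch with a hypothetical `sketch_count_cmsgs()` helper follows; it does not reproduce the kernel's copy-out loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Count the control messages in a msghdr. CMSG_NXTHDR() advances by the
 * aligned length of the current message, so the walk only works if every
 * message in the buffer starts on an aligned boundary.
 */
size_t
sketch_count_cmsgs(struct msghdr *mh)
{
	size_t n = 0;

	assert(mh != NULL);
	for (struct cmsghdr *cm = CMSG_FIRSTHDR(mh); cm != NULL;
	    cm = CMSG_NXTHDR(mh, cm))
		n++;
	return n;
}
```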


# 1.27 07-Jan-1998 thorpej

Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).


# 1.26 07-Jan-1998 thorpej

Fix bug in recvit() that would cause recvmsg() to only receive one
control message, even if there were multiple control messages on
the queue. From Jean-Luc Richier <Jean-Luc.Richier@imag.fr>, in
bug report kern/4700.


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.25 26-Jun-1997 thorpej

branches: 1.25.8;
Use UCHAR_MAX rather than "255" when sanity-checking the length of a
sockaddr in sockargs().


# 1.24 26-Jun-1997 thorpej

In sockargs():
- Add a comment describing my feelings about this interface, in general.
- Remove the COMPAT_OLDSOCK length hack. Instead, if the socket argument
is too long to fit in an mbuf, allocate enough external storage to
hold it.
- If the socket argument is a sockaddr, don't allow the length to be
greater than 255, as that would overflow sa_len.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for his sanity checking.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 22-Dec-1996 cgd

* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.


# 1.22 14-Jun-1996 cgd

avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.


# 1.21 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.20 17-May-1996 pk

branches: 1.20.4;
Don't touch retval[] in socketpair(); manual page says this system call
returns 0 on success (PR#2428).


# 1.19 09-Feb-1996 christos

More proto fixes


# 1.18 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.17 10-Oct-1995 mycroft

Add hooks for COMPAT_FREEBSD, from Noriyuki Soda.


# 1.16 07-Oct-1995 mycroft

Prefix names of system call implementation functions with `sys_'.


# 1.15 19-Sep-1995 thorpej

Make system calls conform to a standard prototype and bring those
prototypes into scope.


# 1.14 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.13 24-Jun-1995 christos

Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).


# 1.12 10-May-1995 christos

tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed


# 1.11 05-Mar-1995 fvdl

Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.


# 1.10 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.9 20-Oct-1994 cgd

update for new syscall args description mechanism


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.8 29-Jun-1994 cgd

New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'


# 1.7 04-May-1994 mycroft

Add return types where missing. Simplify some of the compat conditionals.
Include compat code if COMPAT_SUNOS with or without COMPAT_43.


# 1.6 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 17-Jul-1993 mycroft

branches: 1.5.4;
Finish moving struct definitions outside of function declarations.


# 1.4 27-Jun-1993 andrew

* ansifications
* Yuval Yarom's socket recv(2) fixes - access rights problems (see also
uipc_socket.c).


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.203 27-Jun-2022 riastradh

sendmsg(2): Avoid buffer overrun in ktrace of invalid cmsghdr.

Reported-by: syzbot+efded148140b23425f5c@syzkaller.appspotmail.com


# 1.202 02-Oct-2021 thorpej

...and correct my terrible spelling.


# 1.201 02-Oct-2021 thorpej

- Strengthen the poll(2) fifo_inout test to ensure that once the reader
has read enough that exactly PIPE_BUF space is available that the FIFO
becomes writable again.
- When creating a FIFO, ensure that the receive low water mark is 1
(a FIFO must be readable when at least 1 byte is available); this
was already the case implicitly, but this makes it explicit.
- Similarly, set the send low water mark to PIPE_BUF to ensure that
the pipe is writable when at least PIPE_BUF bytes of space are available
in the send buffer. Without this change, the strengthened test case
above does not pass (the default send low water mark is larger than
PIPE_BUF; see soreserve()).
- Make the same low water mark changes to the PIPE_SOCKETPAIR case.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.200 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


Revision tags: netbsd-9-2-RELEASE netbsd-9-1-RELEASE bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.199 12-Nov-2018 hannken

branches: 1.199.4;
sys_recvmmsg: don't defer an error that already gets returned.


# 1.198 07-Nov-2018 hannken

Don't defer errors from sendmmsg(). This matches the linux manpage.

Defer errors from recvmmsg() through so_rerror and
tests and return a deferred error on entry.

Ok: christos@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.197 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


# 1.196 01-Aug-2018 rjs

Add ioctl(2) handler for kernel part of sctp_peeloff().


# 1.195 31-Jul-2018 rjs

Add getsockopt2() syscall.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.194 04-May-2018 christos

branches: 1.194.2;
define MBUFTYPES here.


# 1.193 03-May-2018 christos

Fix COMPAT_NETBSD32 cmsg handling:

1. alignment was wrong for > 1 message
2. macros were doing incorrect pointer comparisons, fortunately ending
the iteration early after the first cmsg instead of crashing.
3. don't output 32 bit ktrace records for cmsg. 32 bit programs running
under emulation on 64 bit systems should produce 64 bit ktrace records
so that the native ktrace can handle the records; remove extra arguments
that are now not needed (the 32 bit msghdr).
4. output the correct type for cmsg trace records.
5. output all the cmsg records in traces instead of just the first one.

Welcome to 8.99.15 because of the argument removal.

XXX: Really all the code should be changed to use the CMSG_{FIRST,NXT}HDR
macros...


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.192 16-Mar-2018 christos

PR/53103: Timo Buhrmester: linux emulation of sendto(2) broken

The sockargs refactoring broke it, because sockargs only works with a user
address. Added an argument to sockargs to indicate where the address is
coming from. Welcome to 8.99.14.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.191 12-Feb-2018 maxv

branches: 1.191.2;
Add a KASSERT; we expect *from to be a single mbuf (not chained).


# 1.190 04-Jan-2018 christos

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).

(Tom Ivar Helbekkmo)


# 1.189 31-Dec-2017 christos

pass valsize for getsockopt like we do for setsockopt


# 1.188 26-Dec-2017 kamil

Refactor pipe1() and correct a bug in sys_pipe2() (SYS_pipe2)

sys_pipe2() returns two integers (values), the 2nd one is a copy of the 2nd
file descriptor that lands in fildes[2]. This is a side effect of reusing
the code for sys_pipe() (SYS_pipe) and not cleaning it up.

The first returned value is (on success) 0.

Introduced a small refactoring in pipe1() that it does not operate over
retval[], but on an array int[2]. A user sets retval[] for pipe() when
desired and needed.

This refactoring touches compat code: netbsd32, linux, linux32.

Before the changes on NetBSD/amd64:

$ ktruss -i ./a.out
[...]
15131 1 a.out pipe2(0x7f7fff2e62b8, 0) = 0, 4
[...]

After the changes:

$ ktruss -i ./a.out
[...]
782 1 a.out pipe2(0x7f7fff97e850, 0) = 0
[...]

There should not be a visible change for current users.

Sponsored by <The NetBSD Foundation>


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.187 20-Jun-2017 christos

Change len type to be unsigned int for consistency with the input type.
Don't check for negative; it does not matter we clamp anyway. This
broke the compat32 getsockname() where an uninitialized socklen_t ended
up randomly negative causing it to fail.


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204
# 1.186 03-Feb-2017 christos

branches: 1.186.6;
expose sendmsg_so and recvmsg_so.


# 1.185 02-Feb-2017 christos

expose copyout_sockname_sb


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204
# 1.184 03-Dec-2016 christos

branches: 1.184.2;
Add missing ktrkuser


Revision tags: pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914
# 1.183 13-Sep-2016 martin

Make the ktrace record written by do_sys_sendmsg/do_sys_recvmsg overridable
by the caller. Use this in compat_netbsd32 to log the 32bit version, so
the 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.182 07-Jul-2016 msaitoh

branches: 1.182.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.181 01-Nov-2015 christos

Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.


Revision tags: nick-nhusb-base-20150921
# 1.180 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.179 22-Jul-2015 maxv

Memory leak. Triggerable from an unprivileged user via COMPAT_43.


Revision tags: nick-nhusb-base-20150606
# 1.178 09-May-2015 rtr

change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16


# 1.177 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.176 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.175 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.174 06-Mar-2015 rtr

Return EINVAL if namelen isn't large enough to encompass the expected
members of sockaddr structures. i.e. sa_len and sa_family.

Discussed with and patch by christos@


Revision tags: nick-nhusb-base
# 1.173 05-Sep-2014 matt

branches: 1.173.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


Revision tags: netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.172 09-Aug-2014 rtr

branches: 1.172.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.171 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.170 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


# 1.169 17-May-2014 rmind

- fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle few comments.
- Remove M_SOOPTS while here.


# 1.168 17-May-2014 rmind

makesocket: set SS_NBIO slightly earlier.


# 1.167 17-May-2014 rmind

Remove trailing whitespaces, wrap long lines, minor KNF; no functional changes.


Revision tags: yamt-pagecache-base9
# 1.166 07-Apr-2014 seanb

Fix a case where an erroneous EAGAIN was returned out of recvmmsg.
This occurred when some, but not all of the mmsg array members
were filled with data from a non-blocking socket.
PR kern/48725


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 09-Oct-2013 christos

branches: 1.165.2;
delete extra m_len initialization.


# 1.164 09-Oct-2013 christos

PR/48292: Justin Cormack: paccept creates sockets that cannot be made blocking
Reset the socket flags not just the file flags for non-blocking I/O.
XXX: pullup 6


# 1.163 08-Oct-2013 christos

PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.


# 1.162 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.161 03-Jun-2013 christos

branches: 1.161.2;
use the proper name for kdump pretty-printing.


Revision tags: agc-symver-base
# 1.160 14-Feb-2013 christos

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.


# 1.159 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos


Revision tags: yamt-pagecache-base8
# 1.158 29-Dec-2012 mlelstv

The sanity check prevented messages that carry only ancillary data.


# 1.157 29-Dec-2012 mlelstv

If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analogous
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.156 17-Jul-2012 njoly

branches: 1.156.2;
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.


# 1.155 22-Jun-2012 christos

Add {send,recv}mmsg from Linux


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.154 25-Jan-2012 christos

branches: 1.154.2;
revert atomics for so_options since it is a short.


# 1.153 25-Jan-2012 christos

need <sys/atomic.h>


# 1.152 25-Jan-2012 christos

Add locking, requested by yamt. Note that locking is not used everywhere
for these.


# 1.151 25-Jan-2012 christos

As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]


# 1.150 21-Dec-2011 christos

simplify expression


# 1.149 20-Dec-2011 christos

- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.148 04-Nov-2011 christos

branches: 1.148.4;
Fix error I introduced in previous commit that caused asymmetric connects
when SOCK_NONBLOCK or SOCK_CLOEXEC was specified. Factor out common code
and simplify error return.


Revision tags: yamt-pagecache-base
# 1.147 21-Sep-2011 christos

branches: 1.147.2;
Put the mbuf type in the ktrace record so that we know how to decode it
in userland.


# 1.146 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.145 15-Jul-2011 christos

fail with EINVAL if flags are not O_CLOEXEC|O_NONBLOCK in pipe2(2) and
dup3(2)


# 1.144 26-Jun-2011 christos

* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.143 24-Apr-2011 rmind

- Replace few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.


# 1.142 10-Apr-2011 christos

- Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1
# 1.141 23-Apr-2010 rmind

branches: 1.141.2;
Replace M_IOV and some malloc(9)s with kmem(9), and while there:
- Fix invalid free (M_TEMP vs M_IOV) in do_sys_recvmsg(), spotted by jakllsch@.
Also, same fix in osf1_sys_sendmsg_xopen().
- Fix attempt to free non-allocated memory in error path in netbsd32___getfh30().
- Plug a memory leak in compat_43_netbsd32_orecvmsg().


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.140 21-Jan-2010 pgoyette

branches: 1.140.2; 1.140.4;
Remove unnecessary call to kauth_cred_free().

This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@


# 1.139 29-Dec-2009 elad

Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!


# 1.138 20-Dec-2009 dsl

If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567


Revision tags: matt-premerge-20091211
# 1.137 09-Dec-2009 dsl

Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.136 04-Apr-2009 ad

Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)


Revision tags: nick-hppapmap-base2
# 1.135 21-Jan-2009 yamt

branches: 1.135.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.


Revision tags: netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 haad-dm-base mjf-devfs2-base
# 1.134 06-Aug-2008 plunky

branches: 1.134.2; 1.134.4;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.133 24-Jun-2008 ad

branches: 1.133.2;
Nothing uses getsock/getvnode any more.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.132 30-May-2008 rmind

branches: 1.132.2;
do_sys_accept: release the reference to sock in few error paths.
Should fix PR/38790, report and test-case by Nicolas Joly.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.131 28-Apr-2008 martin

branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.130 24-Apr-2008 ad

branches: 1.130.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


# 1.129 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.128 21-Mar-2008 ad

branches: 1.128.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.127 06-Feb-2008 ad

branches: 1.127.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.


Revision tags: vmlocking2-base3 bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.126 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.125 20-Dec-2007 dsl

Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.


# 1.124 16-Dec-2007 elad

Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.123 24-Nov-2007 dyoung

branches: 1.123.2; 1.123.6;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to getsockmbuf().


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.122 05-Oct-2007 dyoung

branches: 1.122.4;
Use getsombuf().


Revision tags: yamt-x86pmap-base
# 1.121 19-Sep-2007 christos

branches: 1.121.2;
minor nits; no code change.


# 1.120 19-Sep-2007 dyoung

1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable,
the port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return a pointer to the address part of the "don't care"
sockaddr in the same family as `sa'. Write the length of
the address part at `slenp', if `slenp' is not NULL.
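
The error convention described in item 3 above could be sketched roughly as
follows. This is a minimal illustration, not the kernel's pr_ctloutput code:
the function example_ctloutput() and the option number OPT_EXAMPLE_FLAG are
hypothetical names made up for this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical option number, for illustration only. */
enum { OPT_EXAMPLE_FLAG = 1 };

/*
 * Sketch of the convention: EINVAL when a supported option has bad
 * arguments, ENOPROTOOPT when the option is not supported at this layer.
 */
static int
example_ctloutput(int optname, const void *optval, size_t optlen, int *flagp)
{
	switch (optname) {
	case OPT_EXAMPLE_FLAG:
		/* Supported option: validate its argument first. */
		if (optval == NULL || optlen != sizeof(int))
			return EINVAL;
		*flagp = (*(const int *)optval != 0);
		return 0;
	default:
		/* Not handled here; a caller may pass it further down. */
		return ENOPROTOOPT;
	}
}
```

Keeping the two error codes distinct lets a caller tell "nobody handles this
option" apart from "the option exists but the value was malformed".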


Revision tags: nick-csl-alignment-base5
# 1.119 06-Sep-2007 rmind

do_sys_sendmsg: Plug a possible leak.
From CID: 4535


# 1.118 01-Sep-2007 dsl

Don't error calls to copy socket addresses to userspace when the application
has provided a non-null buffer pointer and a zero length.


# 1.117 27-Aug-2007 dsl

ktrace socket control structures (ie msghdr, address etc) using ktrkuser().


# 1.116 15-Aug-2007 ad

branches: 1.116.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base nick-csl-alignment-base
# 1.115 15-Jul-2007 dsl

branches: 1.115.2; 1.115.6;
Remove non-user flags (especially MSG_IOVUSRSPACE) from mp->msg_flags
before passing to so_receive.
This may (or may not) have any effect...


Revision tags: mjf-ufs-trans-base
# 1.114 01-Jul-2007 dsl

Check for SOL_SOCKET when checking for SCM_RIGHTS.


# 1.113 24-Jun-2007 dsl

Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socket names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, old LKMs will fail to load.


# 1.112 02-Jun-2007 enami

- Fix obvious typos so that sendto(2) works.
- Wrap lines again.


# 1.111 01-Jun-2007 dsl

Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.


Revision tags: yamt-idlelwp-base8
# 1.110 13-May-2007 dsl

Fallout from caddr_t deletion - remove a load of redundant (void *) casts.


# 1.109 18-Apr-2007 yamt

sys_accept: fix usecount botch and double soclose in rev.1.108.


# 1.108 15-Apr-2007 yamt

sys_accept: don't leak a socket on error.


Revision tags: thorpej-atomic-base
# 1.107 04-Mar-2007 christos

branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge
# 1.106 09-Feb-2007 ad

branches: 1.106.2;
Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.105 01-Nov-2006 yamt

branches: 1.105.2;
remove some __unused from function parameters.


# 1.104 23-Oct-2006 elad

PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic

Patch applied, thanks!


Revision tags: yamt-splraiseipl-base2
# 1.103 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.102 22-Aug-2006 seanb

branches: 1.102.2; 1.102.4;
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.101 23-Jul-2006 ad

branches: 1.101.2;
Use the LWP cached credentials where sane.


# 1.100 26-Jun-2006 mrg

version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.

from pavel@ with a little bit of clean up from myself.

XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.99 16-May-2006 christos

branches: 1.99.4;
Don't set mature an fd that has been ffree'd


Revision tags: elad-kernelauth-base
# 1.98 11-May-2006 christos

Add MSG_NOSIGNAL (from FreeBSD)


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.97 01-Mar-2006 yamt

branches: 1.97.2; 1.97.4; 1.97.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the former is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.96 26-Dec-2005 perry

branches: 1.96.2; 1.96.4; 1.96.6;
u_intN_t -> uintN_t


# 1.95 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.94 03-Sep-2005 martin

In adjust_rights() use CMSG_SPACE() to calculate the number of
file descriptors passed in this message - the counterpart in
unp_externalize does this as well.

Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.

This fixes the postfix connection cache for 64bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing the fdrelease() of uninitialized junk.
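
For orientation, the descriptor count in an SCM_RIGHTS message can be derived
from cmsg_len once the aligned header size is subtracted. The sketch below is
a userland analogue built on the standard CMSG_LEN() macro; it is not the
kernel's adjust_rights() code, and the helper name rights_nfds() is
hypothetical.

```c
#include <stddef.h>
#include <sys/socket.h>

/*
 * Number of descriptors carried by an SCM_RIGHTS control message.
 * CMSG_LEN(0) is the header length including alignment padding, so
 * whatever remains of cmsg_len is the descriptor payload.
 */
static size_t
rights_nfds(const struct cmsghdr *cmsg)
{
	if (cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
		return 0;
	if (cmsg->cmsg_len < CMSG_LEN(0))
		return 0;
	return (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);
}
```

If the header size were taken without its alignment padding, the payload
would appear larger than it is on platforms where the padding is non-zero,
which is the kind of over-count the message above describes.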


# 1.93 03-Sep-2005 martin

minor knf tweak


# 1.92 30-May-2005 martin

branches: 1.92.2;
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.


# 1.91 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.90 26-Feb-2005 perry

branches: 1.90.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.89 30-Nov-2004 christos

branches: 1.89.4; 1.89.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat


# 1.88 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now that soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req->r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.87 18-May-2004 ragge

Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.

Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.

As a side effect, the new connect() behaviour conforms to POSIX.


Revision tags: netbsd-2-0-base
# 1.86 29-Nov-2003 matt

branches: 1.86.2;
Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.85 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.84 13-Nov-2003 chs

eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.


# 1.83 04-Sep-2003 matt

Adapt to the new calling conventions of unp_connect2


# 1.82 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.81 29-Jun-2003 fvdl

branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.80 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.79 05-Apr-2003 christos

PR/21030: Naoto Shimazaki: fcntl to accepted socket does not work properly


# 1.78 26-Feb-2003 matt

Remove leftover MBUFTRACE asserts.


# 1.77 26-Feb-2003 drochner

deactivate MBUFTRACE related KASSERT()s in the !MBUFTRACE case


# 1.76 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.75 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.74 26-Nov-2002 christos

si_ -> sel_ to avoid conflicts with siginfo.


# 1.73 25-Nov-2002 itojun

no need for error check after MEXTMALLOC - jdolecek


# 1.72 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge
# 1.71 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.70 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.69 31-May-2002 itojun

support setsockopt() with larger data (up to MCLBYTES).
From: Hitoshi Asaeda <Hitoshi.Asaeda@sophia.inria.fr>


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base ifpoll-base
# 1.68 11-Feb-2002 jdolecek

branches: 1.68.8;
Switch default for pipes to the faster John S. Dyson's implementation.
Old, socketpair-based ones are available with option PIPE_SOCKETPAIR.


Revision tags: thorpej-mips-cache-base
# 1.67 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.66 16-Sep-2001 wiz

branches: 1.66.2;
Spell 'occurred' with two 'r's.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.65 17-Jul-2001 jdolecek

branches: 1.65.2;
Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
has to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(

This code was tested to pass regression tests.


# 1.64 01-Jul-2001 matt

branches: 1.64.2;
Use consistent types for len. Limit sockarg length to reasonable values.


# 1.63 25-Jun-2001 jdolecek

Back off the sendit()/recvit() change, some have problems with it


# 1.62 25-Jun-2001 jdolecek

sys_connect(): fix the call to FILE_UNUSE() so that it's done on return, rather
than immediately after getsock() call


# 1.61 25-Jun-2001 jdolecek

Add 'kernsa' parameter for sendit()/recvit(); if nonzero, msg->msg_name
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.

This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.


# 1.60 16-Jun-2001 jdolecek

Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pagable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.


# 1.59 14-Jun-2001 thorpej

Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
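
The larval/mature idea described above can be sketched in a few lines. This
is a simplified, single-threaded illustration with no locking; the table, the
flag names and the three helpers are hypothetical and do not correspond to
the kernel's actual data structures.

```c
#include <stddef.h>

#define NSLOTS		8	/* hypothetical table size */
#define SLOT_USED	0x1	/* slot has been allocated */
#define SLOT_LARVAL	0x2	/* allocated but not yet fully constructed */

struct slot {
	int flags;
	void *data;
};

static struct slot table[NSLOTS];

/* Allocate the lowest free slot and mark it larval; -1 if the table is full. */
static int
slot_alloc(void)
{
	for (int i = 0; i < NSLOTS; i++) {
		if ((table[i].flags & SLOT_USED) == 0) {
			table[i].flags = SLOT_USED | SLOT_LARVAL;
			table[i].data = NULL;
			return i;
		}
	}
	return -1;
}

/* Publish a fully constructed slot by clearing the larval flag. */
static void
slot_set_mature(int i, void *data)
{
	table[i].data = data;
	table[i].flags &= ~SLOT_LARVAL;
}

/* Lookup treats larval slots as if they did not exist. */
static struct slot *
slot_lookup(int i)
{
	if (i < 0 || i >= NSLOTS)
		return NULL;
	if ((table[i].flags & (SLOT_USED | SLOT_LARVAL)) != SLOT_USED)
		return NULL;
	return &table[i];
}
```

The allocator sees a larval slot as occupied, so it is never handed out
twice, while lookups ignore it until construction has finished.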


# 1.58 06-May-2001 manu

implement the recently introduced EMUL_BSD_ASYNCIO_PIPE emulation package
flag.

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.57 27-Feb-2001 lukem

branches: 1.57.2;
convert to ANSI KNF


# 1.56 10-Dec-2000 fvdl

Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).


# 1.55 24-Nov-2000 jdolecek

define COMPAT_OLDSOCK unconditionally - the code is used virtually for all
emulations besides NetBSD, and this way it's LKM-safe


# 1.54 02-Aug-2000 thorpej

MALLOC()/FREE() are not to be used for variable sized allocations.


# 1.53 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.52 27-May-2000 sommerfeld

branches: 1.52.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.51 30-Mar-2000 augustss

Get rid of register declarations.


# 1.50 23-Mar-2000 thorpej

Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() makes sure
the fd_freefiles hints stay in sync.

From OpenBSD.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.49 05-Nov-1999 mycroft

branches: 1.49.2;
Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.


# 1.48 30-Oct-1999 enami

back out unnecessary stylistic changes in recent changes, to keep coding
style closer to KNF.


# 1.47 27-Oct-1999 jdolecek

minor cleanup of previous - avoid goto and code duplication


# 1.46 27-Oct-1999 darrenr

patch from Greg A. Woods to fix panic problems with code that attempts to
recover from failures to accept a socket successfully. Problem suggested
by this:
> It would appear (from two "panic: closef: count < 0" failures in less
> than 12 hours) that Darren's fix to accept(2) for lost file descriptors
> isn't quite correct. His fix inserts a call to closef() to handle one
> of several possible error conditions. However everywhere else in the
> socket code in the same file where falloc() cleanup is necessary the
> function used is ffree().


Revision tags: chs-ubc2-base
# 1.45 01-Jul-1999 itojun

branches: 1.45.2; 1.45.4; 1.45.6;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.44 01-Jul-1999 darrenr

fix sys_accept() to return EOPNOTSUPP for protocols which don't support
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.


# 1.43 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.42 30-Apr-1999 cgd

add checks for COMPAT_OSF1 in the appropriate places


Revision tags: netbsd-1-4-RELEASE netbsd-1-4-base
# 1.41 10-Feb-1999 kleink

branches: 1.41.2; 1.41.4; 1.41.6;
* Due to addition and use of socklen_t, make the socket option and address
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also eliminates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessitates the addition
of a few casts.


# 1.40 18-Dec-1998 drochner

solve the COMPAT_OLDSOCK/MSG_COMPAT problem differently:
The source files which need MSG_COMPAT define COMPAT_OLDSOCK.


Revision tags: kenh-if-detach-base
# 1.39 26-Nov-1998 mycroft

Revert the functional change in rev 1.38; permit a msg_iovlen of 0.
There are two reasons for this:
* We should be able to pass file descriptors without sending any data.
* We could send zero-length iovecs anyway (but we shouldn't have to do this).
Also, msg_iovlen is already a u_int, so delete a bunch of casts.


Revision tags: chs-ubc-base
# 1.38 04-Aug-1998 kleink

Per XNS Issue 5, calling recvmsg(2) or sendmsg(2) with an msg.msg_iovlen less
than or equal to 0 shall fail with EMSGSIZE; the latter condition was not being
checked for. Also, document the msg.msg_iovlen > {IOV_MAX} case.


# 1.37 04-Aug-1998 kleink

UIO_MAXIOV -> IOV_MAX


# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)


# 1.35 03-Aug-1998 kleink

Fix two off-by-one bugs, both present in each of recvmsg(2) and sendmsg(2):
* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.
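
The two boundary conditions above can be sketched as follows. The constants
SMALLIOV and MAXIOV are hypothetical stand-ins for UIO_SMALLIOV and
UIO_MAXIOV, and iov_choose() is an illustrative helper, not the kernel code.

```c
#include <errno.h>

#define SMALLIOV	8	/* hypothetical stand-in for UIO_SMALLIOV */
#define MAXIOV		1024	/* hypothetical stand-in for UIO_MAXIOV */

enum iov_storage { IOV_ON_STACK, IOV_ON_HEAP };

/*
 * Decide where the iovec array lives. Both comparisons are strict:
 * exactly SMALLIOV entries still fit in the on-stack array, and
 * exactly MAXIOV entries are still a legal request.
 */
static int
iov_choose(unsigned int iovlen, enum iov_storage *where)
{
	if (iovlen > SMALLIOV) {
		if (iovlen > MAXIOV)
			return EMSGSIZE;
		*where = IOV_ON_HEAP;
	} else
		*where = IOV_ON_STACK;
	return 0;
}
```

Writing either test with ">=" instead of ">" would reproduce the
corresponding off-by-one.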


# 1.34 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.33 29-Jul-1998 thorpej

branches: 1.33.2;
When checking for overflow in the residual count, test against SSIZE_MAX.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.

There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).


# 1.32 18-Jul-1998 lukem

use AF_LOCAL instead of AF_UNIX


# 1.31 25-Jun-1998 thorpej

defopt KTRACE


# 1.30 25-Apr-1998 matt

Hook for 0-copy (or other optimized) sends and receives


# 1.29 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.28 06-Feb-1998 thorpej

When copying out multiple control messages, ensure that the next control
message is aligned. From David Borman <dab@bsdi.com>.


# 1.27 07-Jan-1998 thorpej

Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).


# 1.26 07-Jan-1998 thorpej

Fix bug in recvit() that would cause recvmsg() to only receive one
control message, even if there were multiple control messages on
the queue. From Jean-Luc Richier <Jean-Luc.Richier@imag.fr>, in
bug report kern/4700.


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.25 26-Jun-1997 thorpej

branches: 1.25.8;
Use UCHAR_MAX rather than "255" when sanity-checking the length of a
sockaddr in sockargs().


# 1.24 26-Jun-1997 thorpej

In sockargs():
- Add a comment describing my feelings about this interface, in general.
- Remove the COMPAT_OLDSOCK length hack. Instead, if the socket argument
is too long to fit in an mbuf, allocate enough external storage to
hold it.
- If the socket argument is a sockaddr, don't allow the length to be
greater than 255, as that would overflow sa_len.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for his sanity checking.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 22-Dec-1996 cgd

* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.


# 1.22 14-Jun-1996 cgd

avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.


# 1.21 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.20 17-May-1996 pk

branches: 1.20.4;
Don't touch retval[] in socketpair(); manual page says this system call
returns 0 on success (PR#2428).


# 1.19 09-Feb-1996 christos

More proto fixes


# 1.18 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.17 10-Oct-1995 mycroft

Add hooks for COMPAT_FREEBSD, from Noriyuki Soda.


# 1.16 07-Oct-1995 mycroft

Prefix names of system call implementation functions with `sys_'.


# 1.15 19-Sep-1995 thorpej

Make system calls conform to a standard prototype and bring those
prototypes into scope.


# 1.14 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.13 24-Jun-1995 christos

Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).


# 1.12 10-May-1995 christos

tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed


# 1.11 05-Mar-1995 fvdl

Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.


# 1.10 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.9 20-Oct-1994 cgd

update for new syscall args description mechanism


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.8 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.7 04-May-1994 mycroft

Add return types where missing. Simplify some of the compat conditionals.
Include compat code if COMPAT_SUNOS with or without COMPAT_43.


# 1.6 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 17-Jul-1993 mycroft

branches: 1.5.4;
Finish moving struct definitions outside of function declarations.


# 1.4 27-Jun-1993 andrew

* ansifications
* Yuval Yarom's socket recv(2) fixes - access rights problems (see also
uipc_socket.c).


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.202 02-Oct-2021 thorpej

...and correct my terrible spelling.


# 1.201 02-Oct-2021 thorpej

- Strengthen the poll(2) fifo_inout test to ensure that once the reader
has read enough that exactly PIPE_BUF space is available that the FIFO
becomes writable again.
- When creating a FIFO, ensure that the receive low water mark is 1
(a FIFO must be readable when at least 1 byte is available); this
was already the case implicitly, but this makes it explicit.
- Similarly, set the send low water mark to PIPE_BUF to ensure that
the pipe is writable when at least PIPE_BUF bytes of space are available
in the send buffer. Without this change, the strengthened test case
above does not pass (the default send low water mark is larger than
PIPE_BUF; see soreserve()).
- Make the same low water mark changes to the PIPE_SOCKETPAIR case.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.200 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


Revision tags: netbsd-9-2-RELEASE netbsd-9-1-RELEASE bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.199 12-Nov-2018 hannken

branches: 1.199.4;
sys_recvmmsg: don't defer an error that already gets returned.


# 1.198 07-Nov-2018 hannken

Don't defer errors from sendmmsg(). This matches the linux manpage.

Defer errors from recvmmsg() through so_rerror, and test for
and return a deferred error on entry.

Ok: christos@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.197 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
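
The truncation hazard described in this message can be sketched in userland C. This is only an illustration, not the libkern code: uimin_sketch and MIN_SKETCH are hypothetical names, and the sketch assumes a platform where unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical stand-in for a min() that is defined on unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: may evaluate its arguments twice, but keeps their width. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/* Returns 1 if the function form truncated while the macro form did not. */
static int
truncation_demo(void)
{
	uint64_t big = (UINT64_C(1) << 32) + 5;	/* needs more than 32 bits */
	uint64_t limit = 100;

	/* big is converted to unsigned int (value 5) before the comparison. */
	uint64_t via_func = uimin_sketch(big, limit);
	/* The macro compares the full 64-bit values. */
	uint64_t via_macro = MIN_SKETCH(big, limit);

	return via_func == 5 && via_macro == 100;
}
```

The function form returns 5 rather than 100 because the argument is narrowed before the comparison takes place, which is the silent truncation the rename is meant to make visible.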


# 1.196 01-Aug-2018 rjs

Add ioctl(2) handler for kernel part of sctp_peeloff().


# 1.195 31-Jul-2018 rjs

Add getsockopt2() syscall.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.194 04-May-2018 christos

branches: 1.194.2;
define MBUFTYPES here.


# 1.193 03-May-2018 christos

Fix COMPAT_NETBSD32 cmsg handling:

1. alignment was wrong for > 1 message
2. macros were doing incorrect pointer comparisons, fortunately ending
the iteration early after the first cmsg instead of crashing.
3. don't output 32 bit ktrace records for cmsg. 32 bit programs running
under emulation on 64 bit systems should produce 64 bit ktrace records
so that the native ktrace can handle the records; remove extra arguments
that are now not needed (the 32 bit msghdr).
4. output the correct type for cmsg trace records.
5. output all the cmsg records in traces instead of just the first one.

Welcome to 8.99.15 because of the argument removal.

XXX: Really all the code should be changed to use the CMSG_{FIRST,NXT}HDR
macros...
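
As a userland illustration of the CMSG_{FIRST,NXT}HDR macros mentioned in the XXX note, the following sketch builds a control buffer holding two int-sized messages and then walks it. It is not the kernel or COMPAT_NETBSD32 code; count_cmsgs is a hypothetical helper, the payload values are placeholders, and the buffer is never passed to sendmsg(2).

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Build a control buffer with two cmsgs, then count them by iteration. */
static int
count_cmsgs(void)
{
	union {
		struct cmsghdr align;	/* forces cmsghdr alignment */
		char buf[2 * CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int i, value, n = 0;

	/* Zero everything so the macros never read stale lengths. */
	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	/* Fill in two headers, each followed by one int of payload. */
	cmsg = CMSG_FIRSTHDR(&msg);
	for (i = 0; i < 2 && cmsg != NULL; i++) {
		cmsg->cmsg_level = SOL_SOCKET;
		cmsg->cmsg_type = SCM_RIGHTS;
		cmsg->cmsg_len = CMSG_LEN(sizeof(int));
		value = i;	/* placeholder payload */
		memcpy(CMSG_DATA(cmsg), &value, sizeof(value));
		cmsg = CMSG_NXTHDR(&msg, cmsg);
	}

	/* Walk the buffer the way a receiver would. */
	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
	    cmsg = CMSG_NXTHDR(&msg, cmsg))
		n++;
	return n;
}
```

Letting the macros compute the position of each header keeps the alignment arithmetic in one place, which is the kind of hand-rolled pointer comparison the fix above had to correct.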


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.192 16-Mar-2018 christos

PR/53103: Timo Buhrmester: linux emulation of sendto(2) broken

The sockargs refactoring broke it, because sockargs only works with a user
address. Added an argument to sockargs to indicate where the address is
coming from. Welcome to 8.99.14.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.191 12-Feb-2018 maxv

branches: 1.191.2;
Add a KASSERT; we expect *from to be a single mbuf (not chained).


# 1.190 04-Jan-2018 christos

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).

(Tom Ivar Helbekkmo)


# 1.189 31-Dec-2017 christos

pass valsize for getsockopt like we do for setsockopt


# 1.188 26-Dec-2017 kamil

Refactor pipe1() and correct a bug in sys_pipe2() (SYS_pipe2)

sys_pipe2() returns two integers (values), the 2nd one is a copy of the 2nd
file descriptor that lands in fildes[2]. This is a side effect of reusing
the code for sys_pipe() (SYS_pipe) and not cleaning it up.

The first returned value is (on success) 0.

Introduced a small refactoring in pipe1() that it does not operate over
retval[], but on an array int[2]. A user sets retval[] for pipe() when
desired and needed.

This refactoring touches compat code: netbsd32, linux, linux32.

Before the changes on NetBSD/amd64:

$ ktruss -i ./a.out
[...]
15131 1 a.out pipe2(0x7f7fff2e62b8, 0) = 0, 4
[...]

After the changes:

$ ktruss -i ./a.out
[...]
782 1 a.out pipe2(0x7f7fff97e850, 0) = 0
[...]

There should not be a visible change for current users.

Sponsored by <The NetBSD Foundation>


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.187 20-Jun-2017 christos

Change len type to be unsigned int for consistency with the input type.
Don't check for negative; it does not matter, since we clamp anyway. This
broke the compat32 getsockname() where an uninitialized socklen_t ended
up randomly negative causing it to fail.


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204
# 1.186 03-Feb-2017 christos

branches: 1.186.6;
expose sendmsg_so and recvmsg_so.


# 1.185 02-Feb-2017 christos

expose copyout_sockname_sb


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204
# 1.184 03-Dec-2016 christos

branches: 1.184.2;
Add missing ktrkuser


Revision tags: pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914
# 1.183 13-Sep-2016 martin

Make the ktrace record written by do_sys_sendmsg/do_sys_recvmsg overridable
by the caller. Use this in compat_netbsd32 to log the 32bit version, so
the 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.182 07-Jul-2016 msaitoh

branches: 1.182.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.181 01-Nov-2015 christos

Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.


Revision tags: nick-nhusb-base-20150921
# 1.180 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.179 22-Jul-2015 maxv

Memory leak. Triggerable from an unprivileged user via COMPAT_43.


Revision tags: nick-nhusb-base-20150606
# 1.178 09-May-2015 rtr

change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16


# 1.177 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.176 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.175 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.174 06-Mar-2015 rtr

Return EINVAL if namelen isn't large enough to encompass the expected
members of sockaddr structures. i.e. sa_len and sa_family.

Discussed with and patch by christos@
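
A minimal userland sketch of the kind of length check this message describes, assuming the header fields (sa_len where it exists, and sa_family) end where sa_data begins. check_namelen_sketch is a hypothetical name, and this is not the kernel's actual code.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Reject a name length that cannot hold the sockaddr header fields.
 * Assumption: those fields occupy everything before sa_data.
 */
static int
check_namelen_sketch(size_t namelen)
{
	if (namelen < offsetof(struct sockaddr, sa_data))
		return EINVAL;
	return 0;
}
```

Checking against the offset of sa_data rather than a hard-coded constant keeps the sketch valid both on systems that have an sa_len member and on those that do not.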


Revision tags: nick-nhusb-base
# 1.173 05-Sep-2014 matt

branches: 1.173.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


Revision tags: netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.172 09-Aug-2014 rtr

branches: 1.172.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.171 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.170 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


# 1.169 17-May-2014 rmind

- fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle few comments.
- Remove M_SOOPTS while here.


# 1.168 17-May-2014 rmind

makesocket: set SS_NBIO slightly earlier.


# 1.167 17-May-2014 rmind

Remove trailing whitespaces, wrap long lines, minor KNF; no functional changes.


Revision tags: yamt-pagecache-base9
# 1.166 07-Apr-2014 seanb

Fix a case where an erroneous EAGAIN was returned out of recvmmsg.
This occurred when some, but not all of the mmsg array members
were filled with data from a non-blocking socket.
PR kern/48725


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 09-Oct-2013 christos

branches: 1.165.2;
delete extra m_len initialization.


# 1.164 09-Oct-2013 christos

PR/48292: Justin Cormack: paccept creates sockets that cannot be made blocking
Reset the socket flags not just the file flags for non-blocking I/O.
XXX: pullup 6


# 1.163 08-Oct-2013 christos

PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.


# 1.162 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.161 03-Jun-2013 christos

branches: 1.161.2;
use the proper name for kdump pretty-printing.


Revision tags: agc-symver-base
# 1.160 14-Feb-2013 christos

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.


# 1.159 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
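
The gap between CMSG_LEN and CMSG_SPACE mentioned above is alignment padding after the payload. The following userland sketch shows one way to zero it; it is not the unp_externalize code, and cmsg_padding, scrub_cmsg_padding and scrub_demo are hypothetical names.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Number of padding bytes that follow a payload of datalen bytes. */
static size_t
cmsg_padding(size_t datalen)
{
	return CMSG_SPACE(datalen) - CMSG_LEN(datalen);
}

/* Zero the padding so stale bytes are not copied out with the message. */
static void
scrub_cmsg_padding(struct cmsghdr *cmsg, size_t datalen)
{
	memset((unsigned char *)cmsg + CMSG_LEN(datalen), 0,
	    cmsg_padding(datalen));
}

/* Returns 1 if the padding is zeroed and the payload bytes are untouched. */
static int
scrub_demo(void)
{
	union {
		struct cmsghdr align;	/* forces cmsghdr alignment */
		unsigned char buf[CMSG_SPACE(sizeof(int))];
	} u;
	size_t i;

	memset(u.buf, 0xA5, sizeof(u.buf));	/* simulate stale memory */
	scrub_cmsg_padding(&u.align, sizeof(int));

	for (i = CMSG_LEN(sizeof(int)); i < CMSG_SPACE(sizeof(int)); i++)
		if (u.buf[i] != 0)
			return 0;
	return u.buf[CMSG_LEN(sizeof(int)) - 1] == 0xA5;
}
```

On platforms where the payload size is already aligned, the padding is zero bytes long and the scrub is a no-op.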


Revision tags: yamt-pagecache-base8
# 1.158 29-Dec-2012 mlelstv

The sanity check prevented messages that carry only ancillary data.


# 1.157 29-Dec-2012 mlelstv

If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analogous
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.156 17-Jul-2012 njoly

branches: 1.156.2;
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.


# 1.155 22-Jun-2012 christos

Add {send,recv}mmsg from Linux


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.154 25-Jan-2012 christos

branches: 1.154.2;
revert atomics for so_options since it is a short.


# 1.153 25-Jan-2012 christos

need <sys/atomic.h>


# 1.152 25-Jan-2012 christos

Add locking, requested by yamt. Note that locking is not used everywhere
for these.


# 1.151 25-Jan-2012 christos

As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]


# 1.150 21-Dec-2011 christos

simplify expression


# 1.149 20-Dec-2011 christos

- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SS_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.148 04-Nov-2011 christos

branches: 1.148.4;
Fix error I introduced in previous commit that caused asymmetric connects
when SOCK_NONBLOCK or SOCK_CLOEXEC was specified. Factor out common code
and simplify error return.


Revision tags: yamt-pagecache-base
# 1.147 21-Sep-2011 christos

branches: 1.147.2;
Put the mbuf type in the ktrace record so that we know how to decode it
in userland.


# 1.146 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.145 15-Jul-2011 christos

fail with EINVAL if flags other than O_CLOEXEC|O_NONBLOCK are set in
pipe2(2) and dup3(2)


# 1.144 26-Jun-2011 christos

* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
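
From userland, the SOCK_CLOEXEC and SOCK_NONBLOCK type flags listed above can be exercised as in the following sketch. socketpair_flags_demo is a hypothetical helper, and the sketch assumes a system that provides these flags for socketpair(2).

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a socket pair with both flags set at creation time, then read the
 * descriptor and file status flags back.
 * Returns 1 if both ends are close-on-exec and non-blocking, 0 if not,
 * and -1 if socketpair(2) itself fails.
 */
static int
socketpair_flags_demo(void)
{
	int sv[2], ok;

	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK,
	    0, sv) == -1)
		return -1;

	ok = (fcntl(sv[0], F_GETFD) & FD_CLOEXEC) != 0 &&
	    (fcntl(sv[1], F_GETFD) & FD_CLOEXEC) != 0 &&
	    (fcntl(sv[0], F_GETFL) & O_NONBLOCK) != 0 &&
	    (fcntl(sv[1], F_GETFL) & O_NONBLOCK) != 0;

	close(sv[0]);
	close(sv[1]);
	return ok;
}
```

Setting the flags in the creating call avoids the window between socketpair(2) and a later fcntl(2) in which another thread could fork and exec with the descriptors still inheritable.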


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.143 24-Apr-2011 rmind

- Replace few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.


# 1.142 10-Apr-2011 christos

- Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1
# 1.141 23-Apr-2010 rmind

branches: 1.141.2;
Replace M_IOV and some malloc(9)s with kmem(9), and while there:
- Fix invalid free (M_TEMP vs M_IOV) in do_sys_recvmsg(), spotted by jakllsch@.
Also, same fix in osf1_sys_sendmsg_xopen().
- Fix attempt to free non-allocated memory in error path in netbsd32___getfh30().
- Plug a memory leak in compat_43_netbsd32_orecvmsg().


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.140 21-Jan-2010 pgoyette

branches: 1.140.2; 1.140.4;
Remove unnecessary call to kauth_cred_free().

This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@


# 1.139 29-Dec-2009 elad

Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!


# 1.138 20-Dec-2009 dsl

If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567


Revision tags: matt-premerge-20091211
# 1.137 09-Dec-2009 dsl

Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.136 04-Apr-2009 ad

Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)


Revision tags: nick-hppapmap-base2
# 1.135 21-Jan-2009 yamt

branches: 1.135.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.


Revision tags: netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 haad-dm-base mjf-devfs2-base
# 1.134 06-Aug-2008 plunky

branches: 1.134.2; 1.134.4;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.133 24-Jun-2008 ad

branches: 1.133.2;
Nothing uses getsock/getvnode any more.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.132 30-May-2008 rmind

branches: 1.132.2;
do_sys_accept: release the reference to sock in few error paths.
Should fix PR/38790, report and test-case by Nicolas Joly.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.131 28-Apr-2008 martin

branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.130 24-Apr-2008 ad

branches: 1.130.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


# 1.129 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.128 21-Mar-2008 ad

branches: 1.128.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.127 06-Feb-2008 ad

branches: 1.127.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.


Revision tags: vmlocking2-base3 bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.126 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.125 20-Dec-2007 dsl

Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.


# 1.124 16-Dec-2007 elad

Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.123 24-Nov-2007 dyoung

branches: 1.123.2; 1.123.6;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to getsockmbuf().


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.122 05-Oct-2007 dyoung

branches: 1.122.4;
Use getsombuf().


Revision tags: yamt-x86pmap-base
# 1.121 19-Sep-2007 christos

branches: 1.121.2;
minor nits; no code change.


# 1.120 19-Sep-2007 dyoung

1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return a pointer to the address part of the "don't care"
sockaddr in the same family as `sa'. Write the length of
the address part at `slenp', if `slenp' is not NULL.


Revision tags: nick-csl-alignment-base5
# 1.119 06-Sep-2007 rmind

do_sys_sendmsg: Plug a possible leak.
From CID: 4535


# 1.118 01-Sep-2007 dsl

Don't error calls to copy socket addresses to userspace when the application
has provided a non-null buffer pointer and a zero length.


# 1.117 27-Aug-2007 dsl

ktrace socket control structures (ie msghdr, address etc) using ktrkuser().


# 1.116 15-Aug-2007 ad

branches: 1.116.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base nick-csl-alignment-base
# 1.115 15-Jul-2007 dsl

branches: 1.115.2; 1.115.6;
Remove non-user flags (especially MSG_IOVUSRSPACE) from mp->msg_flags
before passing to so_receive.
This may (or may not) have any effect...


Revision tags: mjf-ufs-trans-base
# 1.114 01-Jul-2007 dsl

Check for SOL_SOCKET when checking for SCM_RIGHTS.


# 1.113 24-Jun-2007 dsl

Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socket names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, old LKMs will fail to load.


# 1.112 02-Jun-2007 enami

- Fix obvious typos so that sendto(2) works.
- Wrap lines again.


# 1.111 01-Jun-2007 dsl

Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.


Revision tags: yamt-idlelwp-base8
# 1.110 13-May-2007 dsl

Fallout from caddr_t deletion - remove a load of redundant (void *) casts.


# 1.109 18-Apr-2007 yamt

sys_accept: fix usecount botch and double soclose in rev.1.108.


# 1.108 15-Apr-2007 yamt

sys_accept: don't leak a socket on error.


Revision tags: thorpej-atomic-base
# 1.107 04-Mar-2007 christos

branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge
# 1.106 09-Feb-2007 ad

branches: 1.106.2;
Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.105 01-Nov-2006 yamt

branches: 1.105.2;
remove some __unused from function parameters.


# 1.104 23-Oct-2006 elad

PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic

Patch applied, thanks!


Revision tags: yamt-splraiseipl-base2
# 1.103 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.102 22-Aug-2006 seanb

branches: 1.102.2; 1.102.4;
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.101 23-Jul-2006 ad

branches: 1.101.2;
Use the LWP cached credentials where sane.


# 1.100 26-Jun-2006 mrg

version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.

from pavel@ with a little bit of clean up from myself.

XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.99 16-May-2006 christos

branches: 1.99.4;
Don't mark an fd as mature after it has been ffree'd


Revision tags: elad-kernelauth-base
# 1.98 11-May-2006 christos

Add MSG_NOSIGNAL (from FreeBSD)


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.97 01-Mar-2006 yamt

branches: 1.97.2; 1.97.4; 1.97.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.96 26-Dec-2005 perry

branches: 1.96.2; 1.96.4; 1.96.6;
u_intN_t -> uintN_t


# 1.95 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.94 03-Sep-2005 martin

In adjust_rights(), use CMSG_SPACE() to calculate the number of
file descriptors passed in this message - the counterpart in
unp_externalize() does this as well.

Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.

This fixes the postfix connection cache on 64-bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing an fdrelease() of uninitialized junk.
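
The counting rule in this message can be illustrated from userland. This is a minimal sketch under stated assumptions, not the kernel's adjust_rights(): count_rights_fds() is a hypothetical helper, and it subtracts CMSG_LEN(0) (the aligned header size) rather than using CMSG_SPACE(sizeof(int)) with a later adjustment as the commit describes. The point is the same: the header size has to come from the CMSG_* macros so that alignment padding is not counted as payload.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: number of descriptors carried by an SCM_RIGHTS
 * control message.  The payload starts after the *aligned* header, so
 * the header size is taken from CMSG_LEN(0), not sizeof(struct cmsghdr).
 */
static size_t
count_rights_fds(const struct cmsghdr *cm)
{
	/* A message shorter than its own header carries nothing valid. */
	assert(cm->cmsg_len >= CMSG_LEN(0));
	return (cm->cmsg_len - CMSG_LEN(0)) / sizeof(int);
}
```

On a platform where sizeof(struct cmsghdr) is smaller than the aligned header, dividing (cmsg_len - sizeof(struct cmsghdr)) by sizeof(int) would report too many descriptors, which is the class of error the commit fixes.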


# 1.93 03-Sep-2005 martin

minor knf tweak


# 1.92 30-May-2005 martin

branches: 1.92.2;
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.


# 1.91 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.90 26-Feb-2005 perry

branches: 1.90.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.89 30-Nov-2004 christos

branches: 1.89.4; 1.89.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat


# 1.88 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req->r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.87 18-May-2004 ragge

Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.

Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.

As a side effect, the new connect() behaviour conforms to POSIX.


Revision tags: netbsd-2-0-base
# 1.86 29-Nov-2003 matt

branches: 1.86.2;
Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.85 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.84 13-Nov-2003 chs

eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.


# 1.83 04-Sep-2003 matt

Adapt to the new calling conventions of unp_connect2


# 1.82 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.81 29-Jun-2003 fvdl

branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.80 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.79 05-Apr-2003 christos

PR/21030: Naoto Shimazaki: fcntl to accepted socket does not work properly


# 1.78 26-Feb-2003 matt

Remove leftover MBUFTRACE asserts.


# 1.77 26-Feb-2003 drochner

deactivate MBUFTRACE related KASSERT()s in the !MBUFTRACE case


# 1.76 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.75 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.74 26-Nov-2002 christos

si_ -> sel_ to avoid conflicts with siginfo.


# 1.73 25-Nov-2002 itojun

no need for error check after MEXTMALLOC - jdolecek


# 1.72 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge
# 1.71 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.70 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.69 31-May-2002 itojun

support setsockopt() with larger data (up to MCLBYTES).
From: Hitoshi Asaeda <Hitoshi.Asaeda@sophia.inria.fr>


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base ifpoll-base
# 1.68 11-Feb-2002 jdolecek

branches: 1.68.8;
Switch default for pipes to the faster John S. Dyson's implementation.
Old, socketpair-based ones are available with option PIPE_SOCKETPAIR.


Revision tags: thorpej-mips-cache-base
# 1.67 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.66 16-Sep-2001 wiz

branches: 1.66.2;
Spell 'occurred' with two 'r's.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.65 17-Jul-2001 jdolecek

branches: 1.65.2;
Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
has to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(

This code was tested to pass regression tests.


# 1.64 01-Jul-2001 matt

branches: 1.64.2;
Use consistent types for len. Limit sockarg length to reasonable values.


# 1.63 25-Jun-2001 jdolecek

Back off the sendit()/recvit() change, some have problems with it


# 1.62 25-Jun-2001 jdolecek

sys_connect(): fix the call to FILE_UNUSE() so that it's done on return, rather
than immediately after the getsock() call


# 1.61 25-Jun-2001 jdolecek

Add 'kernsa' parameter for sendit()/recvit(); if nonzero, msg->msg_name
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.

This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.


# 1.60 16-Jun-2001 jdolecek

Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pageable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.


# 1.59 14-Jun-2001 thorpej

Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.
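
The "larval" scheme described above can be modelled with a toy table. This is only an illustrative sketch: the names slot_alloc(), slot_get() and slot_set_mature() are hypothetical, the table is fixed-size, and there is no locking, so it shows the state transitions rather than the kernel's actual descriptor code.

```c
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 8

struct slot {
	bool used;	/* slot is taken; the allocator must skip it */
	bool larval;	/* still under construction; lookups must fail */
};

static struct slot table[NSLOTS];

/* Reserve the lowest free slot and mark it larval. */
static int
slot_alloc(void)
{
	for (int fd = 0; fd < NSLOTS; fd++) {
		if (!table[fd].used) {
			table[fd].used = true;
			table[fd].larval = true;
			return fd;
		}
	}
	return -1;
}

/* Lookups treat larval slots as non-existent. */
static struct slot *
slot_get(int fd)
{
	if (fd < 0 || fd >= NSLOTS || !table[fd].used || table[fd].larval)
		return NULL;
	return &table[fd];
}

/* Construction finished: clear the larval flag. */
static void
slot_set_mature(int fd)
{
	table[fd].larval = false;
}
```

Between slot_alloc() and slot_set_mature() the slot is invisible to slot_get() but cannot be handed out a second time, which is the property the commit message relies on.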

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.


# 1.58 06-May-2001 manu

implement the recently introduced EMUL_BSD_ASYNCIO_PIPE emulation package
flag.

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.57 27-Feb-2001 lukem

branches: 1.57.2;
convert to ANSI KNF


# 1.56 10-Dec-2000 fvdl

Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).


# 1.55 24-Nov-2000 jdolecek

define COMPAT_OLDSOCK unconditionally - the code is used virtually for all
emulations besides NetBSD, and this way it's LKM-safe


# 1.54 02-Aug-2000 thorpej

MALLOC()/FREE() are not to be used for variable sized allocations.


# 1.53 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.52 27-May-2000 sommerfeld

branches: 1.52.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.51 30-Mar-2000 augustss

Get rid of register declarations.


# 1.50 23-Mar-2000 thorpej

Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() makes sure
the fd_freefiles hints stay in sync.

From OpenBSD.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.49 05-Nov-1999 mycroft

branches: 1.49.2;
Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.


# 1.48 30-Oct-1999 enami

back out unnecessary stylistic changes in recent changes, to keep coding
style closer to KNF.


# 1.47 27-Oct-1999 jdolecek

minor cleanup of previous - avoid goto and code duplication


# 1.46 27-Oct-1999 darrenr

patch from Greg A. Woods to fix panic problems with code that attempts to
recover from failures to accept a socket successfully. Problem suggested
by this:
> It would appear (from two "panic: closef: count < 0" failures in less
> than 12 hours) that Darren's fix to accept(2) for lost file descriptors
> isn't quite correct. His fix inserts a call to closef() to handle one
> of several possible error conditions. However everywhere else in the
> socket code in the same file where falloc() cleanup is necessary the
> function used is ffree().


Revision tags: chs-ubc2-base
# 1.45 01-Jul-1999 itojun

branches: 1.45.2; 1.45.4; 1.45.6;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.44 01-Jul-1999 darrenr

fix sys_accept() to return EOPNOTSUPP for protocols which don't support
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.


# 1.43 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.42 30-Apr-1999 cgd

add checks for COMPAT_OSF1 in the appropriate places


Revision tags: netbsd-1-4-RELEASE netbsd-1-4-base
# 1.41 10-Feb-1999 kleink

branches: 1.41.2; 1.41.4; 1.41.6;
* Due to addition and use of socklen_t, make the socket option and address
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also eliminates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessitates the addition
of a few casts.


# 1.40 18-Dec-1998 drochner

solve the COMPAT_OLDSOCK/MSG_COMPAT problem differently:
The source files which need MSG_COMPAT define COMPAT_OLDSOCK.


Revision tags: kenh-if-detach-base
# 1.39 26-Nov-1998 mycroft

Revert the functional change in rev 1.38; permit a msg_iovlen of 0.
There are two reasons for this:
* We should be able to pass file descriptors without sending any data.
* We could send zero-length iovecs anyway (but we shouldn't have to do this).
Also, msg_iovlen is already a u_int, so delete a bunch of casts.


Revision tags: chs-ubc-base
# 1.38 04-Aug-1998 kleink

Per XNS Issue 5, calling recvmsg(2) or sendmsg(2) with an msg.msg_iovlen less
than or equal to 0 shall fail with EMSGSIZE; the latter condition was not being
checked for. Also, document the msg.msg_iovlen > {IOV_MAX} case.


# 1.37 04-Aug-1998 kleink

UIO_MAXIOV -> IOV_MAX


# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
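
The argument-order swap in this mapping is easy to get wrong, so here is a small sketch of it. The wrapper names copy_bytes(), copy_overlapping() and zero_bytes() are hypothetical and exist only to show each old call next to its replacement.

```c
#include <string.h>

/* Old call: bcopy(src, dst, len), source first. */
static void
copy_bytes(const void *src, void *dst, size_t len)
{
	memcpy(dst, src, len);	/* new call: destination first */
}

/* Old call: ovbcopy(src, dst, len); the regions may overlap. */
static void
copy_overlapping(const void *src, void *dst, size_t len)
{
	memmove(dst, src, len);
}

/* Old call: bzero(buf, len). */
static void
zero_bytes(void *buf, size_t len)
{
	memset(buf, 0, len);
}
```

Only the copy routines change argument order; bcmp() and memcmp() take their arguments in the same order.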


# 1.35 03-Aug-1998 kleink

Fix two off-by-one bugs, both present in each recvmsg(2) and sendmsg(2):
* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.


# 1.34 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.33 29-Jul-1998 thorpej

branches: 1.33.2;
When checking for overflow in the residual count, test against SSIZE_MAX.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.

There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).


# 1.32 18-Jul-1998 lukem

use AF_LOCAL instead of AF_UNIX


# 1.31 25-Jun-1998 thorpej

defopt KTRACE


# 1.30 25-Apr-1998 matt

Hook for 0-copy (or other optimized) sends and receives


# 1.29 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.28 06-Feb-1998 thorpej

When copying out multiple control messages, ensure that the next control
message is aligned. From David Borman <dab@bsdi.com>.


# 1.27 07-Jan-1998 thorpej

Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).


# 1.26 07-Jan-1998 thorpej

Fix bug in recvit() that would cause recvmsg() to only receive one
control message, even if there were multiple control messages on
the queue. From Jean-Luc Richier <Jean-Luc.Richier@imag.fr>, in
bug report kern/4700.


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.25 26-Jun-1997 thorpej

branches: 1.25.8;
Use UCHAR_MAX rather than "255" when sanity-checking the length of a
sockaddr in sockargs().


# 1.24 26-Jun-1997 thorpej

In sockargs():
- Add a comment describing my feelings about this interface, in general.
- Remove the COMPAT_OLDSOCK length hack. Instead, if the socket argument
is too long to fit in an mbuf, allocate enough external storage to
hold it.
- If the socket argument is a sockaddr, don't allow the length to be
greater than 255, as that would overflow sa_len.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for his sanity checking.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 22-Dec-1996 cgd

* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.


# 1.22 14-Jun-1996 cgd

avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.


# 1.21 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.20 17-May-1996 pk

branches: 1.20.4;
Don't touch retval[] in socketpair(); manual page says this system call
returns 0 on success (PR#2428).


# 1.19 09-Feb-1996 christos

More proto fixes


# 1.18 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.17 10-Oct-1995 mycroft

Add hooks for COMPAT_FREEBSD, from Noriyuki Soda.


# 1.16 07-Oct-1995 mycroft

Prefix names of system call implementation functions with `sys_'.


# 1.15 19-Sep-1995 thorpej

Make system calls conform to a standard prototype and bring those
prototypes into scope.


# 1.14 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.13 24-Jun-1995 christos

Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).


# 1.12 10-May-1995 christos

tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed


# 1.11 05-Mar-1995 fvdl

Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.


# 1.10 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.9 20-Oct-1994 cgd

update for new syscall args description mechanism


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.8 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.7 04-May-1994 mycroft

Add return types where missing. Simplify some of the compat conditionals.
Include compat code if COMPAT_SUNOS with or without COMPAT_43.


# 1.6 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 17-Jul-1993 mycroft

branches: 1.5.4;
Finish moving struct definitions outside of function declarations.


# 1.4 27-Jun-1993 andrew

* ansifications
* Yuval Yarom's socket recv(2) fixes - access rights problems (see also
uipc_socket.c).


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.200 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.199 12-Nov-2018 hannken

sys_recvmmsg: don't defer an error that already gets returned.


# 1.198 07-Nov-2018 hannken

Don't defer errors from sendmmsg(). This matches the linux manpage.

Defer errors from recvmmsg() through so_rerror and
tests and return a deferred error on entry.

Ok: christos@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.197 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
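
The truncation hazard behind this rename can be shown in a few lines. This is a sketch under assumptions: uimin_sketch() and MIN_SKETCH() are hypothetical stand-ins for the unsigned-int function and the type-preserving macro discussed above, and unsigned int is assumed to be 32 bits wide.

```c
#include <stdint.h>

/* Stand-in for a minimum defined on unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving macro; it evaluates its arguments more than once. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/* 2^32 + 1: too wide for a 32-bit unsigned int. */
static const uint64_t big_len = UINT64_C(0x100000001);
```

Passing big_len to uimin_sketch() converts it to unsigned int without any diagnostic by default, so the comparison sees 1 instead of the 64-bit value; the macro compares the full-width operands.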


# 1.196 01-Aug-2018 rjs

Add ioctl(2) handler for kernel part of sctp_peeloff().


# 1.195 31-Jul-2018 rjs

Add getsockopt2() syscall.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.194 04-May-2018 christos

branches: 1.194.2;
define MBUFTYPES here.


# 1.193 03-May-2018 christos

Fix COMPAT_NETBSD32 cmsg handling:

1. alignment was wrong for > 1 message
2. macros were doing incorrect pointer comparisons, fortunately ending
the iteration early after the first cmsg instead of crashing.
3. don't output 32 bit ktrace records for cmsg. 32 bit programs running
under emulation on 64 bit systems should produce 64 bit ktrace records
so that the native ktrace can handle the records; remove extra arguments
that are now not needed (the 32 bit msghdr).
4. output the correct type for cmsg trace records.
5. output all the cmsg records in traces instead of just the first one.

Welcome to 8.99.15 because of the argument removal.

XXX: Really all the code should be changed to use the CMSG_{FIRST,NXT}HDR
macros...


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.192 16-Mar-2018 christos

PR/53103: Timo Buhrmester: linux emulation of sendto(2) broken

The sockargs refactoring broke it, because sockargs only works with a user
address. Added an argument to sockargs to indicate where the address is
coming from. Welcome to 8.99.14.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.191 12-Feb-2018 maxv

branches: 1.191.2;
Add a KASSERT; we expect *from to be a single mbuf (not chained).


# 1.190 04-Jan-2018 christos

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).

(Tom Ivar Helbekkmo)


# 1.189 31-Dec-2017 christos

pass valsize for getsockopt like we do for setsockopt


# 1.188 26-Dec-2017 kamil

Refactor pipe1() and correct a bug in sys_pipe2() (SYS_pipe2)

sys_pipe2() returns two integers (values), the 2nd one is a copy of the 2nd
file descriptor that lands in fildes[2]. This is a side effect of reusing
the code for sys_pipe() (SYS_pipe) and not cleaning it up.

The first returned value is (on success) 0.

Introduced a small refactoring in pipe1() so that it does not operate over
retval[], but on an array int[2]. A user sets retval[] for pipe() when
desired and needed.

This refactoring touches compat code: netbsd32, linux, linux32.

Before the changes on NetBSD/amd64:

$ ktruss -i ./a.out
[...]
15131 1 a.out pipe2(0x7f7fff2e62b8, 0) = 0, 4
[...]

After the changes:

$ ktruss -i ./a.out
[...]
782 1 a.out pipe2(0x7f7fff97e850, 0) = 0
[...]

There should not be a visible change for current users.

Sponsored by <The NetBSD Foundation>


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.187 20-Jun-2017 christos

Change len type to be unsigned int for consistency with the input type.
Don't check for negative; it does not matter since we clamp anyway. This
broke the compat32 getsockname() where an uninitialized socklen_t ended
up randomly negative causing it to fail.


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204
# 1.186 03-Feb-2017 christos

branches: 1.186.6;
expose sendmsg_so and recvmsg_so.


# 1.185 02-Feb-2017 christos

expose copyout_sockname_sb


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204
# 1.184 03-Dec-2016 christos

branches: 1.184.2;
Add missing ktrkuser


Revision tags: pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914
# 1.183 13-Sep-2016 martin

Make the ktrace record written by do_sys_sendmsg/do_sys_recvmsg overridable
by the caller. Use this in compat_netbsd32 to log the 32bit version, so
the 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.182 07-Jul-2016 msaitoh

branches: 1.182.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.181 01-Nov-2015 christos

Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.


Revision tags: nick-nhusb-base-20150921
# 1.180 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.179 22-Jul-2015 maxv

Memory leak. Triggerable from an unprivileged user via COMPAT_43.


Revision tags: nick-nhusb-base-20150606
# 1.178 09-May-2015 rtr

change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16


# 1.177 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.176 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.175 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.174 06-Mar-2015 rtr

Return EINVAL if namelen isn't large enough to encompass the expected
members of sockaddr structures. i.e. sa_len and sa_family.

Discussed with and patch by christos@
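
The length check described above can be sketched in userland C. This is only
an illustration, not the kernel code: struct bsd_sockaddr is a hypothetical
stand-in for the BSD sockaddr layout, and check_namelen() is an invented name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the BSD sockaddr layout. */
struct bsd_sockaddr {
	uint8_t	sa_len;		/* total length of the structure */
	uint8_t	sa_family;	/* address family */
	char	sa_data[14];	/* protocol-specific address bytes */
};

/*
 * Return EINVAL when namelen is too small to cover sa_len and sa_family,
 * 0 otherwise. check_namelen() is an illustrative name only.
 */
int
check_namelen(size_t namelen)
{
	if (namelen < offsetof(struct bsd_sockaddr, sa_data))
		return EINVAL;
	return 0;
}

/* Usage sketch: two bytes (sa_len plus sa_family) is the smallest accepted length. */
void
check_namelen_demo(void)
{
	assert(check_namelen(0) == EINVAL);
	assert(check_namelen(1) == EINVAL);
	assert(check_namelen(2) == 0);
	assert(check_namelen(sizeof(struct bsd_sockaddr)) == 0);
}
```

Using offsetof() on the first address byte keeps the bound tied to the
structure layout rather than to a hard-coded constant.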


Revision tags: nick-nhusb-base
# 1.173 05-Sep-2014 matt

branches: 1.173.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


Revision tags: netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.172 09-Aug-2014 rtr

branches: 1.172.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.171 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.170 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


# 1.169 17-May-2014 rmind

- fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle few comments.
- Remove M_SOOPTS while here.


# 1.168 17-May-2014 rmind

makesocket: set SS_NBIO slightly earlier.


# 1.167 17-May-2014 rmind

Remove trailing whitespaces, wrap long lines, minor KNF; no functional changes.


Revision tags: yamt-pagecache-base9
# 1.166 07-Apr-2014 seanb

Fix a case where an erroneous EAGAIN was returned out of recvmmsg.
This occurred when some, but not all, of the mmsg array members
were filled with data from a non-blocking socket.
PR kern/48725
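
The intended behaviour can be sketched as a loop that reports partial success
instead of a late error. This is a userland illustration under assumptions,
not the kernel's recvmmsg code: recv_one_fn, recv_many() and stub_recv() are
hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-message receive hook: returns 0 or an errno value. */
typedef int (*recv_one_fn)(void *ctx, unsigned int idx);

/*
 * Receive up to vlen messages. If at least one message was received,
 * return the count and discard the later error; report the error only
 * when nothing was received at all.
 */
int
recv_many(recv_one_fn recv_one, void *ctx, unsigned int vlen, int *errorp)
{
	unsigned int done;
	int error = 0;

	for (done = 0; done < vlen; done++) {
		error = recv_one(ctx, done);
		if (error != 0)
			break;
	}
	if (done > 0)
		error = 0;	/* partial success is not an error */
	*errorp = error;
	return error ? -1 : (int)done;
}

/* Stub source: *ctx messages are ready, after which it would block. */
int
stub_recv(void *ctx, unsigned int idx)
{
	unsigned int avail = *(unsigned int *)ctx;

	return idx < avail ? 0 : EAGAIN;
}

/* Usage sketch: two of four slots filled gives 2, none filled gives EAGAIN. */
int
recv_many_demo(void)
{
	unsigned int avail = 2;
	int error = -1;

	assert(recv_many(stub_recv, &avail, 4, &error) == 2);
	assert(error == 0);
	avail = 0;
	assert(recv_many(stub_recv, &avail, 4, &error) == -1);
	assert(error == EAGAIN);
	return 0;
}
```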


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 09-Oct-2013 christos

branches: 1.165.2;
delete extra m_len initialization.


# 1.164 09-Oct-2013 christos

PR/48292: Justin Cormack: paccept creates sockets that cannot be made blocking
Reset the socket flags not just the file flags for non-blocking I/O.
XXX: pullup 6


# 1.163 08-Oct-2013 christos

PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.


# 1.162 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.161 03-Jun-2013 christos

branches: 1.161.2;
use the proper name for kdump pretty-printing.


Revision tags: agc-symver-base
# 1.160 14-Feb-2013 christos

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.


# 1.159 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
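
The gap between CMSG_LEN and CMSG_SPACE mentioned above is alignment padding.
A minimal userland sketch of scrubbing it follows; build_rights() is a
hypothetical helper, not the kernel's unp_externalize(), and the descriptor
values are placeholders.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Write an SCM_RIGHTS control message carrying nfds descriptors into buf
 * and zero the padding between CMSG_LEN() and CMSG_SPACE().
 * buf must be suitably aligned for struct cmsghdr.
 * Returns the number of bytes used, or 0 if buf is too small.
 */
size_t
build_rights(void *buf, size_t buflen, const int *fds, size_t nfds)
{
	struct cmsghdr *cmsg = buf;
	size_t len = CMSG_LEN(nfds * sizeof(int));
	size_t space = CMSG_SPACE(nfds * sizeof(int));

	if (buflen < space)
		return 0;
	cmsg->cmsg_len = len;
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(cmsg), fds, nfds * sizeof(int));
	/* Scrub the trailing padding so no stale bytes reach the reader. */
	memset((unsigned char *)buf + len, 0, space - len);
	return space;
}

/* Self-check: pre-fill with 0xff, then verify that the padding was cleared. */
int
build_rights_demo(void)
{
	union {
		struct cmsghdr hdr;	/* forces cmsghdr alignment */
		unsigned char raw[CMSG_SPACE(3 * sizeof(int))];
	} u;
	const int fds[3] = { 3, 4, 5 };	/* placeholder descriptor numbers */
	size_t i, used;

	memset(&u, 0xff, sizeof(u));
	used = build_rights(u.raw, sizeof(u.raw), fds, 3);
	assert(used == CMSG_SPACE(3 * sizeof(int)));
	for (i = CMSG_LEN(3 * sizeof(int)); i < used; i++) {
		if (u.raw[i] != 0)
			return 1;
	}
	return 0;
}
```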


Revision tags: yamt-pagecache-base8
# 1.158 29-Dec-2012 mlelstv

The sanity check prevented messages that carry only ancillary data.


# 1.157 29-Dec-2012 mlelstv

If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analogous
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.156 17-Jul-2012 njoly

branches: 1.156.2;
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.


# 1.155 22-Jun-2012 christos

Add {send,recv}mmsg from Linux


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.154 25-Jan-2012 christos

branches: 1.154.2;
revert atomics for so_options since it is a short.


# 1.153 25-Jan-2012 christos

need <sys/atomic.h>


# 1.152 25-Jan-2012 christos

Add locking, requested by yamt. Note that locking is not used everywhere
for these.


# 1.151 25-Jan-2012 christos

As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]


# 1.150 21-Dec-2011 christos

simplify expression


# 1.149 20-Dec-2011 christos

- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.148 04-Nov-2011 christos

branches: 1.148.4;
Fix error I introduced in previous commit that caused asymmetric connects
when SOCK_NONBLOCK or SOCK_CLOEXEC was specified. Factor out common code
and simplify error return.


Revision tags: yamt-pagecache-base
# 1.147 21-Sep-2011 christos

branches: 1.147.2;
Put the mbuf type in the ktrace record so that we know how to decode it
in userland.


# 1.146 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.145 15-Jul-2011 christos

fail with EINVAL if flags are not O_CLOEXEC|O_NONBLOCK in pipe2(2) and
dup3(2)


# 1.144 26-Jun-2011 christos

* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
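
From userland, the new socket type flags can be exercised roughly as follows.
This is a minimal sketch using socketpair(2); demo_socketpair_flags() is a
hypothetical name, and the checks assume the flags are reflected through
fcntl(2) as FD_CLOEXEC and O_NONBLOCK.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a socket pair with close-on-exec and non-blocking mode set at
 * creation time, then confirm both properties on both descriptors.
 * Returns 0 if all checks pass, 1 if a flag is missing, -1 on failure.
 */
int
demo_socketpair_flags(void)
{
	int sv[2], ok;

	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK,
	    0, sv) == -1)
		return -1;
	ok = (fcntl(sv[0], F_GETFD) & FD_CLOEXEC) != 0 &&
	    (fcntl(sv[1], F_GETFD) & FD_CLOEXEC) != 0 &&
	    (fcntl(sv[0], F_GETFL) & O_NONBLOCK) != 0 &&
	    (fcntl(sv[1], F_GETFL) & O_NONBLOCK) != 0;
	close(sv[0]);
	close(sv[1]);
	return ok ? 0 : 1;
}
```

Setting the flags at creation avoids the window between socketpair(2) and a
later fcntl(2) in which another thread could fork and exec.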


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.143 24-Apr-2011 rmind

- Replace few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.


# 1.142 10-Apr-2011 christos

- Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1
# 1.141 23-Apr-2010 rmind

branches: 1.141.2;
Replace M_IOV and some malloc(9)s with kmem(9), and while there:
- Fix invalid free (M_TEMP vs M_IOV) in do_sys_recvmsg(), spotted by jakllsch@.
Also, same fix in osf1_sys_sendmsg_xopen().
- Fix attempt to free non-allocated memory in error path in netbsd32___getfh30().
- Plug a memory leak in compat_43_netbsd32_orecvmsg().


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.140 21-Jan-2010 pgoyette

branches: 1.140.2; 1.140.4;
Remove unnecessary call to kauth_cred_free().

This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@


# 1.139 29-Dec-2009 elad

Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!


# 1.138 20-Dec-2009 dsl

If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567


Revision tags: matt-premerge-20091211
# 1.137 09-Dec-2009 dsl

Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.136 04-Apr-2009 ad

Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)


Revision tags: nick-hppapmap-base2
# 1.135 21-Jan-2009 yamt

branches: 1.135.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.


Revision tags: netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 haad-dm-base mjf-devfs2-base
# 1.134 06-Aug-2008 plunky

branches: 1.134.2; 1.134.4;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.133 24-Jun-2008 ad

branches: 1.133.2;
Nothing uses getsock/getvnode any more.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.132 30-May-2008 rmind

branches: 1.132.2;
do_sys_accept: release the reference to sock in few error paths.
Should fix PR/38790, report and test-case by Nicolas Joly.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.131 28-Apr-2008 martin

branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.130 24-Apr-2008 ad

branches: 1.130.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


# 1.129 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.128 21-Mar-2008 ad

branches: 1.128.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.127 06-Feb-2008 ad

branches: 1.127.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.


Revision tags: vmlocking2-base3 bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.126 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.125 20-Dec-2007 dsl

Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.


# 1.124 16-Dec-2007 elad

Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.123 24-Nov-2007 dyoung

branches: 1.123.2; 1.123.6;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to getsockmbuf().


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.122 05-Oct-2007 dyoung

branches: 1.122.4;
Use getsombuf().


Revision tags: yamt-x86pmap-base
# 1.121 19-Sep-2007 christos

branches: 1.121.2;
minor nits; no code change.


# 1.120 19-Sep-2007 dyoung

1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.119 06-Sep-2007 rmind

do_sys_sendmsg: Plug a possible leak.
From CID: 4535


# 1.118 01-Sep-2007 dsl

Don't error calls to copy socket addresses to userspace when the application
has provided a non-null buffer pointer and a zero length.


# 1.117 27-Aug-2007 dsl

ktrace socket control structures (ie msghdr, address etc) using ktrkuser().


# 1.116 15-Aug-2007 ad

branches: 1.116.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base nick-csl-alignment-base
# 1.115 15-Jul-2007 dsl

branches: 1.115.2; 1.115.6;
Remove non-user flags (especially MSG_IOVUSRSPACE) from mp->msg_flags
before passing to so_receive.
This may (or may not) have any effect...


Revision tags: mjf-ufs-trans-base
# 1.114 01-Jul-2007 dsl

Check for SOL_SOCKET when checking for SCM_RIGHTS.


# 1.113 24-Jun-2007 dsl

Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socket names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, old LKMs will fail to load.


# 1.112 02-Jun-2007 enami

- Fix obvious typos so that sendto(2) works.
- Wrap lines again.


# 1.111 01-Jun-2007 dsl

Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.


Revision tags: yamt-idlelwp-base8
# 1.110 13-May-2007 dsl

Fallout from caddr_t deletion - remove a load of redundant (void *) casts.


# 1.109 18-Apr-2007 yamt

sys_accept: fix usecount botch and double soclose in rev.1.108.


# 1.108 15-Apr-2007 yamt

sys_accept: don't leak a socket on error.


Revision tags: thorpej-atomic-base
# 1.107 04-Mar-2007 christos

branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge
# 1.106 09-Feb-2007 ad

branches: 1.106.2;
Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.105 01-Nov-2006 yamt

branches: 1.105.2;
remove some __unused from function parameters.


# 1.104 23-Oct-2006 elad

PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic

Patch applied, thanks!


Revision tags: yamt-splraiseipl-base2
# 1.103 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.102 22-Aug-2006 seanb

branches: 1.102.2; 1.102.4;
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.101 23-Jul-2006 ad

branches: 1.101.2;
Use the LWP cached credentials where sane.


# 1.100 26-Jun-2006 mrg

version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.

from pavel@ with a little bit of clean up from myself.

XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.99 16-May-2006 christos

branches: 1.99.4;
Don't set mature an fd that has been ffree'd


Revision tags: elad-kernelauth-base
# 1.98 11-May-2006 christos

Add MSG_NOSIGNAL (from FreeBSD)
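
A small userland sketch of the flag's effect: writing to a socket whose peer
has gone away is expected to fail with EPIPE instead of raising SIGPIPE.
demo_nosignal() is a hypothetical name, and the use of an AF_UNIX stream
socket pair is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send one byte to a socket whose peer has been closed, with MSG_NOSIGNAL
 * set. Returns the errno from send(2), 0 if the send unexpectedly
 * succeeded, or -1 if the socket pair could not be created.
 */
int
demo_nosignal(void)
{
	int sv[2], error;
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;
	close(sv[1]);		/* the peer goes away */
	n = send(sv[0], "x", 1, MSG_NOSIGNAL);
	error = (n == -1) ? errno : 0;	/* save errno before close() */
	close(sv[0]);
	return error;
}
```

Without MSG_NOSIGNAL the same send would normally deliver SIGPIPE, which
terminates the process unless the signal is ignored or handled.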


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.97 01-Mar-2006 yamt

branches: 1.97.2; 1.97.4; 1.97.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.96 26-Dec-2005 perry

branches: 1.96.2; 1.96.4; 1.96.6;
u_intN_t -> uintN_t


# 1.95 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.94 03-Sep-2005 martin

In adjust_rights() Use CMSG_SPACE() to calculate the number of
filedescriptors passed in this message - the counterpart in
unp_externalize does this as well.

Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.

This fixes the postfix connection cache for 64bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing the fdrelease() of uninitialized junk.


# 1.93 03-Sep-2005 martin

minor knf tweak


# 1.92 30-May-2005 martin

branches: 1.92.2;
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.


# 1.91 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.90 26-Feb-2005 perry

branches: 1.90.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.89 30-Nov-2004 christos

branches: 1.89.4; 1.89.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat


# 1.88 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.87 18-May-2004 ragge

Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.

Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.

As a side effect, the new connect() behaviour conforms to POSIX.


Revision tags: netbsd-2-0-base
# 1.86 29-Nov-2003 matt

branches: 1.86.2;
Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.85 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.84 13-Nov-2003 chs

eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.


# 1.83 04-Sep-2003 matt

Adapt to the new calling conventions of unp_connect2


# 1.82 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.81 29-Jun-2003 fvdl

branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.80 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.79 05-Apr-2003 christos

PR/21030: Naoto Shimazaki: fcntl to accepted socket does not work properly


# 1.78 26-Feb-2003 matt

Remove leftover MBUFTRACE asserts.


# 1.77 26-Feb-2003 drochner

deactivate MBUFTRACE related KASSERT()s in the !MBUFTRACE case


# 1.76 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.75 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.74 26-Nov-2002 christos

si_ -> sel_ to avoid conflicts with siginfo.


# 1.73 25-Nov-2002 itojun

no need for error check after MEXTMALLOC - jdolecek


# 1.72 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge
# 1.71 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.70 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.69 31-May-2002 itojun

support setsockopt() with larger data (up to MCLBYTES).
From: Hitoshi Asaeda <Hitoshi.Asaeda@sophia.inria.fr>


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base ifpoll-base
# 1.68 11-Feb-2002 jdolecek

branches: 1.68.8;
Switch default for pipes to the faster John S. Dyson's implementation.
Old, socketpair-based ones are available with option PIPE_SOCKETPAIR.


Revision tags: thorpej-mips-cache-base
# 1.67 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.66 16-Sep-2001 wiz

branches: 1.66.2;
Spell 'occurred' with two 'r's.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.65 17-Jul-2001 jdolecek

branches: 1.65.2;
Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
has to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(

This code was tested to pass regression tests.


# 1.64 01-Jul-2001 matt

branches: 1.64.2;
Use consistent types for len. Limit sockarg length to reasonable values.


# 1.63 25-Jun-2001 jdolecek

Back off the sendit()/recvit() change, some have problems with it


# 1.62 25-Jun-2001 jdolecek

sys_connect(): fix the call to FILE_UNUSE() so that it's done on return, rather
than immediately after getsock() call


# 1.61 25-Jun-2001 jdolecek

Add 'kernsa' parameter for sendit()/recvit(); if nonzero, msg->msg_name
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.

This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.


# 1.60 16-Jun-2001 jdolecek

Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pageable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.


# 1.59 14-Jun-2001 thorpej

Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.
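
The "larval" scheme described above can be modelled in a few lines of
user-space C. This is only a sketch under stated assumptions: the names
(struct slot, slot_alloc, slot_lookup, slot_mature) are hypothetical and are
not the kernel's, and the model is single-threaded with no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 4

/* Hypothetical descriptor table entry; not the kernel's structure. */
struct slot {
	void *file;    /* non-NULL once the slot is taken */
	bool  larval;  /* true while the descriptor is still being built */
};

static struct slot table[NSLOTS];

/* Allocation sees a larval slot as filled, so it is never handed out twice. */
static int
slot_alloc(void *file)
{
	assert(file != NULL);
	for (int fd = 0; fd < NSLOTS; fd++) {
		if (table[fd].file == NULL) {
			table[fd].file = file;
			table[fd].larval = true;
			return fd;
		}
	}
	return -1;
}

/* Lookup treats a larval slot as a non-existent descriptor. */
static void *
slot_lookup(int fd)
{
	if (fd < 0 || fd >= NSLOTS || table[fd].larval)
		return NULL;
	return table[fd].file;
}

/* Construction is complete: "mature" simply clears the larval flag. */
static void
slot_mature(int fd)
{
	assert(fd >= 0 && fd < NSLOTS && table[fd].file != NULL);
	table[fd].larval = false;
}
```

In this model a concurrent close on a half-built descriptor would find
nothing to close, because slot_lookup() returns NULL until slot_mature() runs.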

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.


# 1.58 06-May-2001 manu

implement the recently introduced EMUL_BSD_ASYNCIO_PIPE emulation package
flag.

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.57 27-Feb-2001 lukem

branches: 1.57.2;
convert to ANSI KNF


# 1.56 10-Dec-2000 fvdl

Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).


# 1.55 24-Nov-2000 jdolecek

define COMPAT_OLDSOCK unconditionally - the code is used virtually for all
emulations besides NetBSD, and this way it's LKM-safe


# 1.54 02-Aug-2000 thorpej

MALLOC()/FREE() are not to be used for variable sized allocations.


# 1.53 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.52 27-May-2000 sommerfeld

branches: 1.52.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.51 30-Mar-2000 augustss

Get rid of register declarations.


# 1.50 23-Mar-2000 thorpej

Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() makes sure
the fd_freefiles hints stay in sync.

From OpenBSD.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.49 05-Nov-1999 mycroft

branches: 1.49.2;
Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.


# 1.48 30-Oct-1999 enami

back out unnecessary stylistic changes in recent changes, to keep coding
style closer to KNF.


# 1.47 27-Oct-1999 jdolecek

minor cleanup of previous - avoid goto and code duplication


# 1.46 27-Oct-1999 darrenr

patch from Greg A. Woods to fix panic problems with code that attempts to
recover from failures to accept a socket successfully. Problem suggested
by this:
> It would appear (from two "panic: closef: count < 0" failures in less
> than 12 hours) that Darren's fix to accept(2) for lost file descriptors
> isn't quite correct. His fix inserts a call to closef() to handle one
> of several possible error conditions. However everywhere else in the
> socket code in the same file where falloc() cleanup is necessary the
> function used is ffree().


Revision tags: chs-ubc2-base
# 1.45 01-Jul-1999 itojun

branches: 1.45.2; 1.45.4; 1.45.6;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.44 01-Jul-1999 darrenr

fix sys_accept() to return EOPNOTSUPP for protocols which don't support
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.


# 1.43 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.42 30-Apr-1999 cgd

add checks for COMPAT_OSF1 in the appropriate places


Revision tags: netbsd-1-4-RELEASE netbsd-1-4-base
# 1.41 10-Feb-1999 kleink

branches: 1.41.2; 1.41.4; 1.41.6;
* Due to addition and use of socklen_t, make the socket option and address
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also eliminates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessitates the addition
of a few casts.


# 1.40 18-Dec-1998 drochner

solve the COMPAT_OLDSOCK/MSG_COMPAT problem differently:
The source files which need MSG_COMPAT define COMPAT_OLDSOCK.


Revision tags: kenh-if-detach-base
# 1.39 26-Nov-1998 mycroft

Revert the functional change in rev 1.38; permit a msg_iovlen of 0.
There are two reasons for this:
* We should be able to pass file descriptors without sending any data.
* We could send zero-length iovecs anyway (but we shouldn't have to do this).
Also, msg_iovlen is already a u_int, so delete a bunch of casts.


Revision tags: chs-ubc-base
# 1.38 04-Aug-1998 kleink

Per XNS Issue 5, calling recvmsg(2) or sendmsg(2) with an msg.msg_iovlen less
than or equal to 0 shall fail with EMSGSIZE; the latter condition was not being
checked for. Also, document the msg.msg_iovlen > {IOV_MAX} case.


# 1.37 04-Aug-1998 kleink

UIO_MAXIOV -> IOV_MAX


# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
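
The mapping above swaps the argument order for the copy routines, which is
the easy thing to get wrong. A minimal sketch of the conversion follows; the
struct and helper names are made up for illustration and do not come from
the file in question.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record used only for this illustration. */
struct hdr {
	char name[8];
	int  id;
};

/* Copy 'scratch' into 'dst', then clear 'scratch'. */
static void
hdr_copy_and_clear(struct hdr *dst, struct hdr *scratch)
{
	assert(dst != NULL && scratch != NULL);
	/* was: bcopy(scratch, dst, sizeof(*dst));  source came first */
	memcpy(dst, scratch, sizeof(*dst));    /* destination comes first */
	/* was: bzero(scratch, sizeof(*scratch)); */
	memset(scratch, 0, sizeof(*scratch));
}

/* Return nonzero when the two records are bytewise identical. */
static int
hdr_equal(const struct hdr *a, const struct hdr *b)
{
	/* was: bcmp(a, b, sizeof(*a)) == 0;  argument order is unchanged */
	return memcmp(a, b, sizeof(*a)) == 0;
}
```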


# 1.35 03-Aug-1998 kleink

Fix two off-by-one bugs, each present in both recvmsg(2) and sendmsg(2):
* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.
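
The two boundary conditions can be sketched as follows. The constants and
the function name are placeholders chosen for illustration (the real values
and code live in the kernel sources); the point is only that both
comparisons must be strict.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative values only; not taken from the kernel headers. */
#define SKETCH_SMALLIOV 8      /* entries available on the stack */
#define SKETCH_MAXIOV   1024   /* largest count accepted at all */

enum iov_storage { IOV_ON_STACK, IOV_ON_HEAP };

/*
 * Decide where the iovec array should live, or fail with EMSGSIZE.
 * Returns 0 and fills *where on success.
 */
static int
iov_storage_for(unsigned int iovlen, enum iov_storage *where)
{
	assert(where != NULL);
	/* A count equal to the maximum is valid: reject only with '>'. */
	if (iovlen > SKETCH_MAXIOV)
		return EMSGSIZE;
	/* A count equal to the stack array size still fits on the stack. */
	*where = (iovlen > SKETCH_SMALLIOV) ? IOV_ON_HEAP : IOV_ON_STACK;
	return 0;
}
```

Writing either test with '>=' would reproduce the two bugs described above.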


# 1.34 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.33 29-Jul-1998 thorpej

branches: 1.33.2;
When checking for overflow in the residual count, test against SSIZE_MAX.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.

There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).


# 1.32 18-Jul-1998 lukem

use AF_LOCAL instead of AF_UNIX


# 1.31 25-Jun-1998 thorpej

defopt KTRACE


# 1.30 25-Apr-1998 matt

Hook for 0-copy (or other optimized) sends and receives


# 1.29 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.28 06-Feb-1998 thorpej

When copying out multiple control messages, ensure that the next control
message is aligned. From David Borman <dab@bsdi.com>.
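
The alignment rule can be illustrated from user space with the standard
CMSG macros: each message occupies CMSG_SPACE() bytes, which includes the
padding after the payload, while cmsg_len records only CMSG_LEN(). The
helper name pack_int_cmsgs is hypothetical, and the sketch assumes the
caller passes a buffer suitably aligned for struct cmsghdr.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Pack 'n' control messages, each carrying one int, into 'buf'.
 * Returns the number of bytes used, or 0 if 'buflen' is too small.
 */
static size_t
pack_int_cmsgs(void *buf, size_t buflen, const int *vals, size_t n)
{
	unsigned char *p = buf;

	assert(buf != NULL && vals != NULL);
	if (buflen < n * CMSG_SPACE(sizeof(int)))
		return 0;
	/* Zero the whole region so the padding bytes carry no stale data. */
	memset(buf, 0, n * CMSG_SPACE(sizeof(int)));
	for (size_t i = 0; i < n; i++) {
		struct cmsghdr *cm = (struct cmsghdr *)p;

		cm->cmsg_len = CMSG_LEN(sizeof(int));   /* header + payload */
		cm->cmsg_level = SOL_SOCKET;
		cm->cmsg_type = SCM_RIGHTS;
		memcpy(CMSG_DATA(cm), &vals[i], sizeof(int));
		/* Advance by the padded size so the next header is aligned. */
		p += CMSG_SPACE(sizeof(int));
	}
	return (size_t)(p - (unsigned char *)buf);
}
```

Advancing by CMSG_LEN() instead of CMSG_SPACE() would leave the second
header misaligned on platforms where the two differ.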


# 1.27 07-Jan-1998 thorpej

Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).


# 1.26 07-Jan-1998 thorpej

Fix bug in recvit() that would cause recvmsg() to only receive one
control message, even if there were multiple control messages on
the queue. From Jean-Luc Richier <Jean-Luc.Richier@imag.fr>, in
bug report kern/4700.


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.25 26-Jun-1997 thorpej

branches: 1.25.8;
Use UCHAR_MAX rather than "255" when sanity-checking the length of a
sockaddr in sockargs().


# 1.24 26-Jun-1997 thorpej

In sockargs():
- Add a comment describing my feelings about this interface, in general.
- Remove the COMPAT_OLDSOCK length hack. Instead, if the socket argument
is too long to fit in an mbuf, allocate enough external storage to
hold it.
- If the socket argument is a sockaddr, don't allow the length to be
greater than 255, as that would overflow sa_len.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for his sanity checking.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 22-Dec-1996 cgd

* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.


# 1.22 14-Jun-1996 cgd

avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.


# 1.21 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.20 17-May-1996 pk

branches: 1.20.4;
Don't touch retval[] in socketpair(); manual page says this system call
returns 0 on success (PR#2428).


# 1.19 09-Feb-1996 christos

More proto fixes


# 1.18 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.17 10-Oct-1995 mycroft

Add hooks for COMPAT_FREEBSD, from Noriyuki Soda.


# 1.16 07-Oct-1995 mycroft

Prefix names of system call implementation functions with `sys_'.


# 1.15 19-Sep-1995 thorpej

Make system calls conform to a standard prototype and bring those
prototypes into scope.


# 1.14 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.13 24-Jun-1995 christos

Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).


# 1.12 10-May-1995 christos

tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed


# 1.11 05-Mar-1995 fvdl

Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.


# 1.10 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.9 20-Oct-1994 cgd

update for new syscall args description mechanism


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.8 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.7 04-May-1994 mycroft

Add return types where missing. Simplify some of the compat conditionals.
Include compat code if COMPAT_SUNOS with or without COMPAT_43.


# 1.6 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 17-Jul-1993 mycroft

branches: 1.5.4;
Finish moving struct definitions outside of function declarations.


# 1.4 27-Jun-1993 andrew

* ansifications
* Yuval Yarom's socket recv(2) fixes - access rights problems (see also
uipc_socket.c).


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.199 12-Nov-2018 hannken

sys_recvmmsg: don't defer an error that already gets returned.


# 1.198 07-Nov-2018 hannken

Don't defer errors from sendmmsg(). This matches the linux manpage.

Defer errors from recvmmsg() through so_rerror and
tests and return a deferred error on entry.

Ok: christos@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.197 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.
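
The difference between the two forms can be shown with a small user-space
sketch. The uimin() body here is written for illustration from the
description above (a function on unsigned int), and the two clamp helpers
are hypothetical callers; the example assumes unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Function form, defined on unsigned int. */
static unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: keeps the operand types, but may evaluate an argument twice. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical caller that clamps a 64-bit length with the function. */
static uint64_t
clamp_with_function(uint64_t len, uint64_t limit)
{
	/* The demonstration needs unsigned int to be narrower than 64 bits. */
	assert(sizeof(unsigned int) < sizeof(uint64_t));
	/* Both arguments are converted to unsigned int: high bits are lost. */
	return uimin(len, limit);
}

/* The same clamp with the macro compares the full 64-bit values. */
static uint64_t
clamp_with_macro(uint64_t len, uint64_t limit)
{
	return MIN(len, limit);
}
```

With a length of 2^32 + 5 and a limit of 100, the macro yields 100, while
the function silently compares the truncated value 5 and returns it.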

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


# 1.196 01-Aug-2018 rjs

Add ioctl(2) handler for kernel part of sctp_peeloff().


# 1.195 31-Jul-2018 rjs

Add getsockopt2() syscall.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.194 04-May-2018 christos

define MBUFTYPES here.


# 1.193 03-May-2018 christos

Fix COMPAT_NETBSD32 cmsg handling:

1. alignment was wrong for > 1 message
2. macros were doing incorrect pointer comparisons, fortunately ending
the iteration early after the first cmsg instead of crashing.
3. don't output 32 bit ktrace records for cmsg. 32 bit programs running
under emulation on 64 bit systems should produce 64 bit ktrace records
so that the native ktrace can handle the records; remove extra arguments
that are now not needed (the 32 bit msghdr).
4. output the correct type for cmsg trace records.
5. output all the cmsg records in traces instead of just the first one.

Welcome to 8.99.15 because of the argument removal.

XXX: Really all the code should be changed to use the CMSG_{FIRST,NXT}HDR
macros...


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.192 16-Mar-2018 christos

PR/53103: Timo Buhrmester: linux emulation of sendto(2) broken

The sockargs refactoring broke it, because sockargs only works with a user
address. Added an argument to sockargs to indicate where the address is
coming from. Welcome to 8.99.14.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.191 12-Feb-2018 maxv

branches: 1.191.2;
Add a KASSERT; we expect *from to be a single mbuf (not chained).


# 1.190 04-Jan-2018 christos

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).

(Tom Ivar Helbekkmo)


# 1.189 31-Dec-2017 christos

pass valsize for getsockopt like we do for setsockopt


# 1.188 26-Dec-2017 kamil

Refactor pipe1() and correct a bug in sys_pipe2() (SYS_pipe2)

sys_pipe2() returns two integers (values), the 2nd one is a copy of the 2nd
file descriptor that lands in fildes[2]. This is a side effect of reusing
the code for sys_pipe() (SYS_pipe) and not cleaning it up.

The first returned value is (on success) 0.

Introduced a small refactoring in pipe1() that it does not operate over
retval[], but on an array int[2]. A user sets retval[] for pipe() when
desired and needed.

This refactoring touches compat code: netbsd32, linux, linux32.

Before the changes on NetBSD/amd64:

$ ktruss -i ./a.out
[...]
15131 1 a.out pipe2(0x7f7fff2e62b8, 0) = 0, 4
[...]

After the changes:

$ ktruss -i ./a.out
[...]
782 1 a.out pipe2(0x7f7fff97e850, 0) = 0
[...]

There should not be a visible change for current users.

Sponsored by <The NetBSD Foundation>


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.187 20-Jun-2017 christos

Change len type to be unsigned int for consistency with the input type.
Don't check for negative; it does not matter we clamp anyway. This
broke the compat32 getsockname() where an uninitialized socklen_t ended
up randomly negative causing it to fail.


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204
# 1.186 03-Feb-2017 christos

branches: 1.186.6;
expose sendmsg_so and recvmsg_so.


# 1.185 02-Feb-2017 christos

expose copyout_sockname_sb


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204
# 1.184 03-Dec-2016 christos

branches: 1.184.2;
Add missing ktrkuser


Revision tags: pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914
# 1.183 13-Sep-2016 martin

Make the ktrace record written by do_sys_sendmsg/do_sys_recvmsg overridable
by the caller. Use this in compat_netbsd32 to log the 32bit version, so
the 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.182 07-Jul-2016 msaitoh

branches: 1.182.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.181 01-Nov-2015 christos

Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.


Revision tags: nick-nhusb-base-20150921
# 1.180 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.179 22-Jul-2015 maxv

Memory leak. Triggerable from an unprivileged user via COMPAT_43.


Revision tags: nick-nhusb-base-20150606
# 1.178 09-May-2015 rtr

change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16


# 1.177 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.176 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.175 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.174 06-Mar-2015 rtr

Return EINVAL if namelen isn't large enough to encompass the expected
members of sockaddr structures. i.e. sa_len and sa_family.
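
A minimal sketch of such a length check follows. The structure is a
stand-in with the BSD-style sa_len and sa_family header, not the kernel's
definition, and peek_family is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for a BSD-style sockaddr header; for illustration only. */
struct sketch_sockaddr {
	unsigned char sa_len;
	unsigned char sa_family;
	char          sa_data[14];
};

/*
 * Read the address family from a caller-supplied buffer.
 * Fails with EINVAL unless both sa_len and sa_family lie inside it.
 */
static int
peek_family(const void *name, size_t namelen, unsigned char *family)
{
	const unsigned char *p = name;

	assert(name != NULL && family != NULL);
	if (namelen < offsetof(struct sketch_sockaddr, sa_data))
		return EINVAL;
	*family = p[offsetof(struct sketch_sockaddr, sa_family)];
	return 0;
}
```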

Discussed with and patch by christos@


Revision tags: nick-nhusb-base
# 1.173 05-Sep-2014 matt

branches: 1.173.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


Revision tags: netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.172 09-Aug-2014 rtr

branches: 1.172.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.171 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.170 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


# 1.169 17-May-2014 rmind

- fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle few comments.
- Remove M_SOOPTS while here.


# 1.168 17-May-2014 rmind

makesocket: set SS_NBIO slightly earlier.


# 1.167 17-May-2014 rmind

Remove trailing whitespaces, wrap long lines, minor KNF; no functional changes.


Revision tags: yamt-pagecache-base9
# 1.166 07-Apr-2014 seanb

Fix a case where an erroneous EAGAIN was returned out of recvmmsg.
This occurred when some, but not all of the mmsg array members
were filled with data from a non-blocking socket.
PR kern/48725


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 09-Oct-2013 christos

branches: 1.165.2;
delete extra m_len initialization.


# 1.164 09-Oct-2013 christos

PR/48292: Justin Cormack: paccept creates sockets that cannot be made blocking
Reset the socket flags not just the file flags for non-blocking I/O.
XXX: pullup 6


# 1.163 08-Oct-2013 christos

PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.


# 1.162 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.161 03-Jun-2013 christos

branches: 1.161.2;
use the proper name for kdump pretty-printing.


Revision tags: agc-symver-base
# 1.160 14-Feb-2013 christos

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.


# 1.159 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos


Revision tags: yamt-pagecache-base8
# 1.158 29-Dec-2012 mlelstv

The sanity check prevented messages that carry only ancillary data.


# 1.157 29-Dec-2012 mlelstv

If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analogous
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.156 17-Jul-2012 njoly

branches: 1.156.2;
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.


# 1.155 22-Jun-2012 christos

Add {send,recv}mmsg from Linux


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.154 25-Jan-2012 christos

branches: 1.154.2;
revert atomics for so_options since it is a short.


# 1.153 25-Jan-2012 christos

need <sys/atomic.h>


# 1.152 25-Jan-2012 christos

Add locking, requested by yamt. Note that locking is not used everywhere
for these.


# 1.151 25-Jan-2012 christos

As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]


# 1.150 21-Dec-2011 christos

simplify expression


# 1.149 20-Dec-2011 christos

- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.148 04-Nov-2011 christos

branches: 1.148.4;
Fix error I introduced in previous commit that caused asymmetric connects
when SOCK_NONBLOCK or SOCK_CLOEXEC was specified. Factor out common code
and simplify error return.


Revision tags: yamt-pagecache-base
# 1.147 21-Sep-2011 christos

branches: 1.147.2;
Put the mbuf type in the ktrace record so that we know how to decode it
in userland.


# 1.146 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.145 15-Jul-2011 christos

fail with EINVAL if flags other than O_CLOEXEC|O_NONBLOCK are passed to
pipe2(2) and dup3(2)
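
A small userland check of this rule, as a sketch only: try_pipe2() is a hypothetical helper, and O_APPEND is chosen here merely as a flag assumed not to be accepted by pipe2(2).

```c
#define _GNU_SOURCE	/* glibc needs this for pipe2(); assumed harmless elsewhere */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to create a pipe with the given flags.
 * Returns 0 on success (both ends are closed again), otherwise errno.
 */
static int
try_pipe2(int flags)
{
	int fds[2];

	if (pipe2(fds, flags) == -1)
		return errno;
	close(fds[0]);
	close(fds[1]);
	return 0;
}
```

try_pipe2(O_CLOEXEC | O_NONBLOCK) is expected to succeed, while try_pipe2(O_APPEND) is expected to return EINVAL.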


# 1.144 26-Jun-2011 christos

* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
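
For illustration, the socket(2) part of the list above might be exercised like this. It is a sketch, not code from the commit; socket_flags_applied() is a made-up name.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a socket with both creation-time flags and read the result
 * back through fcntl(2).
 * Returns 1 if both close-on-exec and non-blocking mode are in effect,
 * 0 if either is missing, -1 if the socket could not be created.
 */
static int
socket_flags_applied(void)
{
	int s, fdfl, fl, ok;

	s = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, 0);
	if (s == -1)
		return -1;

	fdfl = fcntl(s, F_GETFD);	/* descriptor flags: FD_CLOEXEC */
	fl = fcntl(s, F_GETFL);		/* file status flags: O_NONBLOCK */
	ok = fdfl != -1 && fl != -1 &&
	    (fdfl & FD_CLOEXEC) != 0 && (fl & O_NONBLOCK) != 0;

	close(s);
	return ok ? 1 : 0;
}
```

Setting both properties at creation time avoids the window between socket(2) and a later fcntl(2) in which another thread could fork and exec with the descriptor still inheritable.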


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.143 24-Apr-2011 rmind

- Replace few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.


# 1.142 10-Apr-2011 christos

- Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1
# 1.141 23-Apr-2010 rmind

branches: 1.141.2;
Replace M_IOV and some malloc(9)s with kmem(9), and while there:
- Fix invalid free (M_TEMP vs M_IOV) in do_sys_recvmsg(), spotted by jakllsch@.
Also, same fix in osf1_sys_sendmsg_xopen().
- Fix attempt to free non-allocated memory in error path in netbsd32___getfh30().
- Plug a memory leak in compat_43_netbsd32_orecvmsg().


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.140 21-Jan-2010 pgoyette

branches: 1.140.2; 1.140.4;
Remove unnecessary call to kauth_cred_free().

This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@


# 1.139 29-Dec-2009 elad

Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!


# 1.138 20-Dec-2009 dsl

If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567


Revision tags: matt-premerge-20091211
# 1.137 09-Dec-2009 dsl

Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.136 04-Apr-2009 ad

Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)


Revision tags: nick-hppapmap-base2
# 1.135 21-Jan-2009 yamt

branches: 1.135.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.


Revision tags: netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 haad-dm-base mjf-devfs2-base
# 1.134 06-Aug-2008 plunky

branches: 1.134.2; 1.134.4;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.133 24-Jun-2008 ad

branches: 1.133.2;
Nothing uses getsock/getvnode any more.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.132 30-May-2008 rmind

branches: 1.132.2;
do_sys_accept: release the reference to sock in few error paths.
Should fix PR/38790, report and test-case by Nicolas Joly.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.131 28-Apr-2008 martin

branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.130 24-Apr-2008 ad

branches: 1.130.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


# 1.129 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.128 21-Mar-2008 ad

branches: 1.128.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.127 06-Feb-2008 ad

branches: 1.127.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.


Revision tags: vmlocking2-base3 bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.126 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.125 20-Dec-2007 dsl

Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.


# 1.124 16-Dec-2007 elad

Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.123 24-Nov-2007 dyoung

branches: 1.123.2; 1.123.6;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to getsockmbuf().


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.122 05-Oct-2007 dyoung

branches: 1.122.4;
Use getsombuf().


Revision tags: yamt-x86pmap-base
# 1.121 19-Sep-2007 christos

branches: 1.121.2;
minor nits; no code change.


# 1.120 19-Sep-2007 dyoung

1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.119 06-Sep-2007 rmind

do_sys_sendmsg: Plug a possible leak.
From CID: 4535


# 1.118 01-Sep-2007 dsl

Don't error calls to copy socket addresses to userspace when the application
has provided a non-null buffer pointer and a zero length.


# 1.117 27-Aug-2007 dsl

ktrace socket control structures (ie msghdr, address etc) using ktrkuser().


# 1.116 15-Aug-2007 ad

branches: 1.116.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base nick-csl-alignment-base
# 1.115 15-Jul-2007 dsl

branches: 1.115.2; 1.115.6;
Remove non-user flags (especially MSG_IOVUSRSPACE) from mp->msg_flags
before passing to so_receive.
This may (or may not) have any effect...


Revision tags: mjf-ufs-trans-base
# 1.114 01-Jul-2007 dsl

Check for SOL_SOCKET when checking for SCM_RIGHTS.


# 1.113 24-Jun-2007 dsl

Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socket names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, old LKMs will fail to load.


# 1.112 02-Jun-2007 enami

- Fix obvious typos so that sendto(2) works.
- Wrap lines again.


# 1.111 01-Jun-2007 dsl

Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.


Revision tags: yamt-idlelwp-base8
# 1.110 13-May-2007 dsl

Fallout from caddr_t deletion - remove a load of redundant (void *) casts.


# 1.109 18-Apr-2007 yamt

sys_accept: fix usecount botch and double soclose in rev.1.108.


# 1.108 15-Apr-2007 yamt

sys_accept: don't leak a socket on error.


Revision tags: thorpej-atomic-base
# 1.107 04-Mar-2007 christos

branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge
# 1.106 09-Feb-2007 ad

branches: 1.106.2;
Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.105 01-Nov-2006 yamt

branches: 1.105.2;
remove some __unused from function parameters.


# 1.104 23-Oct-2006 elad

PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic

Patch applied, thanks!


Revision tags: yamt-splraiseipl-base2
# 1.103 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.102 22-Aug-2006 seanb

branches: 1.102.2; 1.102.4;
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.101 23-Jul-2006 ad

branches: 1.101.2;
Use the LWP cached credentials where sane.


# 1.100 26-Jun-2006 mrg

version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.

from pavel@ with a little bit of clean up from myself.

XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.99 16-May-2006 christos

branches: 1.99.4;
Don't mark as mature an fd that has been ffree'd


Revision tags: elad-kernelauth-base
# 1.98 11-May-2006 christos

Add MSG_NOSIGNAL (from FreeBSD)
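
A minimal sketch of what the flag is for, assuming a stream socketpair whose peer has already been closed; send_to_closed_peer() is a hypothetical helper, not part of this commit.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Write to a socket whose peer is gone, asking for no SIGPIPE.
 * Returns the errno reported by send(2), 0 if the send unexpectedly
 * succeeded, or -1 if the socketpair could not be set up.
 */
static int
send_to_closed_peer(void)
{
	int sv[2], err = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;
	close(sv[1]);			/* the peer goes away */

	/* Without MSG_NOSIGNAL this send would also raise SIGPIPE. */
	if (send(sv[0], "x", 1, MSG_NOSIGNAL) == -1)
		err = errno;

	close(sv[0]);
	return err;
}
```

The caller sees EPIPE as an ordinary error return and the process is not killed, even with SIGPIPE left at its default disposition.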


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.97 01-Mar-2006 yamt

branches: 1.97.2; 1.97.4; 1.97.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.96 26-Dec-2005 perry

branches: 1.96.2; 1.96.4; 1.96.6;
u_intN_t -> uintN_t


# 1.95 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.94 03-Sep-2005 martin

In adjust_rights() use CMSG_SPACE() to calculate the number of
file descriptors passed in this message - the counterpart in
unp_externalize does this as well.

Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.

This fixes the postfix connection cache for 64bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing the fdrelease() of uninitialized junk.
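
The header and alignment arithmetic involved can be illustrated from userland. This sketch uses the common receiver-side idiom based on CMSG_LEN(0); it is not the kernel's adjust_rights() code, and rights_count() and roundtrip_count() are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Number of descriptors carried by one SCM_RIGHTS control message.
 * cmsg_len covers the (aligned) header plus the payload, so the header
 * size has to be subtracted before dividing by sizeof(int).
 */
static size_t
rights_count(const struct cmsghdr *cmsg)
{
	assert(cmsg->cmsg_level == SOL_SOCKET);
	assert(cmsg->cmsg_type == SCM_RIGHTS);
	return (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);
}

/*
 * Build a header describing `nfds' descriptors in a local buffer and
 * count them back.  Only the header is filled in; no descriptors are
 * actually passed.
 */
static size_t
roundtrip_count(size_t nfds)
{
	union {
		struct cmsghdr hdr;	/* forces suitable alignment */
		char buf[256];
	} u;

	assert(CMSG_SPACE(nfds * sizeof(int)) <= sizeof(u.buf));
	memset(&u, 0, sizeof(u));
	u.hdr.cmsg_level = SOL_SOCKET;
	u.hdr.cmsg_type = SCM_RIGHTS;
	u.hdr.cmsg_len = CMSG_LEN(nfds * sizeof(int));
	return rights_count(&u.hdr);
}
```

CMSG_LEN() excludes trailing padding while CMSG_SPACE() includes it, so the two can differ where the alignment is wider than an int, as is typical on 64bit platforms. Mixing them without compensating is the kind of error that makes a descriptor count come out wrong.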


# 1.93 03-Sep-2005 martin

minor knf tweak


# 1.92 30-May-2005 martin

branches: 1.92.2;
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.


# 1.91 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.90 26-Feb-2005 perry

branches: 1.90.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.89 30-Nov-2004 christos

branches: 1.89.4; 1.89.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat


# 1.88 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.87 18-May-2004 ragge

Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.

Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.

As a side effect, the new connect() behaviour conforms to POSIX.


Revision tags: netbsd-2-0-base
# 1.86 29-Nov-2003 matt

branches: 1.86.2;
Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.85 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.84 13-Nov-2003 chs

eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.


# 1.83 04-Sep-2003 matt

Adapt to the new calling conventions of unp_connect2


# 1.82 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.81 29-Jun-2003 fvdl

branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.80 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.79 05-Apr-2003 christos

PR/21030: Naoto Shimazaki: fcntl to accepted socket does not work properly


# 1.78 26-Feb-2003 matt

Remove leftover MBUFTRACE asserts.


# 1.77 26-Feb-2003 drochner

deactivate MBUFTRACE related KASSERT()s in the !MBUFTRACE case


# 1.76 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.75 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.74 26-Nov-2002 christos

si_ -> sel_ to avoid conflicts with siginfo.


# 1.73 25-Nov-2002 itojun

no need for error check after MEXTMALLOC - jdolecek


# 1.72 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge
# 1.71 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.70 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.69 31-May-2002 itojun

support setsockopt() with larger data (up to MCLBYTES).
From: Hitoshi Asaeda <Hitoshi.Asaeda@sophia.inria.fr>


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base ifpoll-base
# 1.68 11-Feb-2002 jdolecek

branches: 1.68.8;
Switch default for pipes to the faster John S. Dyson's implementation.
Old, socketpair-based ones are available with option PIPE_SOCKETPAIR.


Revision tags: thorpej-mips-cache-base
# 1.67 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.66 16-Sep-2001 wiz

branches: 1.66.2;
Spell 'occurred' with two 'r's.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.65 17-Jul-2001 jdolecek

branches: 1.65.2;
Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
has to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(

This code was tested to pass regression tests.


# 1.64 01-Jul-2001 matt

branches: 1.64.2;
Use consistent types for len. Limit sockarg length to reasonable values.


# 1.63 25-Jun-2001 jdolecek

Back off the sendit()/recvit() change, some have problems with it


# 1.62 25-Jun-2001 jdolecek

sys_connect(): fix the call to FILE_UNUSE() so that it's done on return, rather
than immediately after getsock() call


# 1.61 25-Jun-2001 jdolecek

Add 'kernsa' parameter for sendit()/recvit(); if nonzero, msg->msg_name
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.

This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.


# 1.60 16-Jun-2001 jdolecek

Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pageable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.


# 1.59 14-Jun-2001 thorpej

Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.


# 1.58 06-May-2001 manu

implement the recently introduced EMUL_BSD_ASYNCIO_PIPE emulation package
flag.

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.57 27-Feb-2001 lukem

branches: 1.57.2;
convert to ANSI KNF


# 1.56 10-Dec-2000 fvdl

Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).


# 1.55 24-Nov-2000 jdolecek

define COMPAT_OLDSOCK unconditionally - the code is used virtually for all
emulations besides NetBSD, and this way it's LKM-safe


# 1.54 02-Aug-2000 thorpej

MALLOC()/FREE() are not to be used for variable sized allocations.


# 1.53 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.52 27-May-2000 sommerfeld

branches: 1.52.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.51 30-Mar-2000 augustss

Get rid of register declarations.


# 1.50 23-Mar-2000 thorpej

Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() makes sure
the fd_freefiles hints stay in sync.

From OpenBSD.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.49 05-Nov-1999 mycroft

branches: 1.49.2;
Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.


# 1.48 30-Oct-1999 enami

back out unnecessary stylistic changes in recent changes, to keep coding
style closer to KNF.


# 1.47 27-Oct-1999 jdolecek

minor cleanup of previous - avoid goto and code duplication


# 1.46 27-Oct-1999 darrenr

patch from Greg A. Woods to fix panic problems with code that attempts to
recover from failures to accept a socket successfully. Problem suggested
by this:
> It would appear (from two "panic: closef: count < 0" failures in less
> than 12 hours) that Darren's fix to accept(2) for lost file descriptors
> isn't quite correct. His fix inserts a call to closef() to handle one
> of several possible error conditions. However everywhere else in the
> socket code in the same file where falloc() cleanup is necessary the
> function used is ffree().


Revision tags: chs-ubc2-base
# 1.45 01-Jul-1999 itojun

branches: 1.45.2; 1.45.4; 1.45.6;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.44 01-Jul-1999 darrenr

fix sys_accept() to return EOPNOTSUPP for protocols which don't support
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.


# 1.43 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.42 30-Apr-1999 cgd

add checks for COMPAT_OSF1 in the appropriate places


Revision tags: netbsd-1-4-RELEASE netbsd-1-4-base
# 1.41 10-Feb-1999 kleink

branches: 1.41.2; 1.41.4; 1.41.6;
* Due to addition and use of socklen_t, make the socket option and address
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also eliminates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessitates the addition
of a few casts.


# 1.40 18-Dec-1998 drochner

solve the COMPAT_OLDSOCK/MSG_COMPAT problem differently:
The source files which need MSG_COMPAT define COMPAT_OLDSOCK.


Revision tags: kenh-if-detach-base
# 1.39 26-Nov-1998 mycroft

Revert the functional change in rev 1.38; permit a msg_iovlen of 0.
There are two reasons for this:
* We should be able to pass file descriptors without sending any data.
* We could send zero-length iovecs anyway (but we shouldn't have to do this).
Also, msg_iovlen is already a u_int, so delete a bunch of casts.


Revision tags: chs-ubc-base
# 1.38 04-Aug-1998 kleink

Per XNS Issue 5, calling recvmsg(2) or sendmsg(2) with an msg.msg_iovlen less
than or equal to 0 shall fail with EMSGSIZE; the latter condition was not being
checked for. Also, document the msg.msg_iovlen > {IOV_MAX} case.


# 1.37 04-Aug-1998 kleink

UIO_MAXIOV -> IOV_MAX


# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)


# 1.35 03-Aug-1998 kleink

Fix two off-by-one bugs, each present in both recvmsg(2) and sendmsg(2):
* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.


# 1.34 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.33 29-Jul-1998 thorpej

branches: 1.33.2;
When checking for overflow in the residual count, test against SSIZE_MAX.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.

There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).


# 1.32 18-Jul-1998 lukem

use AF_LOCAL instead of AF_UNIX


# 1.31 25-Jun-1998 thorpej

defopt KTRACE


# 1.30 25-Apr-1998 matt

Hook for 0-copy (or other optimized) sends and receives


# 1.29 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.28 06-Feb-1998 thorpej

When copying out multiple control messages, ensure that the next control
message is aligned. From David Borman <dab@bsdi.com>.


# 1.27 07-Jan-1998 thorpej

Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).


# 1.26 07-Jan-1998 thorpej

Fix bug in recvit() that would cause recvmsg() to only receive one
control message, even if there were multiple control messages on
the queue. From Jean-Luc Richier <Jean-Luc.Richier@imag.fr>, in
bug report kern/4700.


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.25 26-Jun-1997 thorpej

branches: 1.25.8;
Use UCHAR_MAX rather than "255" when sanity-checking the length of a
sockaddr in sockargs().


# 1.24 26-Jun-1997 thorpej

In sockargs():
- Add a comment describing my feelings about this interface, in general.
- Remove the COMPAT_OLDSOCK length hack. Instead, if the socket argument
is too long to fit in an mbuf, allocate enough external storage to
hold it.
- If the socket argument is a sockaddr, don't allow the length to be
greater than 255, as that would overflow sa_len.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for his sanity checking.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 22-Dec-1996 cgd

* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.


# 1.22 14-Jun-1996 cgd

avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.


# 1.21 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.20 17-May-1996 pk

branches: 1.20.4;
Don't touch retval[] in socketpair(); manual page says this system call
returns 0 on success (PR#2428).


# 1.19 09-Feb-1996 christos

More proto fixes


# 1.18 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.17 10-Oct-1995 mycroft

Add hooks for COMPAT_FREEBSD, from Noriyuki Soda.


# 1.16 07-Oct-1995 mycroft

Prefix names of system call implementation functions with `sys_'.


# 1.15 19-Sep-1995 thorpej

Make system calls conform to a standard prototype and bring those
prototypes into scope.


# 1.14 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.13 24-Jun-1995 christos

Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).


# 1.12 10-May-1995 christos

tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed


# 1.11 05-Mar-1995 fvdl

Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.


# 1.10 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.9 20-Oct-1994 cgd

update for new syscall args description mechanism


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.8 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.7 04-May-1994 mycroft

Add return types where missing. Simplify some of the compat conditionals.
Include compat code if COMPAT_SUNOS with or without COMPAT_43.


# 1.6 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 17-Jul-1993 mycroft

branches: 1.5.4;
Finish moving struct definitions outside of function declarations.


# 1.4 27-Jun-1993 andrew

* ansifications
* Yuval Yarom's socket recv(2) fixes - access rights problems (see also
uipc_socket.c).


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.190 04-Jan-2018 christos

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).

(Tom Ivar Helbekkmo)


# 1.189 31-Dec-2017 christos

pass valsize for getsockopt like we do for setsockopt


# 1.188 26-Dec-2017 kamil

Refactor pipe1() and correct a bug in sys_pipe2() (SYS_pipe2)

sys_pipe2() returns two integers (values), the 2nd one is a copy of the 2nd
file descriptor that lands in fildes[2]. This is a side effect of reusing
the code for sys_pipe() (SYS_pipe) and not cleaning it up.

The first returned value is (on success) 0.

Introduced a small refactoring in pipe1() that it does not operate over
retval[], but on an array int[2]. A user sets retval[] for pipe() when
desired and needed.

This refactoring touches compat code: netbsd32, linux, linux32.

Before the changes on NetBSD/amd64:

$ ktruss -i ./a.out
[...]
15131 1 a.out pipe2(0x7f7fff2e62b8, 0) = 0, 4
[...]

After the changes:

$ ktruss -i ./a.out
[...]
782 1 a.out pipe2(0x7f7fff97e850, 0) = 0
[...]

There should not be a visible change for current users.

Sponsored by <The NetBSD Foundation>


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.187 20-Jun-2017 christos

Change len type to be unsigned int for consistency with the input type.
Don't check for negative; it does not matter, since we clamp anyway. This
broke the compat32 getsockname() where an uninitialized socklen_t ended
up randomly negative causing it to fail.


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204
# 1.186 03-Feb-2017 christos

expose sendmsg_so and recvmsg_so.


# 1.185 02-Feb-2017 christos

expose copyout_sockname_sb


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204
# 1.184 03-Dec-2016 christos

branches: 1.184.2;
Add missing ktrkuser


Revision tags: pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914
# 1.183 13-Sep-2016 martin

Make the ktrace record written by do_sys_sendmsg/do_sys_recvmsg overridable
by the caller. Use this in compat_netbsd32 to log the 32bit version, so
the 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.182 07-Jul-2016 msaitoh

branches: 1.182.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.181 01-Nov-2015 christos

Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.


Revision tags: nick-nhusb-base-20150921
# 1.180 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.179 22-Jul-2015 maxv

Memory leak. Triggerable from an unprivileged user via COMPAT_43.


Revision tags: nick-nhusb-base-20150606
# 1.178 09-May-2015 rtr

change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16


# 1.177 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.176 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.175 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.174 06-Mar-2015 rtr

Return EINVAL if namelen isn't large enough to encompass the expected
members of sockaddr structures. i.e. sa_len and sa_family.

Discussed with and patch by christos@


Revision tags: nick-nhusb-base
# 1.173 05-Sep-2014 matt

branches: 1.173.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


Revision tags: netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.172 09-Aug-2014 rtr

branches: 1.172.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.171 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.170 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


# 1.169 17-May-2014 rmind

- fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle few comments.
- Remove M_SOOPTS while here.


# 1.168 17-May-2014 rmind

makesocket: set SS_NBIO slightly earlier.


# 1.167 17-May-2014 rmind

Remove trailing whitespaces, wrap long lines, minor KNF; no functional changes.


Revision tags: yamt-pagecache-base9
# 1.166 07-Apr-2014 seanb

Fix a case where an erroneous EAGAIN was returned out of recvmmsg.
This occurred when some, but not all of the mmsg array members
were filled with data from a non-blocking socket.
PR kern/48725


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 09-Oct-2013 christos

branches: 1.165.2;
delete extra m_len initialization.


# 1.164 09-Oct-2013 christos

PR/48292: Justin Cormack: paccept creates sockets that cannot be made blocking
Reset the socket flags not just the file flags for non-blocking I/O.
XXX: pullup 6


# 1.163 08-Oct-2013 christos

PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.


# 1.162 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.161 03-Jun-2013 christos

branches: 1.161.2;
use the proper name for kdump pretty-printing.


Revision tags: agc-symver-base
# 1.160 14-Feb-2013 christos

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.


# 1.159 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos


Revision tags: yamt-pagecache-base8
# 1.158 29-Dec-2012 mlelstv

The sanity check prevented messages that carry only ancillary data.


# 1.157 29-Dec-2012 mlelstv

If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analogous
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.156 17-Jul-2012 njoly

branches: 1.156.2;
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.


# 1.155 22-Jun-2012 christos

Add {send,recv}mmsg from Linux


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.154 25-Jan-2012 christos

branches: 1.154.2;
revert atomics for so_options since it is a short.


# 1.153 25-Jan-2012 christos

need <sys/atomic.h>


# 1.152 25-Jan-2012 christos

Add locking, requested by yamt. Note that locking is not used everywhere
for these.


# 1.151 25-Jan-2012 christos

As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]


# 1.150 21-Dec-2011 christos

simplify expression


# 1.149 20-Dec-2011 christos

- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.148 04-Nov-2011 christos

branches: 1.148.4;
Fix error I introduced in previous commit that caused asymmetric connects
when SOCK_NONBLOCK or SOCK_CLOEXEC was specified. Factor out common code
and simplify error return.


Revision tags: yamt-pagecache-base
# 1.147 21-Sep-2011 christos

branches: 1.147.2;
Put the mbuf type in the ktrace record so that we know how to decode it
in userland.


# 1.146 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.145 15-Jul-2011 christos

fail with EINVAL if flags other than O_CLOEXEC|O_NONBLOCK are set in pipe2(2)
and dup3(2)


# 1.144 26-Jun-2011 christos

* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.143 24-Apr-2011 rmind

- Replace few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.


# 1.142 10-Apr-2011 christos

- Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1
# 1.141 23-Apr-2010 rmind

branches: 1.141.2;
Replace M_IOV and some malloc(9)s with kmem(9), and while there:
- Fix invalid free (M_TEMP vs M_IOV) in do_sys_recvmsg(), spotted by jakllsch@.
Also, same fix in osf1_sys_sendmsg_xopen().
- Fix attempt to free non-allocated memory in error path in netbsd32___getfh30().
- Plug a memory leak in compat_43_netbsd32_orecvmsg().


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.140 21-Jan-2010 pgoyette

branches: 1.140.2; 1.140.4;
Remove unnecessary call to kauth_cred_free().

This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@


# 1.139 29-Dec-2009 elad

Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!


# 1.138 20-Dec-2009 dsl

If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567


Revision tags: matt-premerge-20091211
# 1.137 09-Dec-2009 dsl

Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.136 04-Apr-2009 ad

Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)


Revision tags: nick-hppapmap-base2
# 1.135 21-Jan-2009 yamt

branches: 1.135.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.


Revision tags: netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 haad-dm-base mjf-devfs2-base
# 1.134 06-Aug-2008 plunky

branches: 1.134.2; 1.134.4;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.133 24-Jun-2008 ad

branches: 1.133.2;
Nothing uses getsock/getvnode any more.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.132 30-May-2008 rmind

branches: 1.132.2;
do_sys_accept: release the reference to sock in few error paths.
Should fix PR/38790, report and test-case by Nicolas Joly.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.131 28-Apr-2008 martin

branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.130 24-Apr-2008 ad

branches: 1.130.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


# 1.129 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.128 21-Mar-2008 ad

branches: 1.128.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.127 06-Feb-2008 ad

branches: 1.127.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.
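
Flipping a single flag bit without taking a lock can be sketched in userland with C11 atomics. This is only an illustration of the idea, not the kernel's code: `file_sketch`, `FNONBLOCK_SKETCH` and `set_nonblock()` are hypothetical names, and the flag value is arbitrary.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define FNONBLOCK_SKETCH 0x0004u	/* hypothetical flag bit for this sketch */

struct file_sketch {
	atomic_uint f_flag;		/* flag word shared between threads */
};

/* Set or clear the non-blocking bit with one atomic read-modify-write. */
void
set_nonblock(struct file_sketch *fp, bool on)
{
	assert(fp != NULL);
	if (on)
		atomic_fetch_or(&fp->f_flag, FNONBLOCK_SKETCH);
	else
		atomic_fetch_and(&fp->f_flag, ~FNONBLOCK_SKETCH);
}
```

Because each update is a single atomic operation, other bits in the flag word are preserved even when several threads modify it concurrently.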


Revision tags: vmlocking2-base3 bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.126 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.125 20-Dec-2007 dsl

Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
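
The practical difference between the two prototypes can be sketched with stand-in types. Everything here is hypothetical (`lwp_sketch`, `register_t_sketch`, the fields of `foo_args`); it only shows why a typed, const `uap` removes the cast and stops compat code from writing into the argument structure.

```c
#include <assert.h>
#include <stddef.h>

typedef long register_t_sketch;		/* stand-in for the kernel's register_t */
struct lwp_sketch { int l_dummy; };	/* stand-in for struct lwp */
struct foo_args { int a; int b; };	/* hypothetical argument structure */

/* Old style: untyped argument pointer, converted inside the function. */
int
foo_old(struct lwp_sketch *l, void *v, register_t_sketch *retval)
{
	struct foo_args *uap = v;

	(void)l;
	*retval = uap->a + uap->b;
	return 0;
}

/* New style: typed, const pointer; an assignment to uap->a would not compile. */
int
foo_new(struct lwp_sketch *l, const struct foo_args *uap,
    register_t_sketch *retval)
{
	(void)l;
	assert(uap != NULL && retval != NULL);
	*retval = uap->a + uap->b;
	return 0;
}
```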


# 1.124 16-Dec-2007 elad

Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.123 24-Nov-2007 dyoung

branches: 1.123.2; 1.123.6;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to getsockmbuf().


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.122 05-Oct-2007 dyoung

branches: 1.122.4;
Use getsombuf().


Revision tags: yamt-x86pmap-base
# 1.121 19-Sep-2007 christos

branches: 1.121.2;
minor nits; no code change.


# 1.120 19-Sep-2007 dyoung

1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.119 06-Sep-2007 rmind

do_sys_sendmsg: Plug a possible leak.
From CID: 4535


# 1.118 01-Sep-2007 dsl

Don't error calls to copy socket addresses to userspace when the application
has provided a non-null buffer pointer and a zero length.


# 1.117 27-Aug-2007 dsl

ktrace socket control structures (ie msghdr, address etc) using ktrkuser().


# 1.116 15-Aug-2007 ad

branches: 1.116.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base nick-csl-alignment-base
# 1.115 15-Jul-2007 dsl

branches: 1.115.2; 1.115.6;
Remove non-user flags (especially MSG_IOVUSRSPACE) from mp->msg_flags
before passing to so_receive.
This may (or may not) have any effect...


Revision tags: mjf-ufs-trans-base
# 1.114 01-Jul-2007 dsl

Check for SOL_SOCKET when checking for SCM_RIGHTS.


# 1.113 24-Jun-2007 dsl

Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socket names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, old LKMs will fail to load.


# 1.112 02-Jun-2007 enami

- Fix obvious typos so that sendto(2) works.
- Wrap lines again.


# 1.111 01-Jun-2007 dsl

Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.


Revision tags: yamt-idlelwp-base8
# 1.110 13-May-2007 dsl

Fallout from caddr_t deletion - remove a load of redundant (void *) casts.


# 1.109 18-Apr-2007 yamt

sys_accept: fix usecount botch and double soclose in rev.1.108.


# 1.108 15-Apr-2007 yamt

sys_accept: don't leak a socket on error.


Revision tags: thorpej-atomic-base
# 1.107 04-Mar-2007 christos

branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge
# 1.106 09-Feb-2007 ad

branches: 1.106.2;
Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.105 01-Nov-2006 yamt

branches: 1.105.2;
remove some __unused from function parameters.


# 1.104 23-Oct-2006 elad

PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic

Patch applied, thanks!


Revision tags: yamt-splraiseipl-base2
# 1.103 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.102 22-Aug-2006 seanb

branches: 1.102.2; 1.102.4;
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.101 23-Jul-2006 ad

branches: 1.101.2;
Use the LWP cached credentials where sane.


# 1.100 26-Jun-2006 mrg

version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.

from pavel@ with a little bit of clean up from myself.

XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.99 16-May-2006 christos

branches: 1.99.4;
Don't set mature an fd that has been ffree'd


Revision tags: elad-kernelauth-base
# 1.98 11-May-2006 christos

Add MSG_NOSIGNAL (from FreeBSD)


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.97 01-Mar-2006 yamt

branches: 1.97.2; 1.97.4; 1.97.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.96 26-Dec-2005 perry

branches: 1.96.2; 1.96.4; 1.96.6;
u_intN_t -> uintN_t


# 1.95 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.94 03-Sep-2005 martin

In adjust_rights(), use CMSG_SPACE() to calculate the number of
file descriptors passed in this message - the counterpart in
unp_externalize does this as well.

Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.

This fixes the postfix connection cache for 64bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing the fdrelease() of uninitialized junk.
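
The underlying issue is that a control message header is padded for alignment, so the descriptor count has to be derived from the payload size alone. A minimal userland sketch follows. It uses the standard CMSG_LEN() macro rather than the kernel's adjust_rights() logic, and `count_passed_fds()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Count the descriptors carried by one SCM_RIGHTS control message.
 * cmsg_len covers the (aligned) header plus the payload, so subtracting
 * CMSG_LEN(0) leaves exactly the payload bytes.
 */
size_t
count_passed_fds(const struct cmsghdr *cm)
{
	assert(cm != NULL);
	if (cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS)
		return 0;
	if (cm->cmsg_len < CMSG_LEN(0))
		return 0;	/* malformed: shorter than its own header */
	return (cm->cmsg_len - CMSG_LEN(0)) / sizeof(int);
}
```

Dividing the raw cmsg_len by sizeof(int), or subtracting an unaligned header size, would overcount on platforms where the header is padded.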


# 1.93 03-Sep-2005 martin

minor knf tweak


# 1.92 30-May-2005 martin

branches: 1.92.2;
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.


# 1.91 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.90 26-Feb-2005 perry

branches: 1.90.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.89 30-Nov-2004 christos

branches: 1.89.4; 1.89.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat


# 1.88 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.87 18-May-2004 ragge

Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.

Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.

As a side effect, the new connect() behaviour conforms to POSIX.


Revision tags: netbsd-2-0-base
# 1.86 29-Nov-2003 matt

branches: 1.86.2;
Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.85 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.84 13-Nov-2003 chs

eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.
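
The same principle (attempt the operation and handle failure, rather than pre-checking permissions) has a familiar userland analogue: skip access(2) and simply try the open and read. The sketch below is that analogue only; it is not kernel code, and `read_file_prefix()` is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to len bytes from path.  No access(2) pre-check is made:
 * the open and read are simply attempted, and every failure (missing
 * file, permissions, I/O error) is reported through the same path.
 * Returns the byte count, or -1 with errno set.
 */
ssize_t
read_file_prefix(const char *path, void *buf, size_t len)
{
	int fd, saved;
	ssize_t n;

	assert(path != NULL && buf != NULL);
	fd = open(path, O_RDONLY);
	if (fd == -1)
		return -1;
	n = read(fd, buf, len);
	saved = errno;		/* keep read()'s errno across close() */
	(void)close(fd);
	errno = saved;
	return n;
}
```

A pre-check would add cost to the successful case and still could not rule out a later failure, which is the argument the commit makes against uvm_useracc().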


# 1.83 04-Sep-2003 matt

Adapt to the new calling conventions of unp_connect2


# 1.82 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.81 29-Jun-2003 fvdl

branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.80 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.79 05-Apr-2003 christos

PR/21030: Naoto Shimazaki: fcntl to accepted socket does not work properly


# 1.78 26-Feb-2003 matt

Remove leftover MBUFTRACE asserts.


# 1.77 26-Feb-2003 drochner

deactivate MBUFTRACE related KASSERT()s in the !MBUFTRACE case


# 1.76 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.75 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.74 26-Nov-2002 christos

si_ -> sel_ to avoid conflicts with siginfo.


# 1.73 25-Nov-2002 itojun

no need for error check after MEXTMALLOC - jdolecek


# 1.72 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge
# 1.71 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.70 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.69 31-May-2002 itojun

support setsockopt() with larger data (up to MCLBYTES).
From: Hitoshi Asaeda <Hitoshi.Asaeda@sophia.inria.fr>


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base ifpoll-base
# 1.68 11-Feb-2002 jdolecek

branches: 1.68.8;
Switch default for pipes to the faster John S. Dyson's implementation.
Old, socketpair-based ones are available with option PIPE_SOCKETPAIR.


Revision tags: thorpej-mips-cache-base
# 1.67 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.66 16-Sep-2001 wiz

branches: 1.66.2;
Spell 'occurred' with two 'r's.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.65 17-Jul-2001 jdolecek

branches: 1.65.2;
Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
has to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(

This code was tested to pass regression tests.


# 1.64 01-Jul-2001 matt

branches: 1.64.2;
Use consistent types for len. Limit sockarg length to reasonable values.


# 1.63 25-Jun-2001 jdolecek

Back off the sendit()/recvit() change, some have problems with it


# 1.62 25-Jun-2001 jdolecek

sys_connect(): fix the call to FILE_UNUSE() so that it's done on return, rather
than immediately after getsock() call


# 1.61 25-Jun-2001 jdolecek

Add 'kernsa' parameter for sendit()/recvit(); if nonzero, msg->msg_name
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.

This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.


# 1.60 16-Jun-2001 jdolecek

Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pageable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.


# 1.59 14-Jun-2001 thorpej

Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
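
The larval/mature scheme can be illustrated with a toy descriptor table. This is a sketch under stated assumptions, not the kernel's data structures: every name ending in `_sketch` is hypothetical, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NFILES_SKETCH 8

struct slot_sketch {
	void *file;	/* NULL means the slot is free */
	bool larval;	/* true while the descriptor is still being built */
};

struct fdtab_sketch {
	struct slot_sketch slots[NFILES_SKETCH];
};

/* Reserve the lowest free slot; the new descriptor starts out larval. */
int
fd_alloc_sketch(struct fdtab_sketch *t, void *file)
{
	assert(t != NULL && file != NULL);
	for (int fd = 0; fd < NFILES_SKETCH; fd++) {
		if (t->slots[fd].file == NULL) {
			t->slots[fd].file = file;
			t->slots[fd].larval = true;
			return fd;
		}
	}
	return -1;
}

/* Lookups treat larval descriptors as if they did not exist. */
void *
fd_getfile_sketch(struct fdtab_sketch *t, int fd)
{
	assert(t != NULL);
	if (fd < 0 || fd >= NFILES_SKETCH || t->slots[fd].larval)
		return NULL;
	return t->slots[fd].file;
}

/* Construction finished: clear the larval flag. */
void
fd_setmature_sketch(struct fdtab_sketch *t, int fd)
{
	assert(t != NULL && fd >= 0 && fd < NFILES_SKETCH);
	t->slots[fd].larval = false;
}
```

The allocator sees a larval slot as occupied, so it is never handed out twice, while a lookup (and therefore a racing close) cannot reach it until it is marked mature.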


# 1.58 06-May-2001 manu

implement the recently introduced EMUL_BSD_ASYNCIO_PIPE emulation package
flag.

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.57 27-Feb-2001 lukem

branches: 1.57.2;
convert to ANSI KNF


# 1.56 10-Dec-2000 fvdl

Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).


# 1.55 24-Nov-2000 jdolecek

define COMPAT_OLDSOCK unconditionally - the code is used virtually for all
emulations besides NetBSD, and this way it's LKM-safe


# 1.54 02-Aug-2000 thorpej

MALLOC()/FREE() are not to be used for variable sized allocations.


# 1.53 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.52 27-May-2000 sommerfeld

branches: 1.52.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.51 30-Mar-2000 augustss

Get rid of register declarations.


# 1.50 23-Mar-2000 thorpej

Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() makes sure
the fd_freefiles hints stay in sync.

From OpenBSD.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.49 05-Nov-1999 mycroft

branches: 1.49.2;
Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.


# 1.48 30-Oct-1999 enami

back out unnecessary stylistic changes in recent changes, to keep coding
style closer to KNF.


# 1.47 27-Oct-1999 jdolecek

minor cleanup of previous - avoid goto and code duplication


# 1.46 27-Oct-1999 darrenr

patch from Greg A. Woods to fix panic problems with code that attempts to
recover from failures to accept a socket successfully. Problem suggested
by this:
> It would appear (from two "panic: closef: count < 0" failures in less
> than 12 hours) that Darren's fix to accept(2) for lost file descriptors
> isn't quite correct. His fix inserts a call to closef() to handle one
> of several possible error conditions. However everywhere else in the
> socket code in the same file where falloc() cleanup is necessary the
> function used is ffree().


Revision tags: chs-ubc2-base
# 1.45 01-Jul-1999 itojun

branches: 1.45.2; 1.45.4; 1.45.6;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.44 01-Jul-1999 darrenr

fix sys_accept() to return EOPNOTSUPP for protocols which don't support
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.


# 1.43 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.42 30-Apr-1999 cgd

add checks for COMPAT_OSF1 in the appropriate places


Revision tags: netbsd-1-4-RELEASE netbsd-1-4-base
# 1.41 10-Feb-1999 kleink

branches: 1.41.2; 1.41.4; 1.41.6;
* Due to addition and use of socklen_t, make the socket option and address
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also eliminates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessitates the addition
of a few casts.


# 1.40 18-Dec-1998 drochner

solve the COMPAT_OLDSOCK/MSG_COMPAT problem differently:
The source files which need MSG_COMPAT define COMPAT_OLDSOCK.


Revision tags: kenh-if-detach-base
# 1.39 26-Nov-1998 mycroft

Revert the functional change in rev 1.38; permit a msg_iovlen of 0.
There are two reasons for this:
* We should be able to pass file descriptors without sending any data.
* We could send zero-length iovecs anyway (but we shouldn't have to do this).
Also, msg_iovlen is already a u_int, so delete a bunch of casts.


Revision tags: chs-ubc-base
# 1.38 04-Aug-1998 kleink

Per XNS Issue 5, calling recvmsg(2) or sendmsg(2) with an msg.msg_iovlen less
than or equal to 0 shall fail with EMSGSIZE; the latter condition was not being
checked for. Also, document the msg.msg_iovlen > {IOV_MAX} case.


# 1.37 04-Aug-1998 kleink

UIO_MAXIOV -> IOV_MAX


# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
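
The point to watch in this conversion is the argument order: bcopy() and ovbcopy() take the source first, while memcpy() and memmove() take the destination first. A short sketch with two hypothetical helpers, `copy_name()` and `shift_left()`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a NUL-terminated name into a fixed buffer, truncating if needed. */
void
copy_name(char *dst, size_t dstlen, const char *src)
{
	size_t n;

	assert(dst != NULL && src != NULL && dstlen > 0);
	n = strlen(src);
	if (n >= dstlen)
		n = dstlen - 1;
	memset(dst, 0, dstlen);		/* was: bzero(dst, dstlen) */
	memcpy(dst, src, n);		/* was: bcopy(src, dst, n) */
}

/*
 * Shift a buffer's contents left by one byte.  The regions overlap,
 * so memmove(), the ovbcopy() replacement, is required.
 */
void
shift_left(char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	memmove(buf, buf + 1, len - 1);	/* was: ovbcopy(buf + 1, buf, len - 1) */
}
```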


# 1.35 03-Aug-1998 kleink

Fix two off-by-one bugs, each present in both recvmsg(2) and sendmsg(2):
* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.
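
Both boundary conditions can be sketched as follows. The constants and function names are hypothetical stand-ins, and the values are arbitrary. The point is only that each limit is itself a valid count, so the comparisons must be strict.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SMALLIOV_SKETCH	8	/* hypothetical: entries that fit on the stack */
#define MAXIOV_SKETCH	1024	/* hypothetical: largest count accepted */

/*
 * Heap storage is needed only for counts strictly above the size of
 * the on-stack array; a ">=" here would malloc() unnecessarily when
 * the count equals SMALLIOV_SKETCH.
 */
bool
iov_needs_malloc(unsigned int iovlen)
{
	return iovlen > SMALLIOV_SKETCH;
}

/*
 * Returns 0 if the count is acceptable and EMSGSIZE otherwise; a ">="
 * here would wrongly reject a count equal to MAXIOV_SKETCH.
 */
int
iov_check_count(unsigned int iovlen)
{
	return iovlen > MAXIOV_SKETCH ? EMSGSIZE : 0;
}
```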


# 1.34 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.33 29-Jul-1998 thorpej

branches: 1.33.2;
When checking for overflow in the residual count, test against SSIZE_MAX.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.

There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).
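
A residual-count check against SSIZE_MAX might look like the sketch below. It is a userland illustration, not the kernel's code: `iov_total()` is a hypothetical helper, the choice of EINVAL as the error is an assumption of this sketch, and the SSIZE_MAX fallback is only there for strict compilation modes that hide the POSIX definition.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef SSIZE_MAX
#define SSIZE_MAX ((ssize_t)(SIZE_MAX / 2))	/* fallback for strict modes */
#endif

/* Sum the iovec lengths into *resid; fail if the total would exceed SSIZE_MAX. */
int
iov_total(const struct iovec *iov, int iovcnt, size_t *resid)
{
	size_t total = 0;

	assert(iov != NULL && resid != NULL);
	for (int i = 0; i < iovcnt; i++) {
		/* Compare against the remaining headroom so the sum cannot wrap. */
		if (iov[i].iov_len > (size_t)SSIZE_MAX - total)
			return EINVAL;
		total += iov[i].iov_len;
	}
	*resid = total;
	return 0;
}
```

Bounding the total by SSIZE_MAX guarantees that the byte count returned to the caller always fits in ssize_t and can never be mistaken for the -1 error value.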


# 1.32 18-Jul-1998 lukem

use AF_LOCAL instead of AF_UNIX


# 1.31 25-Jun-1998 thorpej

defopt KTRACE


# 1.30 25-Apr-1998 matt

Hook for 0-copy (or other optimized) sends and receives


# 1.29 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.28 06-Feb-1998 thorpej

When copying out multiple control messages, ensure that the next control
message is aligned. From David Borman <dab@bsdi.com>.


# 1.27 07-Jan-1998 thorpej

Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).


# 1.26 07-Jan-1998 thorpej

Fix bug in recvit() that would cause recvmsg() to only receive one
control message, even if there were multiple control messages on
the queue. From Jean-Luc Richier <Jean-Luc.Richier@imag.fr>, in
bug report kern/4700.


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.25 26-Jun-1997 thorpej

branches: 1.25.8;
Use UCHAR_MAX rather than "255" when sanity-checking the length of a
sockaddr in sockargs().


# 1.24 26-Jun-1997 thorpej

In sockargs():
- Add a comment describing my feelings about this interface, in general.
- Remove the COMPAT_OLDSOCK length hack. Instead, if the socket argument
is too long to fit in an mbuf, allocate enough external storage to
hold it.
- If the socket argument is a sockaddr, don't allow the length to be
greater than 255, as that would overflow sa_len.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for his sanity checking.
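
The 255-byte limit follows from sa_len being a single byte. A minimal sketch of the check, using a hypothetical `sockaddr_sketch` structure and `set_sockaddr_len()` helper rather than the real sockargs() code:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

struct sockaddr_sketch {
	uint8_t sa_len;		/* total length, one byte as in BSD sockaddrs */
	uint8_t sa_family;
	char sa_data[14];
};

/* Record a caller-supplied length, rejecting values sa_len cannot hold. */
int
set_sockaddr_len(struct sockaddr_sketch *sa, size_t buflen)
{
	assert(sa != NULL);
	if (buflen > UCHAR_MAX)
		return EINVAL;	/* would be silently truncated otherwise */
	sa->sa_len = (uint8_t)buflen;
	return 0;
}
```

Without the check, a length of 256 would be stored as 0, leaving sa_len inconsistent with the data actually copied.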


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 22-Dec-1996 cgd

* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.


# 1.22 14-Jun-1996 cgd

avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.


# 1.21 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.20 17-May-1996 pk

branches: 1.20.4;
Don't touch retval[] in socketpair(); manual page says this system call
returns 0 on success (PR#2428).


# 1.19 09-Feb-1996 christos

More proto fixes


# 1.18 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.17 10-Oct-1995 mycroft

Add hooks for COMPAT_FREEBSD, from Noriyuki Soda.


# 1.16 07-Oct-1995 mycroft

Prefix names of system call implementation functions with `sys_'.


# 1.15 19-Sep-1995 thorpej

Make system calls conform to a standard prototype and bring those
prototypes into scope.


# 1.14 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.13 24-Jun-1995 christos

Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).


# 1.12 10-May-1995 christos

tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed


# 1.11 05-Mar-1995 fvdl

Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.


# 1.10 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.9 20-Oct-1994 cgd

update for new syscall args description mechanism


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.8 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.7 04-May-1994 mycroft

Add return types where missing. Simplify some of the compat conditionals.
Include compat code if COMPAT_SUNOS with or without COMPAT_43.


# 1.6 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 17-Jul-1993 mycroft

branches: 1.5.4;
Finish moving struct definitions outside of function declarations.


# 1.4 27-Jun-1993 andrew

* ansifications
* Yuval Yarom's socket recv(2) fixes - access rights problems (see also
uipc_socket.c).


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.187 20-Jun-2017 christos

Change len type to be unsigned int for consistency with the input type.
Don't check for negative; it does not matter, since we clamp anyway. This
broke the compat32 getsockname() where an uninitialized socklen_t ended
up randomly negative, causing it to fail.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204
# 1.186 03-Feb-2017 christos

expose sendmsg_so and recvmsg_so.


# 1.185 02-Feb-2017 christos

expose copyout_sockname_sb


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204
# 1.184 03-Dec-2016 christos

branches: 1.184.2;
Add missing ktrkuser


Revision tags: pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914
# 1.183 13-Sep-2016 martin

Make the ktrace record written by do_sys_sendmsg/do_sys_recvmsg overridable
by the caller. Use this in compat_netbsd32 to log the 32bit version, so
the 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.182 07-Jul-2016 msaitoh

branches: 1.182.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.181 01-Nov-2015 christos

Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.


Revision tags: nick-nhusb-base-20150921
# 1.180 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.179 22-Jul-2015 maxv

Memory leak. Triggerable from an unprivileged user via COMPAT_43.


Revision tags: nick-nhusb-base-20150606
# 1.178 09-May-2015 rtr

change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16


# 1.177 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.176 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.175 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.174 06-Mar-2015 rtr

Return EINVAL if namelen isn't large enough to encompass the expected
members of sockaddr structures. i.e. sa_len and sa_family.

Discussed with and patch by christos@
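
The lower bound described above can be sketched as follows. The struct is an
illustrative stand-in for the BSD sockaddr layout and the function name is
hypothetical; neither is copied from the kernel:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative stand-in for the BSD sockaddr layout (an assumption, so
 * that the sketch also compiles where struct sockaddr has no sa_len).
 */
struct bsd_sockaddr {
	uint8_t	sa_len;		/* total length */
	uint8_t	sa_family;	/* address family */
	char	sa_data[14];	/* address bytes */
};

/*
 * Hypothetical check: namelen must cover at least sa_len and sa_family,
 * i.e. everything that precedes sa_data.
 */
int
namelen_covers_header(size_t namelen)
{
	if (namelen < offsetof(struct bsd_sockaddr, sa_data))
		return EINVAL;
	return 0;
}
```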


Revision tags: nick-nhusb-base
# 1.173 05-Sep-2014 matt

branches: 1.173.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


Revision tags: netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.172 09-Aug-2014 rtr

branches: 1.172.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.171 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.170 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.42!


# 1.169 17-May-2014 rmind

- fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle few comments.
- Remove M_SOOPTS while here.


# 1.168 17-May-2014 rmind

makesocket: set SS_NBIO slightly earlier.


# 1.167 17-May-2014 rmind

Remove trailing whitespaces, wrap long lines, minor KNF; no functional changes.


Revision tags: yamt-pagecache-base9
# 1.166 07-Apr-2014 seanb

Fix a case where an erroneous EAGAIN was returned out of recvmmsg.
This occurred when some, but not all of the mmsg array members
were filled with data from a non-blocking socket.
PR kern/48725
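
The intended behaviour can be sketched as a loop that reports partial success
instead of the error that stopped it. This is a simplified model, not the
kernel fix itself; recv_many, recv_one_fn and would_block_after are
hypothetical names introduced for the illustration:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-message receive callback: returns 0 or an errno value. */
typedef int (*recv_one_fn)(void *ctx, size_t idx);

/*
 * Sketch of a recvmmsg-style loop. It stops at the first error. If at
 * least one message was already received, the error is dropped and the
 * count is reported; otherwise the error is returned.
 */
int
recv_many(recv_one_fn recv_one, void *ctx, size_t vlen, size_t *countp)
{
	size_t done;
	int error = 0;

	for (done = 0; done < vlen; done++) {
		error = recv_one(ctx, done);
		if (error != 0)
			break;
	}
	*countp = done;
	if (done > 0)
		return 0;	/* partial success is still success */
	return error;
}

/*
 * Simulated non-blocking socket: *ctx holds the number of queued
 * messages; any receive beyond that would block.
 */
int
would_block_after(void *ctx, size_t idx)
{
	size_t avail = *(size_t *)ctx;

	return idx < avail ? 0 : EAGAIN;
}
```

With two queued messages and room for four, recv_many() returns 0 with a count
of two; EAGAIN is reported only when nothing at all was received.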


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 09-Oct-2013 christos

branches: 1.165.2;
delete extra m_len initialization.


# 1.164 09-Oct-2013 christos

PR/48292: Justin Cormack: paccept creates sockets that cannot be made blocking
Reset the socket flags not just the file flags for non-blocking I/O.
XXX: pullup 6


# 1.163 08-Oct-2013 christos

PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.


# 1.162 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.161 03-Jun-2013 christos

branches: 1.161.2;
use the proper name for kdump pretty-printing.


Revision tags: agc-symver-base
# 1.160 14-Feb-2013 christos

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.


# 1.159 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
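
The gap between CMSG_LEN and CMSG_SPACE mentioned above is alignment padding
after the payload. A userland sketch of the same idea, zeroing the whole slot
before filling in an SCM_RIGHTS message; the function names are hypothetical
and this is not the unp_externalize() code:

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Number of padding bytes between the end of a control message's payload
 * (CMSG_LEN) and the start of the next message (CMSG_SPACE).
 */
size_t
cmsg_padding(size_t payload)
{
	return CMSG_SPACE(payload) - CMSG_LEN(payload);
}

/*
 * Fill in a single SCM_RIGHTS message carrying one descriptor. buf must
 * be suitably aligned for struct cmsghdr and hold at least
 * CMSG_SPACE(sizeof(int)) bytes. The whole slot is zeroed first so the
 * padding never carries stale bytes.
 */
void
build_rights_cmsg(void *buf, int fd)
{
	struct cmsghdr *cmsg = buf;

	memset(buf, 0, CMSG_SPACE(sizeof(int)));
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));
}
```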


Revision tags: yamt-pagecache-base8
# 1.158 29-Dec-2012 mlelstv

The sanity check prevented messages that carry only ancillary data.


# 1.157 29-Dec-2012 mlelstv

If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analogous
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.156 17-Jul-2012 njoly

branches: 1.156.2;
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.


# 1.155 22-Jun-2012 christos

Add {send,recv}mmsg from Linux


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.154 25-Jan-2012 christos

branches: 1.154.2;
revert atomics for so_options since it is a short.


# 1.153 25-Jan-2012 christos

need <sys/atomic.h>


# 1.152 25-Jan-2012 christos

Add locking, requested by yamt. Note that locking is not used everywhere
for these.


# 1.151 25-Jan-2012 christos

As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]


# 1.150 21-Dec-2011 christos

simplify expression


# 1.149 20-Dec-2011 christos

- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.148 04-Nov-2011 christos

branches: 1.148.4;
Fix error I introduced in previous commit that caused asymmetric connects
when SOCK_NONBLOCK or SOCK_CLOEXEC was specified. Factor out common code
and simplify error return.


Revision tags: yamt-pagecache-base
# 1.147 21-Sep-2011 christos

branches: 1.147.2;
Put the mbuf type in the ktrace record so that we know how to decode it
in userland.


# 1.146 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.145 15-Jul-2011 christos

fail with EINVAL if flags are not O_CLOEXEC|O_NONBLOCK in pipe2(2) and
dup3(2)


# 1.144 26-Jun-2011 christos

* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
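
From userland, the socket type flags listed above can be exercised roughly as
follows. This is a sketch only; the helper names are hypothetical:

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a connected AF_UNIX pair with close-on-exec and non-blocking
 * mode requested at creation time. Returns 0 on success, -1 on error.
 */
int
make_pair_cloexec_nonblock(int sv[2])
{
	return socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK,
	    0, sv);
}

/*
 * Return 1 if fd has both FD_CLOEXEC and O_NONBLOCK set, 0 if not,
 * -1 if the flags could not be read.
 */
int
fd_is_cloexec_nonblock(int fd)
{
	int fdflags = fcntl(fd, F_GETFD);
	int flflags = fcntl(fd, F_GETFL);

	if (fdflags == -1 || flflags == -1)
		return -1;
	return (fdflags & FD_CLOEXEC) != 0 && (flflags & O_NONBLOCK) != 0;
}
```

Setting the flags at creation avoids the window between socketpair(2) and a
later fcntl(2) in which another thread could fork and exec.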


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.143 24-Apr-2011 rmind

- Replace few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.


# 1.142 10-Apr-2011 christos

- Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1
# 1.141 23-Apr-2010 rmind

branches: 1.141.2;
Replace M_IOV and some malloc(9)s with kmem(9), and while there:
- Fix invalid free (M_TEMP vs M_IOV) in do_sys_recvmsg(), spotted by jakllsch@.
Also, same fix in osf1_sys_sendmsg_xopen().
- Fix attempt to free non-allocated memory in error path in netbsd32___getfh30().
- Plug a memory leak in compat_43_netbsd32_orecvmsg().


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.140 21-Jan-2010 pgoyette

branches: 1.140.2; 1.140.4;
Remove unnecessary call to kauth_cred_free().

This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@


# 1.139 29-Dec-2009 elad

Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!


# 1.138 20-Dec-2009 dsl

If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567


Revision tags: matt-premerge-20091211
# 1.137 09-Dec-2009 dsl

Rename fo_drain() to fo_abort(); 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.136 04-Apr-2009 ad

Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)


Revision tags: nick-hppapmap-base2
# 1.135 21-Jan-2009 yamt

branches: 1.135.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.


Revision tags: netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 haad-dm-base mjf-devfs2-base
# 1.134 06-Aug-2008 plunky

branches: 1.134.2; 1.134.4;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.133 24-Jun-2008 ad

branches: 1.133.2;
Nothing uses getsock/getvnode any more.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.132 30-May-2008 rmind

branches: 1.132.2;
do_sys_accept: release the reference to sock in a few error paths.
Should fix PR/38790, report and test-case by Nicolas Joly.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.131 28-Apr-2008 martin

branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.130 24-Apr-2008 ad

branches: 1.130.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


# 1.129 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.128 21-Mar-2008 ad

branches: 1.128.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.127 06-Feb-2008 ad

branches: 1.127.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.


Revision tags: vmlocking2-base3 bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.126 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.125 20-Dec-2007 dsl

Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.


# 1.124 16-Dec-2007 elad

Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.123 24-Nov-2007 dyoung

branches: 1.123.2; 1.123.6;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to m_getsombuf().


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.122 05-Oct-2007 dyoung

branches: 1.122.4;
Use m_getsombuf().


Revision tags: yamt-x86pmap-base
# 1.121 19-Sep-2007 christos

branches: 1.121.2;
minor nits; no code change.


# 1.120 19-Sep-2007 dyoung

1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.119 06-Sep-2007 rmind

do_sys_sendmsg: Plug a possible leak.
From CID: 4535


# 1.118 01-Sep-2007 dsl

Don't error calls to copy socket addresses to userspace when the application
has provided a non-null buffer pointer and a zero length.


# 1.117 27-Aug-2007 dsl

ktrace socket control structures (ie msghdr, address etc) using ktrkuser().


# 1.116 15-Aug-2007 ad

branches: 1.116.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base nick-csl-alignment-base
# 1.115 15-Jul-2007 dsl

branches: 1.115.2; 1.115.6;
Remove non-user flags (especially MSG_IOVUSRSPACE) from mp->msg_flags
before passing to so_receive.
This may (or may not) have any effect...


Revision tags: mjf-ufs-trans-base
# 1.114 01-Jul-2007 dsl

Check for SOL_SOCKET when checking for SCM_RIGHTS.


# 1.113 24-Jun-2007 dsl

Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socket names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, old LKMs will fail to load.


# 1.112 02-Jun-2007 enami

- Fix obvious typos so that sendto(2) works.
- Wrap lines again.


# 1.111 01-Jun-2007 dsl

Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.


Revision tags: yamt-idlelwp-base8
# 1.110 13-May-2007 dsl

Fallout from caddr_t deletion - remove a load of redundant (void *) casts.


# 1.109 18-Apr-2007 yamt

sys_accept: fix usecount botch and double soclose in rev.1.108.


# 1.108 15-Apr-2007 yamt

sys_accept: don't leak a socket on error.


Revision tags: thorpej-atomic-base
# 1.107 04-Mar-2007 christos

branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge
# 1.106 09-Feb-2007 ad

branches: 1.106.2;
Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.105 01-Nov-2006 yamt

branches: 1.105.2;
remove some __unused from function parameters.


# 1.104 23-Oct-2006 elad

PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic

Patch applied, thanks!


Revision tags: yamt-splraiseipl-base2
# 1.103 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.102 22-Aug-2006 seanb

branches: 1.102.2; 1.102.4;
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.101 23-Jul-2006 ad

branches: 1.101.2;
Use the LWP cached credentials where sane.


# 1.100 26-Jun-2006 mrg

version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.

from pavel@ with a little bit of clean up from myself.

XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.99 16-May-2006 christos

branches: 1.99.4;
Don't set mature an fd that has been ffree'd


Revision tags: elad-kernelauth-base
# 1.98 11-May-2006 christos

Add MSG_NOSIGNAL (from FreeBSD)


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.97 01-Mar-2006 yamt

branches: 1.97.2; 1.97.4; 1.97.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.96 26-Dec-2005 perry

branches: 1.96.2; 1.96.4; 1.96.6;
u_intN_t -> uintN_t


# 1.95 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.94 03-Sep-2005 martin

In adjust_rights(), use CMSG_SPACE() to calculate the number of
file descriptors passed in this message - the counterpart in
unp_externalize does this as well.

Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.

This fixes the postfix connection cache for 64bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing the fdrelease() of uninitialized junk.


# 1.93 03-Sep-2005 martin

minor knf tweak


# 1.92 30-May-2005 martin

branches: 1.92.2;
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.


# 1.91 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.90 26-Feb-2005 perry

branches: 1.90.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.89 30-Nov-2004 christos

branches: 1.89.4; 1.89.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat


# 1.88 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.87 18-May-2004 ragge

Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.

Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.

As a side effect, the new connect() behaviour conforms to POSIX.


Revision tags: netbsd-2-0-base
# 1.86 29-Nov-2003 matt

branches: 1.86.2;
Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.85 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.84 13-Nov-2003 chs

eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.


# 1.83 04-Sep-2003 matt

Adapt to the new calling conventions of unp_connect2


# 1.82 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.81 29-Jun-2003 fvdl

branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.80 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.79 05-Apr-2003 christos

PR/21030: Naoto Shimazaki: fcntl to accepted socket does not work properly


# 1.78 26-Feb-2003 matt

Remove leftover MBUFTRACE asserts.


# 1.77 26-Feb-2003 drochner

deactivate MBUFTRACE related KASSERT()s in the !MBUFTRACE case


# 1.76 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.75 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.74 26-Nov-2002 christos

si_ -> sel_ to avoid conflicts with siginfo.


# 1.73 25-Nov-2002 itojun

no need for error check after MEXTMALLOC - jdolecek


# 1.72 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge
# 1.71 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.70 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.69 31-May-2002 itojun

support setsockopt() with larger data (up to MCLBYTES).
From: Hitoshi Asaeda <Hitoshi.Asaeda@sophia.inria.fr>


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base ifpoll-base
# 1.68 11-Feb-2002 jdolecek

branches: 1.68.8;
Switch default for pipes to the faster John S. Dyson's implementation.
Old, socketpair-based ones are available with option PIPE_SOCKETPAIR.


Revision tags: thorpej-mips-cache-base
# 1.67 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.66 16-Sep-2001 wiz

branches: 1.66.2;
Spell 'occurred' with two 'r's.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.65 17-Jul-2001 jdolecek

branches: 1.65.2;
Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
has to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(

This code was tested to pass regression tests.


# 1.64 01-Jul-2001 matt

branches: 1.64.2;
Use consistent types for len. Limit sockarg length to reasonable values.


# 1.63 25-Jun-2001 jdolecek

Back off the sendit()/recvit() change, some have problems with it


# 1.62 25-Jun-2001 jdolecek

sys_connect(): fix the call to FILE_UNUSE() so that it's done on return, rather
than immediately after getsock() call


# 1.61 25-Jun-2001 jdolecek

Add 'kernsa' parameter for sendit()/recvit(); if nonzero, msg->msg_name
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.

This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.


# 1.60 16-Jun-2001 jdolecek

Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pageable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.


# 1.59 14-Jun-2001 thorpej

Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
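
The larval/mature scheme described above can be illustrated with a toy
descriptor table. This is a simplified sketch under stated assumptions, not
the kernel code: the names toy_fdalloc(), toy_getfile() and toy_set_mature(),
the flag names, and the fixed table size are all invented for illustration,
and no locking is shown.

```c
#include <stddef.h>

#define NSLOTS		4
#define SLOT_USED	0x1	/* slot is allocated */
#define SLOT_LARVAL	0x2	/* allocated but not yet fully constructed */

struct slot {
	int	flags;
	void	*data;
};

static struct slot table[NSLOTS];

/* Reserve the lowest free slot and mark it larval; -1 if the table is full. */
int
toy_fdalloc(void)
{
	for (int fd = 0; fd < NSLOTS; fd++) {
		if ((table[fd].flags & SLOT_USED) == 0) {
			table[fd].flags = SLOT_USED | SLOT_LARVAL;
			return fd;
		}
	}
	return -1;
}

/* Lookups treat free and larval slots alike: the descriptor does not exist. */
struct slot *
toy_getfile(int fd)
{
	if (fd < 0 || fd >= NSLOTS)
		return NULL;
	if ((table[fd].flags & SLOT_USED) == 0 ||
	    (table[fd].flags & SLOT_LARVAL) != 0)
		return NULL;
	return &table[fd];
}

/* Finish construction: attach the data and clear the larval flag. */
void
toy_set_mature(int fd, void *data)
{
	table[fd].data = data;
	table[fd].flags &= ~SLOT_LARVAL;
}
```

The allocator sees a larval slot as occupied, so it never hands the same
index out twice, while every other lookup path ignores the slot until
construction has finished.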


# 1.58 06-May-2001 manu

implement the recently introduced EMUL_BSD_ASYNCIO_PIPE emulation package
flag.

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.57 27-Feb-2001 lukem

branches: 1.57.2;
convert to ANSI KNF


# 1.56 10-Dec-2000 fvdl

Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).


# 1.55 24-Nov-2000 jdolecek

define COMPAT_OLDSOCK unconditionally - the code is used virtually for all
emulations besides NetBSD, and this way it's LKM-safe


# 1.54 02-Aug-2000 thorpej

MALLOC()/FREE() are not to be used for variable sized allocations.


# 1.53 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.52 27-May-2000 sommerfeld

branches: 1.52.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.51 30-Mar-2000 augustss

Get rid of register declarations.


# 1.50 23-Mar-2000 thorpej

Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() makes sure
the fd_freefiles hints stay in sync.

From OpenBSD.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.49 05-Nov-1999 mycroft

branches: 1.49.2;
Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.


# 1.48 30-Oct-1999 enami

back out unnecessary stylistic changes in recent changes, to keep coding
style closer to KNF.


# 1.47 27-Oct-1999 jdolecek

minor cleanup of previous - avoid goto and code duplication


# 1.46 27-Oct-1999 darrenr

patch from Greg A. Woods to fix panic problems with code that attempts to
recover from failures to accept a socket successfully. Problem suggested
by this:
> It would appear (from two "panic: closef: count < 0" failures in less
> than 12 hours) that Darren's fix to accept(2) for lost file descriptors
> isn't quite correct. His fix inserts a call to closef() to handle one
> of several possible error conditions. However everywhere else in the
> socket code in the same file where falloc() cleanup is necessary the
> function used is ffree().


Revision tags: chs-ubc2-base
# 1.45 01-Jul-1999 itojun

branches: 1.45.2; 1.45.4; 1.45.6;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.44 01-Jul-1999 darrenr

fix sys_accept() to return EOPNOTSUPP for protocols which don't support
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.


# 1.43 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.42 30-Apr-1999 cgd

add checks for COMPAT_OSF1 in the appropriate places


Revision tags: netbsd-1-4-RELEASE netbsd-1-4-base
# 1.41 10-Feb-1999 kleink

branches: 1.41.2; 1.41.4; 1.41.6;
* Due to addition and use of socklen_t, make the socket option and address
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also eliminates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessitates the addition
of a few casts.


# 1.40 18-Dec-1998 drochner

solve the COMPAT_OLDSOCK/MSG_COMPAT problem differently:
The source files which need MSG_COMPAT define COMPAT_OLDSOCK.


Revision tags: kenh-if-detach-base
# 1.39 26-Nov-1998 mycroft

Revert the functional change in rev 1.38; permit a msg_iovlen of 0.
There are two reasons for this:
* We should be able to pass file descriptors without sending any data.
* We could send zero-length iovecs anyway (but we shouldn't have to do this).
Also, msg_iovlen is already a u_int, so delete a bunch of casts.


Revision tags: chs-ubc-base
# 1.38 04-Aug-1998 kleink

Per XNS Issue 5, calling recvmsg(2) or sendmsg(2) with an msg.msg_iovlen less
than or equal to 0 shall fail with EMSGSIZE; the latter condition was not being
checked for. Also, document the msg.msg_iovlen > {IOV_MAX} case.


# 1.37 04-Aug-1998 kleink

UIO_MAXIOV -> IOV_MAX


# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
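
A small sketch of the mapping above in use. Note the swapped source and
destination order when moving from bcopy()/ovbcopy() to memcpy()/memmove().
The function name demo_replacements() and the buffer contents are made up
for illustration.

```c
#include <string.h>

/*
 * Hypothetical demo: exercises the four replacement calls and returns 1
 * if they behave as the mapping suggests, 0 otherwise.
 */
int
demo_replacements(void)
{
	char src[8] = "abcdefg";
	char dst[8];

	/* was: bcopy(src, dst, sizeof(src)) */
	memcpy(dst, src, sizeof(src));

	/* was: ovbcopy(dst, dst + 1, 6); the regions overlap, so memmove */
	memmove(dst + 1, dst, 6);

	/* was: bcmp(dst, "aabcdef", 8) */
	if (memcmp(dst, "aabcdef", 8) != 0)
		return 0;

	/* was: bzero(dst, sizeof(dst)) */
	memset(dst, 0, sizeof(dst));
	return dst[0] == '\0' && dst[7] == '\0';
}
```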


# 1.35 03-Aug-1998 kleink

Fix two off-by-one bugs, both present in each recvmsg(2) and sendmsg(2):
* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.
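
The two boundary conditions can be sketched as follows. This is an
illustration only: the constants SMALL_IOV and MAX_IOV stand in for
UIO_SMALLIOV and UIO_MAXIOV with assumed values, and pick_iov_storage() is a
hypothetical name, not a kernel function.

```c
#define SMALL_IOV	8	/* illustrative; stands in for UIO_SMALLIOV */
#define MAX_IOV		1024	/* illustrative; stands in for UIO_MAXIOV */

enum iov_storage { IOV_ON_STACK, IOV_ON_HEAP, IOV_TOO_MANY };

/* Decide where the iovec array for a message of iovcnt entries lives. */
enum iov_storage
pick_iov_storage(unsigned int iovcnt)
{
	if (iovcnt > MAX_IOV)		/* not >=: exactly MAX_IOV is legal */
		return IOV_TOO_MANY;
	if (iovcnt > SMALL_IOV)		/* not >=: SMALL_IOV fits on the stack */
		return IOV_ON_HEAP;
	return IOV_ON_STACK;
}
```

Writing either comparison with >= reproduces the corresponding off-by-one
described above.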


# 1.34 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.33 29-Jul-1998 thorpej

branches: 1.33.2;
When checking for overflow in the residual count, test against SSIZE_MAX.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.

There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).


# 1.32 18-Jul-1998 lukem

use AF_LOCAL instead of AF_UNIX


# 1.31 25-Jun-1998 thorpej

defopt KTRACE


# 1.30 25-Apr-1998 matt

Hook for 0-copy (or other optimized) sends and receives


# 1.29 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.28 06-Feb-1998 thorpej

When copying out multiple control messages, ensure that the next control
message is aligned. From David Borman <dab@bsdi.com>.
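
Seen from user space, the alignment requirement is what the CMSG_* macros
encode: each control message starts at an aligned offset, and CMSG_NXTHDR()
steps over the padding. The sketch below builds two control messages with
1-byte payloads and walks them. The function name walk_two_cmsgs() and the
placeholder cmsg_type value are assumptions for illustration.

```c
#include <string.h>
#include <sys/socket.h>

/* Control buffer for two 1-byte payloads, aligned for struct cmsghdr. */
union ctlbuf {
	struct cmsghdr	align;
	unsigned char	buf[2 * CMSG_SPACE(1)];
};

/*
 * Hypothetical demo: returns the number of control messages found when
 * walking the buffer, or -1 if the second header could not be placed.
 */
int
walk_two_cmsgs(void)
{
	union ctlbuf ctl;
	struct msghdr msg;
	struct cmsghdr *cm;
	int n = 0;

	memset(&ctl, 0, sizeof(ctl));	/* CMSG_NXTHDR may read the next header */
	memset(&msg, 0, sizeof(msg));
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = 0;		/* placeholder type, illustration only */
	cm->cmsg_len = CMSG_LEN(1);	/* header + 1 byte, no trailing padding */
	*CMSG_DATA(cm) = 'a';

	cm = CMSG_NXTHDR(&msg, cm);	/* lands CMSG_SPACE(1) bytes further on */
	if (cm == NULL)
		return -1;
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = 0;
	cm->cmsg_len = CMSG_LEN(1);
	*CMSG_DATA(cm) = 'b';

	for (cm = CMSG_FIRSTHDR(&msg); cm != NULL; cm = CMSG_NXTHDR(&msg, cm))
		n++;
	return n;
}
```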


# 1.27 07-Jan-1998 thorpej

Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).


# 1.26 07-Jan-1998 thorpej

Fix bug in recvit() that would cause recvmsg() to only receive one
control message, even if there were multiple control messages on
the queue. From Jean-Luc Richier <Jean-Luc.Richier@imag.fr>, in
bug report kern/4700.


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.25 26-Jun-1997 thorpej

branches: 1.25.8;
Use UCHAR_MAX rather than "255" when sanity-checking the length of a
sockaddr in sockargs().


# 1.24 26-Jun-1997 thorpej

In sockargs():
- Add a comment describing my feelings about this interface, in general.
- Remove the COMPAT_OLDSOCK length hack. Instead, if the socket argument
is too long to fit in an mbuf, allocate enough external storage to
hold it.
- If the socket argument is a sockaddr, don't allow the length to be
greater than 255, as that would overflow sa_len.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for his sanity checking.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 22-Dec-1996 cgd

* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.


# 1.22 14-Jun-1996 cgd

avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.


# 1.21 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.20 17-May-1996 pk

branches: 1.20.4;
Don't touch retval[] in socketpair(); manual page says this system call
returns 0 on success (PR#2428).


# 1.19 09-Feb-1996 christos

More proto fixes


# 1.18 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.17 10-Oct-1995 mycroft

Add hooks for COMPAT_FREEBSD, from Noriyuki Soda.


# 1.16 07-Oct-1995 mycroft

Prefix names of system call implementation functions with `sys_'.


# 1.15 19-Sep-1995 thorpej

Make system calls conform to a standard prototype and bring those
prototypes into scope.


# 1.14 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.13 24-Jun-1995 christos

Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).


# 1.12 10-May-1995 christos

tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed


# 1.11 05-Mar-1995 fvdl

Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.


# 1.10 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.9 20-Oct-1994 cgd

update for new syscall args description mechanism


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.8 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasing, and use 'NetBSD'


# 1.7 04-May-1994 mycroft

Add return types where missing. Simplify some of the compat conditionals.
Include compat code if COMPAT_SUNOS with or without COMPAT_43.


# 1.6 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 17-Jul-1993 mycroft

branches: 1.5.4;
Finish moving struct definitions outside of function declarations.


# 1.4 27-Jun-1993 andrew

* ansifications
* Yuval Yarom's socket recv(2) fixes - access rights problems (see also
uipc_socket.c).


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


Revision tags: nick-nhusb-base-20170204
# 1.186 03-Feb-2017 christos

expose sendmsg_so and recvmsg_so.


# 1.185 02-Feb-2017 christos

expose copyout_sockname_sb


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204
# 1.184 03-Dec-2016 christos

Add missing ktrkuser


Revision tags: pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914
# 1.183 13-Sep-2016 martin

Make the ktrace record written by do_sys_sendmsg/do_sys_recvmsg overridable
by the caller. Use this in compat_netbsd32 to log the 32bit version, so
the 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.182 07-Jul-2016 msaitoh

branches: 1.182.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.181 01-Nov-2015 christos

Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.


Revision tags: nick-nhusb-base-20150921
# 1.180 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.179 22-Jul-2015 maxv

Memory leak. Triggerable from an unprivileged user via COMPAT_43.


Revision tags: nick-nhusb-base-20150606
# 1.178 09-May-2015 rtr

change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16


# 1.177 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.176 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.175 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.174 06-Mar-2015 rtr

Return EINVAL if namelen isn't large enough to encompass the expected
members of sockaddr structures. i.e. sa_len and sa_family.
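
A minimal sketch of such a length check, assuming the "expected members" are
everything that precedes sa_data in struct sockaddr. The function name
check_namelen() is hypothetical and the kernel's exact expression is not
shown in the commit message.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: reject address lengths too short to contain the
 * fixed leading members of struct sockaddr.
 */
int
check_namelen(socklen_t namelen)
{
	if (namelen < offsetof(struct sockaddr, sa_data))
		return EINVAL;
	return 0;
}
```

Using offsetof() rather than a literal keeps the check correct on systems
whose struct sockaddr has no sa_len member.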

Discussed with and patch by christos@


Revision tags: nick-nhusb-base
# 1.173 05-Sep-2014 matt

branches: 1.173.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


Revision tags: netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.172 09-Aug-2014 rtr

branches: 1.172.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.171 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.170 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


# 1.169 17-May-2014 rmind

- fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle few comments.
- Remove M_SOOPTS while here.


# 1.168 17-May-2014 rmind

makesocket: set SS_NBIO slightly earlier.


# 1.167 17-May-2014 rmind

Remove trailing whitespaces, wrap long lines, minor KNF; no functional changes.


Revision tags: yamt-pagecache-base9
# 1.166 07-Apr-2014 seanb

Fix a case where an erroneous EAGAIN was returned out of recvmmsg.
This occurred when some, but not all of the mmsg array members
were filled with data from a non-blocking socket.
PR kern/48725


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 09-Oct-2013 christos

branches: 1.165.2;
delete extra m_len initialization.


# 1.164 09-Oct-2013 christos

PR/48292: Justin Cormack: paccept creates sockets that cannot be made blocking
Reset the socket flags not just the file flags for non-blocking I/O.
XXX: pullup 6


# 1.163 08-Oct-2013 christos

PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.


# 1.162 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.161 03-Jun-2013 christos

branches: 1.161.2;
use the proper name for kdump pretty-printing.


Revision tags: agc-symver-base
# 1.160 14-Feb-2013 christos

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.


# 1.159 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos


Revision tags: yamt-pagecache-base8
# 1.158 29-Dec-2012 mlelstv

The sanity check prevented messages that carry only ancillary data.


# 1.157 29-Dec-2012 mlelstv

If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analogous
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.156 17-Jul-2012 njoly

branches: 1.156.2;
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.


# 1.155 22-Jun-2012 christos

Add {send,recv}mmsg from Linux


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.154 25-Jan-2012 christos

branches: 1.154.2;
revert atomics for so_options since it is a short.


# 1.153 25-Jan-2012 christos

need <sys/atomic.h>


# 1.152 25-Jan-2012 christos

Add locking, requested by yamt. Note that locking is not used everywhere
for these.


# 1.151 25-Jan-2012 christos

As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]
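
Illustration only, not part of the commit: a userland sketch of turning
SIGPIPE into a plain EPIPE error on a socket. It uses SO_NOSIGPIPE where
the headers define it and the per-call MSG_NOSIGNAL flag where that is
defined instead; the helper names socket_disable_sigpipe() and
send_nosigpipe() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Per-socket: ask the kernel never to raise SIGPIPE for this socket. */
static int
socket_disable_sigpipe(int fd)
{
	assert(fd >= 0);
#ifdef SO_NOSIGPIPE
	int on = 1;
	return setsockopt(fd, SOL_SOCKET, SO_NOSIGPIPE, &on, sizeof(on));
#else
	return 0;	/* option not available here; rely on MSG_NOSIGNAL */
#endif
}

/* Per-call: suppress SIGPIPE for this one send where MSG_NOSIGNAL exists. */
static ssize_t
send_nosigpipe(int fd, const void *buf, size_t len)
{
	int flags = 0;

#ifdef MSG_NOSIGNAL
	flags |= MSG_NOSIGNAL;
#endif
	return send(fd, buf, len, flags);
}
```

With either mechanism in effect, writing to a stream socket whose peer has
gone away is expected to fail with EPIPE rather than kill the process.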


# 1.150 21-Dec-2011 christos

simplify expression


# 1.149 20-Dec-2011 christos

- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn on non-blocking i/o on a per-call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.148 04-Nov-2011 christos

branches: 1.148.4;
Fix error I introduced in previous commit that caused asymmetric connects
when SOCK_NONBLOCK or SOCK_CLOEXEC was specified. Factor out common code
and simplify error return.


Revision tags: yamt-pagecache-base
# 1.147 21-Sep-2011 christos

branches: 1.147.2;
Put the mbuf type in the ktrace record so that we know how to decode it
in userland.


# 1.146 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.145 15-Jul-2011 christos

fail with EINVAL if flags other than O_CLOEXEC|O_NONBLOCK are given in
pipe2(2) and dup3(2)


# 1.144 26-Jun-2011 christos

* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
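
Illustration only, not part of the commit: a userland sketch of the
SOCK_CLOEXEC and SOCK_NONBLOCK flags in the socket(2) type argument. The
descriptor is close-on-exec and non-blocking from the moment it exists, so
no window remains between socket() and a later fcntl(). The helper names
are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a local stream socket that is close-on-exec and non-blocking
 * at creation time, with no separate fcntl(2) step.
 */
static int
open_local_socket(void)
{
	return socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, 0);
}

/* Report whether both properties are set on fd: 1 yes, 0 no, -1 error. */
static int
has_cloexec_nonblock(int fd)
{
	int fdflags, flflags;

	assert(fd >= 0);
	fdflags = fcntl(fd, F_GETFD);
	flflags = fcntl(fd, F_GETFL);
	if (fdflags == -1 || flflags == -1)
		return -1;
	return (fdflags & FD_CLOEXEC) != 0 && (flflags & O_NONBLOCK) != 0;
}
```

A socket created without the two flags reports 0 from has_cloexec_nonblock()
until the caller sets them by hand.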


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.143 24-Apr-2011 rmind

- Replace a few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.


# 1.142 10-Apr-2011 christos

- Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1
# 1.141 23-Apr-2010 rmind

branches: 1.141.2;
Replace M_IOV and some malloc(9)s with kmem(9), and while there:
- Fix invalid free (M_TEMP vs M_IOV) in do_sys_recvmsg(), spotted by jakllsch@.
Also, same fix in osf1_sys_sendmsg_xopen().
- Fix attempt to free non-allocated memory in error path in netbsd32___getfh30().
- Plug a memory leak in compat_43_netbsd32_orecvmsg().


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.140 21-Jan-2010 pgoyette

branches: 1.140.2; 1.140.4;
Remove unnecessary call to kauth_cred_free().

This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@


# 1.139 29-Dec-2009 elad

Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!


# 1.138 20-Dec-2009 dsl

If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567


Revision tags: matt-premerge-20091211
# 1.137 09-Dec-2009 dsl

Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.136 04-Apr-2009 ad

Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)


Revision tags: nick-hppapmap-base2
# 1.135 21-Jan-2009 yamt

branches: 1.135.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.


Revision tags: netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 haad-dm-base mjf-devfs2-base
# 1.134 06-Aug-2008 plunky

branches: 1.134.2; 1.134.4;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.133 24-Jun-2008 ad

branches: 1.133.2;
Nothing uses getsock/getvnode any more.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.132 30-May-2008 rmind

branches: 1.132.2;
do_sys_accept: release the reference to sock in a few error paths.
Should fix PR/38790, report and test-case by Nicolas Joly.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.131 28-Apr-2008 martin

branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.130 24-Apr-2008 ad

branches: 1.130.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


# 1.129 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.128 21-Mar-2008 ad

branches: 1.128.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.127 06-Feb-2008 ad

branches: 1.127.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.


Revision tags: vmlocking2-base3 bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.126 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.125 20-Dec-2007 dsl

Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.


# 1.124 16-Dec-2007 elad

Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.123 24-Nov-2007 dyoung

branches: 1.123.2; 1.123.6;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to getsockmbuf().


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.122 05-Oct-2007 dyoung

branches: 1.122.4;
Use getsombuf().


Revision tags: yamt-x86pmap-base
# 1.121 19-Sep-2007 christos

branches: 1.121.2;
minor nits; no code change.


# 1.120 19-Sep-2007 dyoung

1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return a pointer to the address part of the "don't care"
sockaddr in the same family as `sa'. Write the length of the
address part at `slenp', if `slenp' is not NULL.
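
Illustration only, not part of the commit: a userland analogue of the
"pointer to the address part, length written through slenp" contract
described for sockaddr_const_addr(). The function name
example_sockaddr_const_addr() is hypothetical, and it switches on
sa_family for AF_INET and AF_INET6 only; how the kernel version locates
the address part is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Return a pointer to the address part of sa and, if slenp is not NULL,
 * store the length of that part there. Families other than AF_INET and
 * AF_INET6 yield NULL and leave *slenp untouched.
 */
static const void *
example_sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)
{
	assert(sa != NULL);
	switch (sa->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *sin =
		    (const struct sockaddr_in *)sa;
		if (slenp != NULL)
			*slenp = sizeof(sin->sin_addr);
		return &sin->sin_addr;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *sin6 =
		    (const struct sockaddr_in6 *)sa;
		if (slenp != NULL)
			*slenp = sizeof(sin6->sin6_addr);
		return &sin6->sin6_addr;
	}
	default:
		return NULL;
	}
}
```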


Revision tags: nick-csl-alignment-base5
# 1.119 06-Sep-2007 rmind

do_sys_sendmsg: Plug a possible leak.
From CID: 4535


# 1.118 01-Sep-2007 dsl

Don't error calls to copy socket addresses to userspace when the application
has provided a non-null buffer pointer and a zero length.


# 1.117 27-Aug-2007 dsl

ktrace socket control structures (ie msghdr, address etc) using ktrkuser().


# 1.116 15-Aug-2007 ad

branches: 1.116.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base nick-csl-alignment-base
# 1.115 15-Jul-2007 dsl

branches: 1.115.2; 1.115.6;
Remove non-user flags (especially MSG_IOVUSRSPACE) from mp->msg_flags
before passing to so_receive.
This may (or may not) have any effect...


Revision tags: mjf-ufs-trans-base
# 1.114 01-Jul-2007 dsl

Check for SOL_SOCKET when checking for SCM_RIGHTS.


# 1.113 24-Jun-2007 dsl

Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socket names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, old LKMs will fail to load.


# 1.112 02-Jun-2007 enami

- Fix obvious typos so that sendto(2) works.
- Wrap lines again.


# 1.111 01-Jun-2007 dsl

Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.


Revision tags: yamt-idlelwp-base8
# 1.110 13-May-2007 dsl

Fallout from caddr_t deletion - remove a load of redundant (void *) casts.


# 1.109 18-Apr-2007 yamt

sys_accept: fix usecount botch and double soclose in rev.1.108.


# 1.108 15-Apr-2007 yamt

sys_accept: don't leak a socket on error.


Revision tags: thorpej-atomic-base
# 1.107 04-Mar-2007 christos

branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge
# 1.106 09-Feb-2007 ad

branches: 1.106.2;
Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.105 01-Nov-2006 yamt

branches: 1.105.2;
remove some __unused from function parameters.


# 1.104 23-Oct-2006 elad

PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic

Patch applied, thanks!


Revision tags: yamt-splraiseipl-base2
# 1.103 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.102 22-Aug-2006 seanb

branches: 1.102.2; 1.102.4;
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.101 23-Jul-2006 ad

branches: 1.101.2;
Use the LWP cached credentials where sane.


# 1.100 26-Jun-2006 mrg

version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.

from pavel@ with a little bit of clean up from myself.

XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.99 16-May-2006 christos

branches: 1.99.4;
Don't set mature an fd that has been ffree'd


Revision tags: elad-kernelauth-base
# 1.98 11-May-2006 christos

Add MSG_NOSIGNAL (from FreeBSD)


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.97 01-Mar-2006 yamt

branches: 1.97.2; 1.97.4; 1.97.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.96 26-Dec-2005 perry

branches: 1.96.2; 1.96.4; 1.96.6;
u_intN_t -> uintN_t


# 1.95 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.94 03-Sep-2005 martin

In adjust_rights() use CMSG_SPACE() to calculate the number of
file descriptors passed in this message - the counterpart in
unp_externalize does this as well.

Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.

This fixes the postfix connection cache for 64bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing the fdrelease() of uninitialized junk.


# 1.93 03-Sep-2005 martin

minor knf tweak


# 1.92 30-May-2005 martin

branches: 1.92.2;
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.


# 1.91 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.90 26-Feb-2005 perry

branches: 1.90.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.89 30-Nov-2004 christos

branches: 1.89.4; 1.89.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat


# 1.88 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.87 18-May-2004 ragge

Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.

Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.

As a side effect, the new connect() behaviour conforms to POSIX.
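
Illustration only, not part of the commit: under the POSIX rule, a
connect() interrupted by a signal fails with EINTR while the connection
attempt carries on in the background. A caller can then wait for the
socket to become writable and read SO_ERROR to learn the outcome. The
helper name finish_connect() and its timeout parameter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Wait for a connection attempt that is still in progress on fd (for
 * example after connect(2) failed with EINTR) and report its outcome.
 * Returns 0 on success, otherwise an errno value.
 */
static int
finish_connect(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int soerr = 0;
	socklen_t len = sizeof(soerr);
	int n;

	assert(fd >= 0);
	do {
		n = poll(&pfd, 1, timeout_ms);
	} while (n == -1 && errno == EINTR);	/* signals only restart the wait */
	if (n == -1)
		return errno;
	if (n == 0)
		return ETIMEDOUT;

	/* The socket is writable: SO_ERROR says whether connect succeeded. */
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) == -1)
		return errno;
	return soerr;
}
```

Calling connect() a second time instead would typically fail with EALREADY
while the first attempt is still pending.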


Revision tags: netbsd-2-0-base
# 1.86 29-Nov-2003 matt

branches: 1.86.2;
Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.85 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.84 13-Nov-2003 chs

eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.


# 1.83 04-Sep-2003 matt

Adapt to the new calling conventions of unp_connect2


# 1.82 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.81 29-Jun-2003 fvdl

branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.80 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.79 05-Apr-2003 christos

PR/21030: Naoto Shimazaki: fcntl to accepted socket does not work properly


# 1.78 26-Feb-2003 matt

Remove leftover MBUFTRACE asserts.


# 1.77 26-Feb-2003 drochner

deactivate MBUFTRACE related KASSERT()s in the !MBUFTRACE case


# 1.76 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.75 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.74 26-Nov-2002 christos

si_ -> sel_ to avoid conflicts with siginfo.


# 1.73 25-Nov-2002 itojun

no need for error check after MEXTMALLOC - jdolecek


# 1.72 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge
# 1.71 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.70 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.69 31-May-2002 itojun

support setsockopt() with larger data (up to MCLBYTES).
From: Hitoshi Asaeda <Hitoshi.Asaeda@sophia.inria.fr>


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base ifpoll-base
# 1.68 11-Feb-2002 jdolecek

branches: 1.68.8;
Switch default for pipes to the faster John S. Dyson's implementation.
Old, socketpair-based ones are available with option PIPE_SOCKETPAIR.


Revision tags: thorpej-mips-cache-base
# 1.67 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.66 16-Sep-2001 wiz

branches: 1.66.2;
Spell 'occurred' with two 'r's.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.65 17-Jul-2001 jdolecek

branches: 1.65.2;
Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
has to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(

This code was tested to pass regression tests.


# 1.64 01-Jul-2001 matt

branches: 1.64.2;
Use consistent types for len. Limit sockarg length to reasonable values.


# 1.63 25-Jun-2001 jdolecek

Back off the sendit()/recvit() change, some have problems with it


# 1.62 25-Jun-2001 jdolecek

sys_connect(): fix the call to FILE_UNUSE() so that it's done on return, rather
than immediately after the getsock() call


# 1.61 25-Jun-2001 jdolecek

Add 'kernsa' parameter for sendit()/recvit(); if nonzero, msg->msg_name
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.

This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.


# 1.60 16-Jun-2001 jdolecek

Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pageable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.


# 1.59 14-Jun-2001 thorpej

Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.
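
Illustration only, not part of the commit: a single-threaded toy model of
the "larval" idea. A slot under construction counts as taken for the
allocator but as absent for lookups, and becomes visible once it is marked
mature. All names (toy_fdalloc(), toy_set_mature(), toy_getfile()) are
hypothetical, and the real descriptor table needs locking that this sketch
leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NFILES 8

struct toy_file {
	bool used;	/* slot is taken; the allocator must skip it */
	bool larval;	/* still under construction; lookups must skip it */
	int payload;
};

static struct toy_file toy_table[TOY_NFILES];

/* Reserve the lowest free slot and mark it larval. Returns -1 if full. */
static int
toy_fdalloc(void)
{
	for (int fd = 0; fd < TOY_NFILES; fd++) {
		if (!toy_table[fd].used) {
			toy_table[fd].used = true;
			toy_table[fd].larval = true;
			return fd;
		}
	}
	return -1;
}

/* Construction finished: clear the larval flag so lookups can see it. */
static void
toy_set_mature(int fd, int payload)
{
	assert(fd >= 0 && fd < TOY_NFILES && toy_table[fd].used);
	toy_table[fd].payload = payload;
	toy_table[fd].larval = false;
}

/* Look up a descriptor; larval slots are treated as nonexistent. */
static struct toy_file *
toy_getfile(int fd)
{
	if (fd < 0 || fd >= TOY_NFILES)
		return NULL;
	if (!toy_table[fd].used || toy_table[fd].larval)
		return NULL;
	return &toy_table[fd];
}
```

A second toy_fdalloc() issued while the first slot is still larval returns
the next slot, never the one under construction.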

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.


# 1.58 06-May-2001 manu

implement the recently introduced EMUL_BSD_ASYNCIO_PIPE emulation package
flag.

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.57 27-Feb-2001 lukem

branches: 1.57.2;
convert to ANSI KNF


# 1.56 10-Dec-2000 fvdl

Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).


# 1.55 24-Nov-2000 jdolecek

define COMPAT_OLDSOCK unconditionally - the code is used virtually for all
emulations besides NetBSD, and this way it's LKM-safe


# 1.54 02-Aug-2000 thorpej

MALLOC()/FREE() are not to be used for variable sized allocations.


# 1.53 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.52 27-May-2000 sommerfeld

branches: 1.52.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.51 30-Mar-2000 augustss

Get rid of register declarations.


# 1.50 23-Mar-2000 thorpej

Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() makes sure
the fd_freefiles hints stay in sync.

From OpenBSD.
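
A minimal sketch of why a dedicated removal helper matters, assuming a
simplified table with a single "lowest possibly free slot" hint. All names
here (sketch_fdtab, sketch_fdalloc, sketch_fdremove) are hypothetical and
are not the kernel's structures:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NFILES 8

/* Hypothetical descriptor table; the real struct filedesc is richer. */
struct sketch_fdtab {
	void *ofiles[SKETCH_NFILES];
	int freefile;	/* hint: no free slot lies below this index */
};

/* Allocate the lowest free slot, starting the scan at the hint. */
static int
sketch_fdalloc(struct sketch_fdtab *t, void *fp)
{
	assert(t != NULL && fp != NULL);
	for (int fd = t->freefile; fd < SKETCH_NFILES; fd++) {
		if (t->ofiles[fd] == NULL) {
			t->ofiles[fd] = fp;
			t->freefile = fd + 1;
			return fd;
		}
	}
	return -1;
}

/* Clear the slot and pull the hint back if the freed slot is lower. */
static void
sketch_fdremove(struct sketch_fdtab *t, int fd)
{
	assert(t != NULL && fd >= 0 && fd < SKETCH_NFILES);
	t->ofiles[fd] = NULL;
	if (fd < t->freefile)
		t->freefile = fd;
}
```

A bare "ofiles[fd] = NULL" would leave the hint above the freed slot, so
the next allocation would skip it; routing every removal through one
function keeps the two in step.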


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.49 05-Nov-1999 mycroft

branches: 1.49.2;
Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.


# 1.48 30-Oct-1999 enami

back out unnecessary stylistic changes in recent changes, to keep coding
style closer to KNF.


# 1.47 27-Oct-1999 jdolecek

minor cleanup of previous - avoid goto and code duplication


# 1.46 27-Oct-1999 darrenr

patch from Greg A. Woods to fix panic problems with code that attempts to
recover from failures to accept a socket successfully. Problem suggested
by this:
> It would appear (from two "panic: closef: count < 0" failures in less
> than 12 hours) that Darren's fix to accept(2) for lost file descriptors
> isn't quite correct. His fix inserts a call to closef() to handle one
> of several possible error conditions. However everywhere else in the
> socket code in the same file where falloc() cleanup is necessary the
> function used is ffree().


Revision tags: chs-ubc2-base
# 1.45 01-Jul-1999 itojun

branches: 1.45.2; 1.45.4; 1.45.6;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch them up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.44 01-Jul-1999 darrenr

fix sys_accept() to return EOPNOTSUPP for protocols which don't support
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.


# 1.43 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file whose reference
count is 0, wait for the use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.42 30-Apr-1999 cgd

add checks for COMPAT_OSF1 in the appropriate places


Revision tags: netbsd-1-4-RELEASE netbsd-1-4-base
# 1.41 10-Feb-1999 kleink

branches: 1.41.2; 1.41.4; 1.41.6;
* Due to addition and use of socklen_t, make the socket option and address
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also eliminates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessitates the addition
of a few casts.


# 1.40 18-Dec-1998 drochner

solve the COMPAT_OLDSOCK/MSG_COMPAT problem differently:
The source files which need MSG_COMPAT define COMPAT_OLDSOCK.


Revision tags: kenh-if-detach-base
# 1.39 26-Nov-1998 mycroft

Revert the functional change in rev 1.38; permit a msg_iovlen of 0.
There are two reasons for this:
* We should be able to pass file descriptors without sending any data.
* We could send zero-length iovecs anyway (but we shouldn't have to do this).
Also, msg_iovlen is already a u_int, so delete a bunch of casts.


Revision tags: chs-ubc-base
# 1.38 04-Aug-1998 kleink

Per XNS Issue 5, calling recvmsg(2) or sendmsg(2) with an msg.msg_iovlen less
than or equal to 0 shall fail with EMSGSIZE; the latter condition was not being
checked for. Also, document the msg.msg_iovlen > {IOV_MAX} case.


# 1.37 04-Aug-1998 kleink

UIO_MAXIOV -> IOV_MAX


# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
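
The mapping above is mostly a matter of argument order. As a small
illustration, the wrappers below (hypothetical names, not kernel
functions) keep the old BSD calling order and forward to the ISO C
routines:

```c
#include <assert.h>
#include <string.h>

/* bcopy(src, dst, len) becomes memcpy(dst, src, len): the pointers swap. */
static void
sketch_bcopy(const void *src, void *dst, size_t len)
{
	assert(src != NULL && dst != NULL);
	memcpy(dst, src, len);
}

/* ovbcopy(src, dst, len) becomes memmove(dst, src, len): overlap is fine. */
static void
sketch_ovbcopy(const void *src, void *dst, size_t len)
{
	assert(src != NULL && dst != NULL);
	memmove(dst, src, len);
}

/* bzero(buf, len) becomes memset(buf, 0, len): a fill value is inserted. */
static void
sketch_bzero(void *buf, size_t len)
{
	assert(buf != NULL);
	memset(buf, 0, len);
}
```

Only bcmp keeps its argument order, since memcmp(x, y, z) compares the
same two buffers in the same positions.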


# 1.35 03-Aug-1998 kleink

Fix two off-by-one bugs, both present in each recvmsg(2) and sendmsg(2):
* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.
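
Both bugs come down to ">=" where ">" was meant. A sketch of the corrected
boundary checks, assuming stand-in limits (SKETCH_SMALLIOV and
SKETCH_MAXIOV are hypothetical values, not the kernel's constants):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/uio.h>

enum { SKETCH_SMALLIOV = 8, SKETCH_MAXIOV = 1024 };

/*
 * Choose storage for iovcnt iovecs: the caller's on-stack array when it is
 * large enough, heap memory otherwise.  *heap tells the caller whether the
 * result must be released with free().
 */
static int
sketch_pick_iov(unsigned int iovcnt, struct iovec *onstack,
    struct iovec **iovp, int *heap)
{
	assert(onstack != NULL && iovp != NULL && heap != NULL);
	*iovp = NULL;
	*heap = 0;
	if (iovcnt > SKETCH_MAXIOV)	/* exactly the maximum is still legal */
		return EMSGSIZE;
	if (iovcnt > SKETCH_SMALLIOV) {	/* exactly SMALLIOV fits on the stack */
		*iovp = malloc(iovcnt * sizeof(**iovp));
		if (*iovp == NULL)
			return ENOMEM;
		*heap = 1;
	} else {
		*iovp = onstack;
	}
	return 0;
}
```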


# 1.34 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.33 29-Jul-1998 thorpej

branches: 1.33.2;
When checking for overflow in the residual count, test against SSIZE_MAX.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.

There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).


# 1.32 18-Jul-1998 lukem

use AF_LOCAL instead of AF_UNIX


# 1.31 25-Jun-1998 thorpej

defopt KTRACE


# 1.30 25-Apr-1998 matt

Hook for 0-copy (or other optimized) sends and receives


# 1.29 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.28 06-Feb-1998 thorpej

When copying out multiple control messages, ensure that the next control
message is aligned. From David Borman <dab@bsdi.com>.


# 1.27 07-Jan-1998 thorpej

Make insertion and removal of sockets from the partial and incoming
connections queues O(1) rather than O(N).


# 1.26 07-Jan-1998 thorpej

Fix bug in recvit() that would cause recvmsg() to only receive one
control message, even if there were multiple control messages on
the queue. From Jean-Luc Richier <Jean-Luc.Richier@imag.fr>, in
bug report kern/4700.


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.25 26-Jun-1997 thorpej

branches: 1.25.8;
Use UCHAR_MAX rather than "255" when sanity-checking the length of a
sockaddr in sockargs().


# 1.24 26-Jun-1997 thorpej

In sockargs():
- Add a comment describing my feelings about this interface, in general.
- Remove the COMPAT_OLDSOCK length hack. Instead, if the socket argument
is too long to fit in an mbuf, allocate enough external storage to
hold it.
- If the socket argument is a sockaddr, don't allow the length to be
greater than 255, as that would overflow sa_len.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for his sanity checking.
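
The 255 limit follows from sa_len being a single byte. A small sketch,
assuming a stand-in check (sketch_sockaddr_len_ok is a hypothetical name)
and using UCHAR_MAX for the bound:

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Reject lengths that cannot be stored in an 8-bit sa_len field. */
static int
sketch_sockaddr_len_ok(size_t buflen)
{
	if (buflen > UCHAR_MAX)
		return EINVAL;
	return 0;
}

/* What an unchecked store would keep: only the low eight bits survive. */
static uint8_t
sketch_truncated_len(size_t buflen)
{
	return (uint8_t)buflen;
}
```

Without the check, a 256-byte argument would be recorded with a length of
0, which is why the limit is enforced before the value is stored.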


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 22-Dec-1996 cgd

* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.


# 1.22 14-Jun-1996 cgd

avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.


# 1.21 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.20 17-May-1996 pk

branches: 1.20.4;
Don't touch retval[] in socketpair(); manual page says this system call
returns 0 on success (PR#2428).


# 1.19 09-Feb-1996 christos

More proto fixes


# 1.18 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.17 10-Oct-1995 mycroft

Add hooks for COMPAT_FREEBSD, from Noriyuki Soda.


# 1.16 07-Oct-1995 mycroft

Prefix names of system call implementation functions with `sys_'.


# 1.15 19-Sep-1995 thorpej

Make system calls conform to a standard prototype and bring those
prototypes into scope.


# 1.14 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.13 24-Jun-1995 christos

Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).


# 1.12 10-May-1995 christos

tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed


# 1.11 05-Mar-1995 fvdl

Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.


# 1.10 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.9 20-Oct-1994 cgd

update for new syscall args description mechanism


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.8 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.7 04-May-1994 mycroft

Add return types where missing. Simplify some of the compat conditionals.
Include compat code if COMPAT_SUNOS with or without COMPAT_43.


# 1.6 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 17-Jul-1993 mycroft

branches: 1.5.4;
Finish moving struct definitions outside of function declarations.


# 1.4 27-Jun-1993 andrew

* ansifications
* Yuval Yarom's socket recv(2) fixes - access rights problems (see also
uipc_socket.c).


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


Revision tags: nick-nhusb-base-20161204
# 1.184 03-Dec-2016 christos

Add missing ktrkuser


Revision tags: pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914
# 1.183 13-Sep-2016 martin

Make the ktrace record written by do_sys_sendmsg/do_sys_recvmsg overridable
by the caller. Use this in compat_netbsd32 to log the 32bit version, so
the 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.182 07-Jul-2016 msaitoh

KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.181 01-Nov-2015 christos

Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.


Revision tags: nick-nhusb-base-20150921
# 1.180 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.179 22-Jul-2015 maxv

Memory leak. Triggerable from an unprivileged user via COMPAT_43.


Revision tags: nick-nhusb-base-20150606
# 1.178 09-May-2015 rtr

change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16


# 1.177 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.176 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.175 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.174 06-Mar-2015 rtr

Return EINVAL if namelen isn't large enough to encompass the expected
members of sockaddr structures. i.e. sa_len and sa_family.

Discussed with and patch by christos@
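
A sketch of the lower bound described, assuming a BSD-style layout in
which sa_len and sa_family are the first two one-byte members. The struct
and function names are hypothetical stand-ins, not the kernel's:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a BSD struct sockaddr with a leading length byte. */
struct sketch_sockaddr {
	uint8_t	sa_len;
	uint8_t	sa_family;
	char	sa_data[14];
};

/* namelen must at least cover the members the kernel reads first. */
static int
sketch_check_namelen(size_t namelen)
{
	if (namelen < offsetof(struct sketch_sockaddr, sa_data))
		return EINVAL;
	return 0;
}
```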


Revision tags: nick-nhusb-base
# 1.173 05-Sep-2014 matt

branches: 1.173.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


Revision tags: netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.172 09-Aug-2014 rtr

branches: 1.172.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.171 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.170 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


# 1.169 17-May-2014 rmind

- fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle a few comments.
- Remove M_SOOPTS while here.


# 1.168 17-May-2014 rmind

makesocket: set SS_NBIO slightly earlier.


# 1.167 17-May-2014 rmind

Remove trailing whitespaces, wrap long lines, minor KNF; no functional changes.


Revision tags: yamt-pagecache-base9
# 1.166 07-Apr-2014 seanb

Fix a case where an erroneous EAGAIN was returned out of recvmmsg.
This occured when some, but not all of the mmsg array members
were filled with data from a non-blocking socket.
PR kern/48725
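
The intended behaviour can be sketched as a loop that forgives EAGAIN once
at least one message has been received. Everything below is a hypothetical
model (sketch_queue, sketch_recv_one, sketch_recv_many), not the kernel's
recvmmsg code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy non-blocking source: hands out 'available' messages, then EAGAIN. */
struct sketch_queue {
	unsigned int available;
};

static int
sketch_recv_one(struct sketch_queue *q)
{
	if (q->available == 0)
		return EAGAIN;
	q->available--;
	return 0;
}

/* Receive up to vlen messages; *ndone reports how many were filled. */
static int
sketch_recv_many(struct sketch_queue *q, unsigned int vlen,
    unsigned int *ndone)
{
	unsigned int n;
	int error = 0;

	assert(q != NULL && ndone != NULL);
	for (n = 0; n < vlen; n++) {
		error = sketch_recv_one(q);
		if (error != 0)
			break;
	}
	*ndone = n;
	/* A partially filled array is a success, not a would-block error. */
	if (error == EAGAIN && n > 0)
		error = 0;
	return error;
}
```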


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 09-Oct-2013 christos

branches: 1.165.2;
delete extra m_len initialization.


# 1.164 09-Oct-2013 christos

PR/48292: Justin Cormack: paccept creates sockets that cannot be made blocking
Reset the socket flags not just the file flags for non-blocking I/O.
XXX: pullup 6


# 1.163 08-Oct-2013 christos

PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.


# 1.162 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.161 03-Jun-2013 christos

branches: 1.161.2;
use the proper name for kdump pretty-printing.


Revision tags: agc-symver-base
# 1.160 14-Feb-2013 christos

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.


# 1.159 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
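
The gap mentioned in the second item is alignment padding: CMSG_LEN()
covers the header plus payload, while CMSG_SPACE() also includes trailing
padding. A sketch of scrubbing that padding, assuming a control message
that carries nfds descriptors (the helper names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Bytes of padding that follow a control message carrying nfds ints. */
static size_t
sketch_cmsg_padding(size_t nfds)
{
	return CMSG_SPACE(nfds * sizeof(int)) - CMSG_LEN(nfds * sizeof(int));
}

/* Zero the padding so stale buffer contents are never copied out. */
static void
sketch_cmsg_scrub(struct cmsghdr *cmsg, size_t nfds)
{
	assert(cmsg != NULL);
	memset((unsigned char *)cmsg + CMSG_LEN(nfds * sizeof(int)), 0,
	    sketch_cmsg_padding(nfds));
}
```

The amount of padding depends on the platform's alignment rules and may be
zero for some payload sizes, in which case the memset is a no-op.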


Revision tags: yamt-pagecache-base8
# 1.158 29-Dec-2012 mlelstv

The sanity check prevented messages that carry only ancillary data.


# 1.157 29-Dec-2012 mlelstv

If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analogous
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.156 17-Jul-2012 njoly

branches: 1.156.2;
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.


# 1.155 22-Jun-2012 christos

Add {send,recv}mmsg from Linux


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.154 25-Jan-2012 christos

branches: 1.154.2;
revert atomics for so_options since it is a short.


# 1.153 25-Jan-2012 christos

need <sys/atomic.h>


# 1.152 25-Jan-2012 christos

Add locking, requested by yamt. Note that locking is not used everywhere
for these.


# 1.151 25-Jan-2012 christos

As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]


# 1.150 21-Dec-2011 christos

simplify expression


# 1.149 20-Dec-2011 christos

- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.148 04-Nov-2011 christos

branches: 1.148.4;
Fix error I introduced in previous commit that caused asymmetric connects
when SOCK_NONBLOCK or SOCK_CLOEXEC was specified. Factor out common code
and simplify error return.


Revision tags: yamt-pagecache-base
# 1.147 21-Sep-2011 christos

branches: 1.147.2;
Put the mbuf type in the ktrace record so that we know how to decode it
in userland.


# 1.146 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.145 15-Jul-2011 christos

fail with EINVAL if flags other than O_CLOEXEC|O_NONBLOCK are passed to
pipe2(2) and dup3(2)


# 1.144 26-Jun-2011 christos

* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
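
From userland, the socket(2) part of this change can be exercised as
below. This is a minimal sketch; the helper names are hypothetical, and
the flags are simply read back with fcntl(2):

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a local stream socket with both flags set atomically at creation. */
static int
sketch_open_socket(void)
{
	return socket(AF_LOCAL, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, 0);
}

/* Non-zero if the descriptor will be closed across exec. */
static int
sketch_is_cloexec(int fd)
{
	assert(fd >= 0);
	int flags = fcntl(fd, F_GETFD);
	return flags != -1 && (flags & FD_CLOEXEC) != 0;
}

/* Non-zero if the descriptor is in non-blocking mode. */
static int
sketch_is_nonblock(int fd)
{
	assert(fd >= 0);
	int flags = fcntl(fd, F_GETFL);
	return flags != -1 && (flags & O_NONBLOCK) != 0;
}
```

Setting the flags in the creating call closes the window in which another
thread could fork and exec between socket(2) and a later fcntl(2).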


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.143 24-Apr-2011 rmind

- Replace a few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.


# 1.142 10-Apr-2011 christos

- Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1
# 1.141 23-Apr-2010 rmind

branches: 1.141.2;
Replace M_IOV and some malloc(9)s with kmem(9), and while there:
- Fix invalid free (M_TEMP vs M_IOV) in do_sys_recvmsg(), spotted by jakllsch@.
Also, same fix in osf1_sys_sendmsg_xopen().
- Fix attempt to free non-allocated memory in error path in netbsd32___getfh30().
- Plug a memory leak in compat_43_netbsd32_orecvmsg().


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.140 21-Jan-2010 pgoyette

branches: 1.140.2; 1.140.4;
Remove unnecessary call to kauth_cred_free().

This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@


# 1.139 29-Dec-2009 elad

Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!


# 1.138 20-Dec-2009 dsl

If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567


Revision tags: matt-premerge-20091211
# 1.137 09-Dec-2009 dsl

Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.136 04-Apr-2009 ad

Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)


Revision tags: nick-hppapmap-base2
# 1.135 21-Jan-2009 yamt

branches: 1.135.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.


Revision tags: netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 haad-dm-base mjf-devfs2-base
# 1.134 06-Aug-2008 plunky

branches: 1.134.2; 1.134.4;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.133 24-Jun-2008 ad

branches: 1.133.2;
Nothing uses getsock/getvnode any more.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.132 30-May-2008 rmind

branches: 1.132.2;
do_sys_accept: release the reference to sock in a few error paths.
Should fix PR/38790, report and test-case by Nicolas Joly.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.131 28-Apr-2008 martin

branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.130 24-Apr-2008 ad

branches: 1.130.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


# 1.129 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.128 21-Mar-2008 ad

branches: 1.128.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.127 06-Feb-2008 ad

branches: 1.127.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.


Revision tags: vmlocking2-base3 bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.126 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.125 20-Dec-2007 dsl

Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
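
The two prototypes can be contrasted with a small stand-alone model. All
types and names below are hypothetical stand-ins (the real struct lwp,
register_t and generated *_args structures live in kernel headers):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t sketch_register_t;
struct sketch_lwp { int l_lid; };
struct sketch_foo_args { int fd; int flags; };

/* Old style: untyped 'v' must be converted by hand in every function. */
static int
sketch_foo_old(struct sketch_lwp *l, void *v, sketch_register_t *retval)
{
	struct sketch_foo_args *uap = v;	/* nothing checks this type */

	(void)l;
	*retval = uap->fd;
	return 0;
}

/* New style: typed and const, so the compiler rejects writes into uap. */
static int
sketch_foo(struct sketch_lwp *l, const struct sketch_foo_args *uap,
    sketch_register_t *retval)
{
	assert(uap != NULL && retval != NULL);
	(void)l;
	*retval = uap->fd;
	return 0;
}
```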


# 1.124 16-Dec-2007 elad

Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.123 24-Nov-2007 dyoung

branches: 1.123.2; 1.123.6;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to getsockmbuf().


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.122 05-Oct-2007 dyoung

branches: 1.122.4;
Use getsombuf().


Revision tags: yamt-x86pmap-base
# 1.121 19-Sep-2007 christos

branches: 1.121.2;
minor nits; no code change.


# 1.120 19-Sep-2007 dyoung

1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the address part of the "don't care" sockaddr in the
same family as `sa'. This is the address a client should
sobind(9) if it does not care about the source address and,
if applicable, the port et cetera that it uses.
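
To illustrate the sockaddr_addr() contract described above, here is a minimal
userland sketch. It is not the kernel implementation: the name
sketch_sockaddr_addr is hypothetical, and only AF_INET and AF_INET6 are
handled, which is an assumption made to keep the example short.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical userland analogue of sockaddr_addr(): return a pointer to
 * the address part of `sa' and, if `slenp' is not NULL, store the length
 * of that part there.  Returns NULL for families this sketch ignores.
 */
static void *
sketch_sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
{
    void *addr;
    socklen_t len;

    switch (sa->sa_family) {
    case AF_INET:
        addr = &((struct sockaddr_in *)(void *)sa)->sin_addr;
        len = sizeof(struct in_addr);
        break;
    case AF_INET6:
        addr = &((struct sockaddr_in6 *)(void *)sa)->sin6_addr;
        len = sizeof(struct in6_addr);
        break;
    default:
        return NULL;
    }
    if (slenp != NULL)
        *slenp = len;
    return addr;
}
```

A caller that only wants the pointer can pass NULL for `slenp', matching the
"if `slenp' is not NULL" wording in the log.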


Revision tags: nick-csl-alignment-base5
# 1.119 06-Sep-2007 rmind

do_sys_sendmsg: Plug a possible leak.
From CID: 4535


# 1.118 01-Sep-2007 dsl

Don't error calls to copy socket addresses to userspace when the application
has provided a non-null buffer pointer and a zero length.


# 1.117 27-Aug-2007 dsl

ktrace socket control structures (ie msghdr, address etc) using ktrkuser().


# 1.116 15-Aug-2007 ad

branches: 1.116.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base nick-csl-alignment-base
# 1.115 15-Jul-2007 dsl

branches: 1.115.2; 1.115.6;
Remove non-user flags (especially MSG_IOVUSRSPACE) from mp->msg_flags
before passing to so_receive.
This may (or may not) have any effect...


Revision tags: mjf-ufs-trans-base
# 1.114 01-Jul-2007 dsl

Check for SOL_SOCKET when checking for SCM_RIGHTS.


# 1.113 24-Jun-2007 dsl

Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socket names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, old LKMs will fail to load.


# 1.112 02-Jun-2007 enami

- Fix obvious typos so that sendto(2) works.
- Wrap lines again.


# 1.111 01-Jun-2007 dsl

Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.


Revision tags: yamt-idlelwp-base8
# 1.110 13-May-2007 dsl

Fallout from caddr_t deletion - remove a load of redundant (void *) casts.


# 1.109 18-Apr-2007 yamt

sys_accept: fix usecount botch and double soclose in rev.1.108.


# 1.108 15-Apr-2007 yamt

sys_accept: don't leak a socket on error.


Revision tags: thorpej-atomic-base
# 1.107 04-Mar-2007 christos

branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge
# 1.106 09-Feb-2007 ad

branches: 1.106.2;
Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.105 01-Nov-2006 yamt

branches: 1.105.2;
remove some __unused from function parameters.


# 1.104 23-Oct-2006 elad

PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic

Patch applied, thanks!


Revision tags: yamt-splraiseipl-base2
# 1.103 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.102 22-Aug-2006 seanb

branches: 1.102.2; 1.102.4;
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.101 23-Jul-2006 ad

branches: 1.101.2;
Use the LWP cached credentials where sane.


# 1.100 26-Jun-2006 mrg

version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.

from pavel@ with a little bit of clean up from myself.

XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.99 16-May-2006 christos

branches: 1.99.4;
Don't set mature an fd that has been ffree'd


Revision tags: elad-kernelauth-base
# 1.98 11-May-2006 christos

Add MSG_NOSIGNAL (from FreeBSD)


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.97 01-Mar-2006 yamt

branches: 1.97.2; 1.97.4; 1.97.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.96 26-Dec-2005 perry

branches: 1.96.2; 1.96.4; 1.96.6;
u_intN_t -> uintN_t


# 1.95 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.94 03-Sep-2005 martin

In adjust_rights() Use CMSG_SPACE() to calculate the number of
filedescriptors passed in this message - the counterpart in
unp_externalize does this as well.

Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.

This fixes the postfix connection cache for 64bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing the fdrelease() of uninitialized junk.
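
As an illustration of why the header size matters when counting descriptors,
here is the usual userland idiom for an SCM_RIGHTS message. It is a sketch,
not the kernel's adjust_rights(): the function name sketch_rights_count is
hypothetical, and it subtracts CMSG_LEN(0) from cmsg_len rather than using
the CMSG_SPACE() arithmetic the commit describes.

```c
#include <stddef.h>
#include <sys/socket.h>

/*
 * Number of descriptors carried by one SCM_RIGHTS control message.
 * cmsg_len covers the (aligned) header plus the payload, so the header
 * size must be removed before dividing by sizeof(int).
 */
static size_t
sketch_rights_count(const struct cmsghdr *cmsg)
{
    if (cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return 0;
    if (cmsg->cmsg_len < CMSG_LEN(0))
        return 0;
    return (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);
}
```

Getting the subtracted header size wrong by even one alignment unit changes
the result, which is the class of miscount the commit describes.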


# 1.93 03-Sep-2005 martin

minor knf tweak


# 1.92 30-May-2005 martin

branches: 1.92.2;
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.


# 1.91 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.90 26-Feb-2005 perry

branches: 1.90.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.89 30-Nov-2004 christos

branches: 1.89.4; 1.89.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat


# 1.88 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.87 18-May-2004 ragge

Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.

Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.

As a side effect, the new connect() behaviour conforms to POSIX.


Revision tags: netbsd-2-0-base
# 1.86 29-Nov-2003 matt

branches: 1.86.2;
Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.85 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.84 13-Nov-2003 chs

eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.


# 1.83 04-Sep-2003 matt

Adapt to the new calling conventions of unp_connect2


# 1.82 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.81 29-Jun-2003 fvdl

branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.80 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.79 05-Apr-2003 christos

PR/21030: Naoto Shimazaki: fcntl to accepted socket does not work properly


# 1.78 26-Feb-2003 matt

Remove leftover MBUFTRACE asserts.


# 1.77 26-Feb-2003 drochner

deactivate MBUFTRACE related KASSERT()s in the !MBUFTRACE case


# 1.76 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.75 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.74 26-Nov-2002 christos

si_ -> sel_ to avoid conflicts with siginfo.


# 1.73 25-Nov-2002 itojun

no need for error check after MEXTMALLOC - jdolecek


# 1.72 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge
# 1.71 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.70 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.69 31-May-2002 itojun

support setsockopt() with larger data (up to MCLBYTES).
From: Hitoshi Asaeda <Hitoshi.Asaeda@sophia.inria.fr>


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base ifpoll-base
# 1.68 11-Feb-2002 jdolecek

branches: 1.68.8;
Switch default for pipes to the faster John S. Dyson's implementation.
Old, socketpair-based ones are available with option PIPE_SOCKETPAIR.


Revision tags: thorpej-mips-cache-base
# 1.67 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.66 16-Sep-2001 wiz

branches: 1.66.2;
Spell 'occurred' with two 'r's.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.65 17-Jul-2001 jdolecek

branches: 1.65.2;
Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
has to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(

This code was tested to pass regression tests.


# 1.64 01-Jul-2001 matt

branches: 1.64.2;
Use consistent types for len. Limit sockarg length to reasonable values.


# 1.63 25-Jun-2001 jdolecek

Back off the sendit()/recvit() change, some have problems with it


# 1.62 25-Jun-2001 jdolecek

sys_connect(): fix the call to FILE_UNUSE() so that it's done on return, rather
than immediately after getsock() call


# 1.61 25-Jun-2001 jdolecek

Add 'kernsa' parameter for sendit()/recvit(); if nonzero, msg->msg_name
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.

This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.


# 1.60 16-Jun-2001 jdolecek

Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pageable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.


# 1.59 14-Jun-2001 thorpej

Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
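
The larval/mature scheme described above can be sketched with a toy
descriptor table. This is only an illustration under stated assumptions: the
names (sketch_fdtab, sketch_fdalloc, sketch_fd_getfile, sketch_set_mature)
and the fixed table size are hypothetical, and no locking is shown.

```c
#define SKETCH_NFILES 8

enum sketch_state { SLOT_FREE = 0, SLOT_LARVAL, SLOT_MATURE };

struct sketch_fdtab {
    enum sketch_state slot[SKETCH_NFILES];
};

/* Reserve the lowest free slot; a larval slot counts as filled. */
static int
sketch_fdalloc(struct sketch_fdtab *t)
{
    for (int fd = 0; fd < SKETCH_NFILES; fd++) {
        if (t->slot[fd] == SLOT_FREE) {
            t->slot[fd] = SLOT_LARVAL;
            return fd;
        }
    }
    return -1;
}

/* Lookups treat a larval slot as a non-existent descriptor. */
static int
sketch_fd_getfile(const struct sketch_fdtab *t, int fd)
{
    return fd >= 0 && fd < SKETCH_NFILES && t->slot[fd] == SLOT_MATURE;
}

/* Called once the object behind the descriptor is fully constructed. */
static void
sketch_set_mature(struct sketch_fdtab *t, int fd)
{
    if (fd >= 0 && fd < SKETCH_NFILES && t->slot[fd] == SLOT_LARVAL)
        t->slot[fd] = SLOT_MATURE;
}
```

Between sketch_fdalloc() and sketch_set_mature(), a concurrent close-style
lookup fails as if the descriptor were absent, while a second allocation
still skips the reserved slot.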


# 1.58 06-May-2001 manu

implement the recently introduced EMUL_BSD_ASYNCIO_PIPE emulation package
flag.

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.57 27-Feb-2001 lukem

branches: 1.57.2;
convert to ANSI KNF


# 1.56 10-Dec-2000 fvdl

Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).


# 1.55 24-Nov-2000 jdolecek

define COMPAT_OLDSOCK unconditionally - the code is used virtually for all
emulations besides NetBSD, and this way it's LKM-safe


# 1.54 02-Aug-2000 thorpej

MALLOC()/FREE() are not to be used for variable sized allocations.


# 1.53 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.52 27-May-2000 sommerfeld

branches: 1.52.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.51 30-Mar-2000 augustss

Get rid of register declarations.


# 1.50 23-Mar-2000 thorpej

Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() makes sure
the fd_freefiles hints stay in sync.

From OpenBSD.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.49 05-Nov-1999 mycroft

branches: 1.49.2;
Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.


# 1.48 30-Oct-1999 enami

back out unnecessary stylistic changes in recent changes, to keep coding
style closer to KNF.


# 1.47 27-Oct-1999 jdolecek

minor cleanup of previous - avoid goto and code duplication


# 1.46 27-Oct-1999 darrenr

patch from Greg A. Woods to fix panic problems with code that attempts to
recover from failures to accept a socket successfully. Problem suggested
by this:
> It would appear (from two "panic: closef: count < 0" failures in less
> than 12 hours) that Darren's fix to accept(2) for lost file descriptors
> isn't quite correct. His fix inserts a call to closef() to handle one
> of several possible error conditions. However everywhere else in the
> socket code in the same file where falloc() cleanup is necessary the
> function used is ffree().


Revision tags: chs-ubc2-base
# 1.45 01-Jul-1999 itojun

branches: 1.45.2; 1.45.4; 1.45.6;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.44 01-Jul-1999 darrenr

fix sys_accept() to return EOPNOTSUPP for protocols which don't support
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.


# 1.43 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.42 30-Apr-1999 cgd

add checks for COMPAT_OSF1 in the appropriate places


Revision tags: netbsd-1-4-RELEASE netbsd-1-4-base
# 1.41 10-Feb-1999 kleink

branches: 1.41.2; 1.41.4; 1.41.6;
* Due to addition and use of socklen_t, make the socket option and address
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also eliminates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessitates the addition
of a few casts.


# 1.40 18-Dec-1998 drochner

solve the COMPAT_OLDSOCK/MSG_COMPAT problem differently:
The source files which need MSG_COMPAT define COMPAT_OLDSOCK.


Revision tags: kenh-if-detach-base
# 1.39 26-Nov-1998 mycroft

Revert the functional change in rev 1.38; permit a msg_iovlen of 0.
There are two reasons for this:
* We should be able to pass file descriptors without sending any data.
* We could send zero-length iovecs anyway (but we shouldn't have to do this).
Also, msg_iovlen is already a u_int, so delete a bunch of casts.


Revision tags: chs-ubc-base
# 1.38 04-Aug-1998 kleink

Per XNS Issue 5, calling recvmsg(2) or sendmsg(2) with an msg.msg_iovlen less
than or equal to 0 shall fail with EMSGSIZE; the latter condition was not being
checked for. Also, document the msg.msg_iovlen > {IOV_MAX} case.


# 1.37 04-Aug-1998 kleink

UIO_MAXIOV -> IOV_MAX


# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
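
The point to watch in the mapping above is that bcopy() and ovbcopy() take
(source, destination, length) while memcpy() and memmove() take
(destination, source, length). A minimal sketch, with hypothetical helper
names chosen only for this example:

```c
#include <string.h>

/* Zero a fixed-size field, then copy a shorter name into it. */
static int
sketch_set_name(char *dst, size_t dstlen, const char *src, size_t srclen)
{
    if (srclen > dstlen)
        return -1;
    memset(dst, 0, dstlen);      /* was: bzero(dst, dstlen) */
    memcpy(dst, src, srclen);    /* was: bcopy(src, dst, srclen) */
    return 0;
}

/*
 * Shift a buffer's contents left by `n' bytes (n <= len).  The source
 * and destination overlap, so memmove() is required here.
 */
static void
sketch_shift_left(char *buf, size_t len, size_t n)
{
    memmove(buf, buf + n, len - n);    /* was: ovbcopy(buf + n, buf, len - n) */
}
```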


# 1.35 03-Aug-1998 kleink

Fix two off-by-one bugs, each present in both recvmsg(2) and sendmsg(2):
* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.
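
Both bugs come down to using `>=' where `>' was meant. The following sketch
shows the intended boundaries; the constants SKETCH_SMALLIOV and
SKETCH_MAXIOV and the function name are assumptions for illustration, not
the kernel's definitions.

```c
#define SKETCH_SMALLIOV 8       /* iovecs that fit in the on-stack array */
#define SKETCH_MAXIOV   1024    /* largest msg_iovlen accepted */

enum sketch_iov_where { IOV_ON_STACK, IOV_ON_HEAP, IOV_TOO_MANY };

/* Decide where iovec storage for a message with `iovlen' entries lives. */
static enum sketch_iov_where
sketch_iov_storage(unsigned int iovlen)
{
    if (iovlen > SKETCH_MAXIOV)      /* exactly MAXIOV is still legal */
        return IOV_TOO_MANY;
    if (iovlen > SKETCH_SMALLIOV)    /* exactly SMALLIOV fits on the stack */
        return IOV_ON_HEAP;
    return IOV_ON_STACK;
}
```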


# 1.34 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.33 29-Jul-1998 thorpej

branches: 1.33.2;
When checking for overflow in the residual count, test against SSIZE_MAX.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.

There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).


# 1.32 18-Jul-1998 lukem

use AF_LOCAL instead of AF_UNIX


# 1.31 25-Jun-1998 thorpej

defopt KTRACE


# 1.30 25-Apr-1998 matt

Hook for 0-copy (or other optimized) sends and receives


# 1.29 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.28 06-Feb-1998 thorpej

When copying out multiple control messages, ensure that the next control
message is aligned. From David Borman <dab@bsdi.com>.
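
From the receiving side, the alignment requirement is what lets userland walk
several control messages with the standard macros. A small sketch, assuming a
hypothetical helper name sketch_count_cmsgs:

```c
#include <stddef.h>
#include <sys/socket.h>

/*
 * Count the control messages attached to `msg'.  CMSG_NXTHDR() steps by
 * the aligned length of each message, so every header it returns starts
 * on a suitably aligned boundary.
 */
static int
sketch_count_cmsgs(struct msghdr *msg)
{
    int n = 0;

    for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c != NULL;
        c = CMSG_NXTHDR(msg, c))
        n++;
    return n;
}
```

If a one-byte payload is followed by another message, the second header sits
at offset CMSG_SPACE(1) from the first, not at CMSG_LEN(1).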


# 1.27 07-Jan-1998 thorpej

Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).
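
The log does not say how the constant-time behaviour was achieved. One common
way, shown here purely as an assumption, is a doubly linked tail queue from
<sys/queue.h>, where an element can be unlinked without walking the list. All
sketch_* names are hypothetical.

```c
#include <stddef.h>
#include <sys/queue.h>

struct sketch_conn {
    int id;
    TAILQ_ENTRY(sketch_conn) link;    /* prev/next links live in the element */
};
TAILQ_HEAD(sketch_connq, sketch_conn);

/* Append at the tail: constant time, the head tracks the last element. */
static void
sketch_enqueue(struct sketch_connq *q, struct sketch_conn *c)
{
    TAILQ_INSERT_TAIL(q, c, link);
}

/* Unlink an arbitrary element: constant time, no traversal needed. */
static void
sketch_dequeue(struct sketch_connq *q, struct sketch_conn *c)
{
    TAILQ_REMOVE(q, c, link);
}

/* Walk the queue only when the length is actually wanted. */
static int
sketch_qlen(struct sketch_connq *q)
{
    struct sketch_conn *c;
    int n = 0;

    TAILQ_FOREACH(c, q, link)
        n++;
    return n;
}
```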


# 1.26 07-Jan-1998 thorpej

Fix bug in recvit() that would cause recvmsg() to only receive one
control message, even if there were multiple control messages on
the queue. From Jean-Luc Richier <Jean-Luc.Richier@imag.fr>, in
bug report kern/4700.


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.25 26-Jun-1997 thorpej

branches: 1.25.8;
Use UCHAR_MAX rather than "255" when sanity-checking the length of a
sockaddr in sockargs().


# 1.24 26-Jun-1997 thorpej

In sockargs():
- Add a comment describing my feelings about this interface, in general.
- Remove the COMPAT_OLDSOCK length hack. Instead, if the socket argument
is too long to fit in an mbuf, allocate enough external storage to
hold it.
- If the socket argument is a sockaddr, don't allow the length to be
greater than 255, as that would overflow sa_len.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for his sanity checking.
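
The sa_len limit mentioned above exists because the length field of a
sockaddr is a single unsigned byte. A minimal sketch of such a check follows;
the function name and the EINVAL return are assumptions for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/*
 * Validate a user-supplied sockaddr length before storing it in a
 * one-byte length field.  Anything above UCHAR_MAX would be truncated.
 */
static int
sketch_check_salen(size_t buflen, unsigned char *sa_lenp)
{
    if (buflen > UCHAR_MAX)
        return EINVAL;
    *sa_lenp = (unsigned char)buflen;
    return 0;
}
```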


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 22-Dec-1996 cgd

* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.


# 1.22 14-Jun-1996 cgd

avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.


# 1.21 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.20 17-May-1996 pk

branches: 1.20.4;
Don't touch retval[] in socketpair(); manual page says this system call
returns 0 on success (PR#2428).


# 1.19 09-Feb-1996 christos

More proto fixes


# 1.18 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.17 10-Oct-1995 mycroft

Add hooks for COMPAT_FREEBSD, from Noriyuki Soda.


# 1.16 07-Oct-1995 mycroft

Prefix names of system call implementation functions with `sys_'.


# 1.15 19-Sep-1995 thorpej

Make system calls conform to a standard prototype and bring those
prototypes into scope.


# 1.14 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.13 24-Jun-1995 christos

Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).


# 1.12 10-May-1995 christos

tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed


# 1.11 05-Mar-1995 fvdl

Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.


# 1.10 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.9 20-Oct-1994 cgd

update for new syscall args description mechanism


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.8 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.7 04-May-1994 mycroft

Add return types where missing. Simplify some of the compat conditionals.
Include compat code if COMPAT_SUNOS with or without COMPAT_43.


# 1.6 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 17-Jul-1993 mycroft

branches: 1.5.4;
Finish moving struct definitions outside of function declarations.


# 1.4 27-Jun-1993 andrew

* ansifications
* Yuval Yarom's socket recv(2) fixes - access rights problems (see also
uipc_socket.c).


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision